Hi, Faisal,

I think it is fine for you to reissue the query with a progressively
longer list of conditions.  If you plan to write your own query
processing program on top of FastBit, you can modify the query object
with a new set of conditions as the user adds them.  If you plan to
stay with ibis.cpp, then you can simply issue a new query with longer
and longer where clauses.

For example, while the user is exploring, you can skip the select
clause in the queries and issue the following sequence of ibis calls:

ibis -d data-dir -q "where a > 1"
ibis -d data-dir -q "where a > 1 and b < 8"
ibis -d data-dir -q "where a > 1 and b < 8 and c > 4"

When the user wants some actual values, you can then take the select
clause and add it to the query string.
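
If you drive ibis from a script, the sequence above can be generated
rather than typed.  The following is only a rough sketch in Python: the
helper names build_where and ibis_command are made up for this
illustration, and it assumes the select clause goes in front of the
where clause in the query string.  It only builds the command lines; it
does not run ibis.

```python
import shlex

def build_where(conditions):
    """Join the conditions collected so far into one where clause."""
    return "where " + " and ".join(conditions)

def ibis_command(data_dir, conditions, select=None):
    """Return the ibis command line for the current list of conditions."""
    query = build_where(conditions)
    if select:
        # Assumption: the select clause is placed in front of the where clause.
        query = f"select {select} {query}"
    # shlex.quote protects the spaces and '>' / '<' from the shell.
    return f"ibis -d {shlex.quote(data_dir)} -q {shlex.quote(query)}"

conditions = []
commands = []
for cond in ["a > 1", "b < 8", "c > 4"]:
    # Each exploration step reissues the query with one more condition.
    conditions.append(cond)
    commands.append(ibis_command("data-dir", conditions))

# Only the final step asks for actual values.
final = ibis_command("data-dir", conditions, select="a, b, c")

for cmd in commands + [final]:
    print(cmd)
```

The printed commands use single quotes around the query string, which
the shell treats the same way as the double quotes shown above.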

Presumably, your question is not about the above procedure but about
its efficiency.  The short answer is that it should not be too bad.
The reason is that the OS / file system will typically cache the
content you have just accessed, so the second command will not need to
reread the bitmaps involving a from disk.  Similarly, when you issue
the third query, it will not need to read the indexes for a or b
again.  In most cases, the I/O cost dominates the total query
answering cost, so not having to repeat the I/O operations should
reduce the total query answering time.
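
To make the idea of keeping only a bit index of the qualifying rows
concrete, here is a small Python sketch using the five-row example from
your message.  It is not FastBit code and does not use the FastBit API:
plain Python integers stand in for the (compressed) bitmaps, and the
helper condition_mask is invented for this illustration.  Each step
ANDs one more condition into the running hit mask and reports the
number of selected rows; the rows themselves are only materialized at
the end.

```python
import operator

# The example table from the question, stored column by column.
columns = {
    "a": [1, 2, 3, 4, 2],
    "b": [5, 6, 9, 2, 4],
    "c": [9, 3, 5, 8, 4],
}
n_rows = 5

def condition_mask(values, op, threshold):
    """Return an integer whose bit i is set when row i satisfies the condition."""
    mask = 0
    for row, value in enumerate(values):
        if op(value, threshold):
            mask |= 1 << row
    return mask

# Start with every row selected, then AND in one condition at a time.
hits = (1 << n_rows) - 1
counts = []
steps = [("a", operator.gt, 1), ("b", operator.lt, 8), ("c", operator.gt, 4)]
for name, op, threshold in steps:
    hits &= condition_mask(columns[name], op, threshold)
    counts.append(bin(hits).count("1"))  # rows currently selected

# Materialize the rows only when the user asks for the values.
selected = [
    tuple(columns[name][row] for name in ("a", "b", "c"))
    for row in range(n_rows)
    if (hits >> row) & 1
]

print(counts)    # number of hits after each step
print(selected)  # the final rows
```

For the example data this leaves 4, 3 and then 1 selected rows, and the
final row is (4, 2, 8), matching the result you expected.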

Of course, coding up something more customized could reduce the query
processing time further.  However, the above solution might be
sufficient for relatively modest data sets.

John



On 8/8/12 9:34 AM, S M Faisal wrote:
> Hi John,
> I'm not looking for how to write query results. :) Here's what I'm
> looking for:
> 
> I want to be able to apply a bunch of successive "where" clauses
> before writing out the resulting table. That is, the user may want to
> test different conditions one after another and finally want to see
> the results. All I want to do is, rather than writing out the whole
> table after each query, write some index of the selected rows and
> apply successive query(ies) on these qualified rows and get the output
> at the end.
> 
> Say I have:
> a   b   c
> 1   5   9
> 2   6   3
> 3   9   5
> 4   2   8
> 2   4   4
> 
> Now we apply a series of filtering like 
> a > 1,
> then b < 8
> and then c > 4
> and then print the results which will be
> 
> a   b   c
> 4   2   8
> 
> 
> I want to be able to execute a query corresponding to each condition
> one by one. Now *my question is can we avoid writing the result of
> (a>1) and just somehow store a bit index of the rows that qualify and
> then apply next one (b < 8) on the result index of (a>1) and get a new
> index and so on? I just want to avoid writing the whole table
> (considering its big) after (a > 1) and then again writing the whole
> table after (b < 8) and so on.*
> *
> *
> I just want to keep an index of the qualifying rows and at some point
> when want to look at the complete result, be able to write the output.
> *In the intermediate filtering steps I just want to keep indices and
> be able to know how many rows are selected currently.*
> 
> Please let me know if I need to clarify more.
> 
> Thanks for your help, I really appreciate it :)
> 
> Faisal
> 
> 
> 
> 
> On Wed, Aug 8, 2012 at 7:08 AM, K. John Wu <[email protected]> wrote:
> 
>     Hi, Faisal,
> 
>     If you plan to write the query results yourself, you should be able to
>     use ibis::table::backup to write the content of the
>     ibis::table::select to any designated directory.
> 
>     Alternatively, we could add an option -x to ibis.cpp to write the
>     query results to a directory.  This might take a few days for us to
>     implement.
> 
>     If I am off the mark here, please elaborate.
> 
>     Thanks.
> 
>     John
> 
> 
>     On 8/7/12 4:42 PM, S M Faisal wrote:
>     > Hi John!
>     > The changes you made for me worked perfectly! I now have fast bit
>     > working as I need.
>     >
>     > I have a question, if you'd kindly answer:
>     >
>     > Is there any way to execute a series of interactive queries without
>     > dumping the whole result after each query? Let me explain: Say I have
>     > three columns a, b and c. I want to select rows where a>10. Then where
>     > 20 < b < 80. Then c < 40. Now I want resulting rows. I know if I can
>     > combine all these where clauses into a single one, it will be simple
>     > to get the answer. But my setup is more interactive where the user can
>     > try simple conditions and build gradually. Is there any way where I
>     > can execute a query and rather than dumping the whole table
>     > (satisfying the criteria of course) I can store some index or
>     > something on which I can execute the subsequent queries?
>     >
>     > Thanks for your sincere help and I'd appreciate very much if you can
>     > suggest something along this direction. Please let me know if I
>     > haven't been clear enough in explaining the use case.
>     >
>     > Thanks!
>     > Faisal
>     >
>     > On Mon, Jul 30, 2012 at 10:27 PM, S M Faisal <[email protected]> wrote:
>     >
>     >     Thank you so much John. I appreciate your sincere help :)
>     >
>     >     I'll definitely try and see how it goes.
>     >
>     >     Thanks,
>     >     Faisal
>     >
>     >
>     >     On Mon, Jul 30, 2012 at 10:04 PM, K. John Wu <[email protected]> wrote:
>     >
>     >         Hi, Faisal,
>     >
>     >         Just modified examples/ardea.cpp to write out -part.txt
>     when the
>     >         metadata is present (even if there is no data provided
>     on command
>     >         line).  This should allow you an option to generate the
>     >         -part.txt file
>     >         for the data directories with binary data already.
>     >
>     >         The code is checked in as SVN revision 532.  Please give
>     it a
>     >         try and
>     >         see if you have any questions.
>     >
>     >         John
>     >
>     >
>     >         On 7/30/12 10:30 AM, S M Faisal wrote:
>     >         > Thanks so much! Really appreciate it.
>     >         >
>     >         > Thanks,
>     >         > Faisal
>     >         >
>     >         > On Mon, Jul 30, 2012 at 10:27 AM, K. John Wu <[email protected]> wrote:
>     >         >
>     >         >     OK.  I will start on modifying ardea.cpp today,
>     will try
>     >         to get it
>     >         >     done today.  I am scheduled to go a family trip
>     starting
>     >         tomorrow.  If
>     >         >     I am not able to get it done today, it will be
>     sometime
>     >         next week
>     >         >     before I can get back to it. Just thought you
>     should know...
>     >         >
>     >         >     John
>     >         >
>     >         >
>     >         >     On 7/30/12 9:52 AM, S M Faisal wrote:
>     >         >     > Hi John,
>     >         >     > Thats absolutely right. I want ardea to just
>     create the
>     >         >     -part.txt file
>     >         >     > from the existing column files for me.
>     >         >     >
>     >         >     > Thanks,
>     >         >     > Faisal
>     >         >     >
>     >         >     > On Fri, Jul 27, 2012 at 5:01 PM, K. John Wu <[email protected]> wrote:
>     >         >     >
>     >         >     >     Guess you are looking for a command line
>     tool that
>     >         let you
>     >         >     specify the
>     >         >     >     names and types of the data files, and create a
>     >         file named
>     >         >     >     '-part.txt'.  Am I right?
>     >         >     >
>     >         >     >     Please take a look at the command line arguments
>     >         used for
>     >         >     ardea below,
>     >         >     >     We could make it write the file -part.txt if
>     you don't
>     >         >     specify any
>     >         >     >     data.  Would that suite your needs?
>     >         >     >
>     >         >     >     John
>     >         >     >
>     >         >     >
>     >         >     >     PS: The help message from ardea
>     >         >     >     ardea --help
>     >         >     >     usage:
>     >         >     >     /Users/john/src/ibis/examples/.libs/ardea [-c
>     >         conf-file] [-d
>     >         >     >     directory-to-write-data] [-n
>     name-of-dataset] [-r
>     >         >     a-row-in-ASCII] [-t
>     >         >     >     text-file-to-read] [-sqldump file-to-read] [-b
>     >         >     >     break/delimiters-in-text-file][-M
>     metadata-file] [-m
>     >         >     >     name:type[,name:type,...]] [-m
>     max-rows-per-file]
>     >         [-tag
>     >         >     >     name-value-pair] [-select clause] [-where
>     clause]
>     >         [-v[=|
>     >         >     >     ]verbose_level]
>     >         >     >
>     >         >     >     Note:
>     >         >     >             Column name must start with an alphabet
>     >         and can only
>     >         >     contain
>     >         >     >     alphanumeric values, and max-rows-per-file must
>     >         start with a
>     >         >     >     decimal digit
>     >         >     >             This program only recognize the
>     following
>     >         column types:
>     >         >     >             byte, short, int, long, float, double,
>     >         key, and text
>     >         >     >             It only checks the first character
>     of the
>     >         types.
>     >         >     >             For example, one can load the data in
>     >         tests/test0.csv
>     >         >     >     either one of
>     >         >     >     the following command lines:
>     >         >     >             ardea -d somwhere1 -m a:i,b:i,c:i -t
>     >         tests/test0.csv
>     >         >     >             ardea -d somwhere2 -m a:i -m b:f -m
>     c:d -t
>     >         >     tests/test0.csv
>     >         >     >
>     >         >     >
>     >         >     >
>     >         >     >     On 7/27/12 4:42 PM, S M Faisal wrote:
>     >         >     >     > Hi John,
>     >         >     >     > Thanks so much for your quick answer. Really
>     >         appreciate it.
>     >         >     >     >
>     >         >     >     > To make sure I understand, does it mean I have
>     >         to create the
>     >         >     >     -part.txt
>     >         >     >     > file myself (manually)? Do you happen to
>     >         >     >     > have a script that somehow generates the
>     >         -part.txt file
>     >         >     when the
>     >         >     >     user
>     >         >     >     > already has the files in binary format.
>     Because
>     >         >     >     > I'm talking about cases where I have few
>     hundred
>     >         columns
>     >         >     and its
>     >         >     >     > really nearly impossible to generate a file
>     >         manually.
>     >         >     >     >
>     >         >     >     > Any help in this regard?
>     >         >     >     >
>     >         >     >     > Thanks,
>     >         >     >     > faisal
>     >         >     >     >
>     >         >     >     > On Fri, Jul 27, 2012 at 4:35 PM, K. John Wu <[email protected]> wrote:
>     >         >     >     >
>     >         >     >     >     Hi, Faisal,
>     >         >     >     >
>     >         >     >     >     Thanks for your interest in FastBit.
>      If you
>     >         have the data
>     >         >     >     already in
>     >         >     >     >     binary format (for the machine you are
>     >         running on),
>     >         >     then you
>     >         >     >     need to
>     >         >     >     >     put the data files into a directory and
>     >         place a file named
>     >         >     >     '-part.txt'
>     >         >     >     >     to tell FastBit the data types of the
>     files.
>     >          Another
>     >         >     thing
>     >         >     >     to note is
>     >         >     >     >     that the file names are taken to be the
>     >         column names (and
>     >         >     >     the case of
>     >         >     >     >     the file name must match the case used in
>     >         '-part.txt').  A
>     >         >     >     directory
>     >         >     >     >     is considered a partition of a data table,
>     >         and a data
>     >         >     table
>     >         >     >     could have
>     >         >     >     >     any number of data partitions.
>     >         >     >     >
>     >         >     >     >     The file
>     >         >     >     >     <http://lbl.gov/~kwu/fastbit/doc/dataLoading.html>
>     >         >     >     >     has a little bit more details.
>     >         >     >     >
>     >         >     >     >     Hope this helps.
>     >         >     >     >
>     >         >     >     >     John
>     >         >     >     >
>     >         >     >     >
>     >         >     >     >     On 7/27/12 4:10 PM, S M Faisal wrote:
>     >         >     >     >     > Hi,
>     >         >     >     >     > I'm new to FastBit. I see that there are
>     >         programs for
>     >         >     >     preprocessing
>     >         >     >     >     > and formatting data so that FastBit
>     can be
>     >         used. But
>     >         >     what
>     >         >     >     if my data
>     >         >     >     >     > is already in column files in binary
>     >         format? That is, my
>     >         >     >     data is
>     >         >     >     >     > already stored as one file per
>     column and
>     >         in binary
>     >         >     >     format. All that
>     >         >     >     >     > is missing is the -part.txt file.
>     >         >     >     >     >
>     >         >     >     >     > How should I proceed?
>     >         >     >     >     >
>     >         >     >     >     > Thanks in advance!
>     >         >     >     >     >
>     >         >     >     >     > --
>     >         >     >     >     > -----------------------------------
>     >         >     >     >     > faisal
>     >         >     >     >     >
>     >         >     >     >     >
>     >         >     >     >     >
>     >         >     >     >     > _______________________________________________
>     >         >     >     >     > FastBit-users mailing list
>     >         >     >     >     > [email protected]
>     >         >     >     >     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>     >         >     >     >     >
>     >         >     >     >
>     >         >     >     >
>     >         >     >     >
>     >         >     >     >
>     >         >     >     > --
>     >         >     >     > -----------------------------------
>     >         >     >     > faisal
>     >         >     >     >
>     >         >     >
>     >         >     >
>     >         >     >
>     >         >
>     >         >
>     >         >
>     >
>     >
> 
> 
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
