Hi John,

thanks for your help. The new ibis::filter interface does not directly
meet my needs, though, because it always returns a table object. I just
wanted to let you know that I'm pretty happy with my current solution.

So, I'm doing a kind of iterative data reduction, where I try to avoid
bringing any real data into memory until the very last moment. For this, I
use the query interface to filter the data using the indexes and save the
resulting bitvectors in memory for later use. If I need to combine some of
the results, I use the bit operations provided by the bitvector class. When
I finally need the data itself, I use the table interface to select the
needed columns with a previously calculated bitvector, which may be paged
to limit the results. With the resulting table object I can then use
aggregation if needed.
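
As a rough illustration of that flow, here is a minimal, self-contained
sketch. It deliberately does not use FastBit: std::vector<bool> stands in
for ibis::bitvector, and Partition, filterRows, andMasks, pageMask,
sumByName and demo are hypothetical names made up for this example, not
FastBit API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a single data partition with two columns.
struct Partition {
    std::vector<std::string> name;
    std::vector<int> value;
};

// Stand-in for ibis::bitvector: one bit per row of the partition.
using Mask = std::vector<bool>;

// Step 1: evaluate a condition and keep only the hit mask, not the rows.
template <typename Pred>
Mask filterRows(const Partition& p, Pred pred) {
    Mask m(p.name.size());
    for (std::size_t i = 0; i < m.size(); ++i) m[i] = pred(p, i);
    return m;
}

// Step 2: combine two saved masks with a bitwise AND.
Mask andMasks(const Mask& a, const Mask& b) {
    assert(a.size() == b.size());
    Mask r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] && b[i];
    return r;
}

// Step 3: page the mask by keeping only hits numbered [offset, offset + limit).
Mask pageMask(const Mask& m, std::size_t offset, std::size_t limit) {
    Mask r(m.size(), false);
    std::size_t seen = 0;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (!m[i]) continue;
        if (seen >= offset && seen < offset + limit) r[i] = true;
        ++seen;
    }
    return r;
}

// Step 4: touch the actual data only now, and aggregate it
// (sum of value grouped by name) over the selected rows.
std::map<std::string, int> sumByName(const Partition& p, const Mask& m) {
    std::map<std::string, int> out;
    for (std::size_t i = 0; i < m.size(); ++i)
        if (m[i]) out[p.name[i]] += p.value[i];
    return out;
}

// Usage: two filters, combined, paged, then aggregated.
std::map<std::string, int> demo() {
    Partition p{{"a", "b", "a", "c", "a", "b"}, {1, 2, 3, 4, 5, 6}};
    Mask m1 = filterRows(p, [](const Partition& q, std::size_t i) {
        return q.value[i] > 1;
    });
    Mask m2 = filterRows(p, [](const Partition& q, std::size_t i) {
        return q.name[i] != "c";
    });
    Mask page = pageMask(andMasks(m1, m2), 1, 2);  // second and third hit
    return sumByName(p, page);
}
```

The point of the design is that only the masks are kept around between
steps; the column values are read once, in the last step, and only for the
rows that survive filtering and paging.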

I hope this clarifies my implementation a bit.

Best regards

Patrik Nisen


On Wed, Dec 19, 2012 at 6:32 PM, K. John Wu <[email protected]> wrote:

> Hi, Patrik,
>
> You mentioned that ibis::query class does not have what you need.
> Would ibis::quaere class, more specifically, ibis::filter class
> <http://lbl.gov/~kwu/fastbit/doc/html/classibis_1_1filter.html>, be
> better suited for your work?  Because the return from a select
> function call on this object is an ibis::table, you are free to
> perform group by operations on it.  In addition, you can put the
> group by operations as the argument of the function
> ibis::filter::select, which would save you the trouble of calling
> ibis::table::groupby.
>
> It looks to me that what is needed is an additional constructor that
> takes a bitvector object in place of a where clause and a single data
> partition.  This constructor has been added to the ibis::filter class.
> Please take a look at src/filter.h and src/filter.cpp.
>
> By the way, the class ibis::bitvector has read and write functions
> that allow you to read and write the bitvector returned by
> ibis::query::getHitVector.
>
> Let us know if you have a chance to try it.
>
> John
>
>
> On 12/19/12 7:17 AM, Patrik Nisen wrote:
> > Hi,
> >
> > I'm sorry I was not able to explain my problem clearly. However, I
> > found a close enough solution, so let me explain that and perhaps
> > clarify it a bit.
> >
> > So the idea was to save an index to the data based on a query, and
> > reuse that index later. I want to save the result, because the
> > filtering is done with pretty expensive "LIKE" queries on text
> > columns (and previously outside Fastbit). So I'm saving the bitvectors
> > retrieved from an evaluated query-object. Then, I want to run
> > aggregate queries over these results, which can be done mainly using
> > the table interface. However, the normal interface does not allow use
> > of bitvectors to limit the query to only those rows defined in the
> > bitvector, because the "generic" table could be, for instance, using
> > many partitions (I suppose). I found out that I can populate a table
> > myself using directly the bord-class and its append-method, and bring
> > in just the records needed. Then I can call groupby for those results.
> >
> > I did not find a straightforward way to reuse these bitvectors with
> > the query interface, so at the moment I'm simply doing AND between the
> > two bitvector results, which is fine when the following queries are
> > not using text columns. One option could be to use the
> > getRIDs(bitvector) method to convert the bitvector to RIDSet and then
> > use that with setRIDs() method, but I have not looked into that yet.
> >
> > Thanks!
> >
> > Patrik
> >
> > On Mon, Dec 10, 2012 at 5:26 PM, K. John Wu <[email protected]> wrote:
> >> Hi, Patrik,
> >>
> >> If you would like to write a data table to screen or a file in CSV
> >> format, use the function ibis::table::dump.  If you plan to write a
> >> data table out in binary format that can be used for further queries,
> >> then use the function ibis::table::backup.
> >>
> >> John
> >>
> >>
> >> On 12/10/12 4:29 AM, Patrik Nisen wrote:
> >>> Hi,
> >>>
> >>> yes, it would. Would it then be possible to do that and save the
> >>> intermediary results into a file?
> >>>
> >>> Thank you.
> >>>
> >>> Patrik
> >>>
> >>>
> >>> On 12/08/12 at 12:56pm, K. John Wu wrote:
> >>>> Hi, Patrik,
> >>>>
> >>>> Would it be possible for you to issue two queries with the same where
> >>>> clause, but different select clauses?
> >>>>
> >>>> John
> >>>>
> >>>>
> >>>> On 12/7/12 4:13 AM, Patrik Nisen wrote:
> >>>>> Hi,
> >>>>>
> >>>>> and thank you for your great work!
> >>>>>
> >>>>> I am currently looking into performing operations for pre-filtered
> >>>>> sets of rows, and I would need some help to understand if this is
> >>>>> possible at the moment to do with fastbit, or advice to implement it.
> >>>>>
> >>>>> I have a dataset saved into a single data partition and my goal is to
> >>>>> perform filtering with varying conditions, save these results and use
> >>>>> them as starting sets for later queries. If I have understood
> >>>>> correctly, this is at the moment possible by retrieving the RIDSet
> >>>>> from an evaluated query, saving that, and setting it
> >>>>> (query::setRIDs) to the next query.  However, I would like to use
> >>>>> aggregate functions with the following queries, but I did not find
> >>>>> a way to do similar things with the table interface.
> >>>>>
> >>>>> So my question is: how could I perform the described aggregation
> >>>>> for a pre-filtered set of rows? In addition, as I'm only having
> >>>>> one partition, I would prefer to save the filtering results as
> >>>>> bitvectors (query::getHitVector) and reuse them later as masks due
> >>>>> to their smaller size. There's a protected function
> >>>>> query::doEvaluate having this functionality, and perhaps that
> >>>>> could be opened. Would this make any sense?
> >>>>>
> >>>>> Thanks for your help!
> >>>>>
> >>>>>
> >>>>> Patrik Nisen
> >>>>> _______________________________________________
> >>>>> FastBit-users mailing list
> >>>>> [email protected]
> >>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> >>>>>
>