Hi John,

Thanks for your quick reply!

I wasn't clear on how I would go about adding data. Wouldn't that require
rebuilding the indexes each time, which would be an expensive operation?

I have 378M rows and I just imported them with:

../fastbit-ibis1.3.7/examples/ardea -d tmp -m "cid:int, iid:int, date:int,
type:short" -t data.csv

Then I tried to do:

../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "where cid=246973 and iid=7"

...and I get a boatload of messages like this:

Constructed a part named tmp
query[QkrXsK8cBV-----0]::setWhereClause -- where "cid=246973 and iid=7"
Warning -- part[tmp]::gainWriteAccess --
pthread_rwlock_trywrlock(0x859c9d8) for freeRIDs returned 16 (Device or
resource busy)
Warning -- part[tmp]::gainWriteAccess --
pthread_rwlock_trywrlock(0x859c9d8) for freeRIDs returned 16 (Device or
resource busy)
(millions of times)
....

...before finally printing:

query[QkrXsK8cBV-----0]::evaluate -- time to compute the 35 hits: 25.2392
sec(CPU), 25.3412 sec(elapsed).
query[QkrXsK8cBV-----0]::evaluate -- user root FROM tmp WHERE cid=246973
and iid=7 ==> 35 hits.
doQuery:: evaluate( FROM tmp WHERE cid=246973 and iid=7) produced 35 hits,
took 25.2392 CPU seconds, 25.3

I wasn't sure how to make it print the actual matching rows rather than just
the hit count, or whether those warnings appeared because I have too many rows.
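In case it helps: from the examples I've seen, ibis appears to accept a select
clause in the query string, so something like the following might print the
matching columns instead of just a count (untested guess on my part, using the
same column names from my import):

../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "select cid, iid, date, type where cid=246973 and iid=7"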

Thanks in advance for any help with this.


On Wed, Oct 23, 2013 at 2:37 PM, John <[email protected]> wrote:

> Thanks for your interest in FastBit.  Given the types of data and the type
> of query, FastBit would be the perfect tool.  Do you have a sense of how
> many rows you would have?  If you have more than 100 million, you will
> likely need to break them into multiple partitions.
>
> -- John Wu
>
> > On Oct 23, 2013, at 9:08 AM, Mohan Embar <[email protected]> wrote:
> >
> > Hello,
> >
> > I'm working on a project where we need to query massive amounts of log
> data (stored in MySQL) and was wondering if you could help me evaluate the
> suitability of FastBit for this.
> >
> > The relevant columns are:
> >
> > contact id: (unsigned int)
> > item id: (unsigned int)
> > date: (unsigned int)
> > type: (numeric value from 0-30)
> >
> > I want to be able to answer questions like "give me all contacts who
> have type X and type Y, but not type Z", etc.
> >
> > I think FastBit is well-suited for this, but the issue is that new log
> entries are continuously being added, which I assumed would preclude
> FastBit from ingesting them in real time. Log entries are never removed,
> however.
> >
> > Would FastBit be appropriate here? If not, how would you suggest I
> compare the following alternatives:
> >
> > - Use a hybrid FastBit/MySQL approach: submit a query to the log
> entries already in FastBit, then run the same query against the
> remaining MySQL records that haven't yet been imported into FastBit
> (a comparatively small set)
> >
> > - Use another approach (Precog)
> >
> > Thanks in advance!
> > _______________________________________________
> > FastBit-users mailing list
> > [email protected]
> > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
