Hi John,

Thanks for your quick reply!
I wasn't clear on how I would go about adding data. Wouldn't that require rebuilding the indexes each time, which would be an expensive operation?

I have 378M rows, and I just imported them with:

    ../fastbit-ibis1.3.7/examples/ardea -d tmp -m "cid:int, iid:int, date:int, type:short" -t data.csv

Then I tried to do:

    ../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "where cid=246973 and iid=7"

...and I get a boatload of messages like this:

    Constructed a part named tmp
    query[QkrXsK8cBV-----0]::setWhereClause -- where "cid=246973 and iid=7"
    Warning -- part[tmp]::gainWriteAccess -- pthread_rwlock_trywrlock(0x859c9d8) for freeRIDs returned 16 (Device or resource busy)
    Warning -- part[tmp]::gainWriteAccess -- pthread_rwlock_trywrlock(0x859c9d8) for freeRIDs returned 16 (Device or resource busy)
    (millions of times) ....

...before finally printing:

    query[QkrXsK8cBV-----0]::evaluate -- time to compute the 35 hits: 25.2392 sec(CPU), 25.3412 sec(elapsed).
    query[QkrXsK8cBV-----0]::evaluate -- user root FROM tmp WHERE cid=246973 and iid=7 ==> 35 hits.
    doQuery:: evaluate( FROM tmp WHERE cid=246973 and iid=7) produced 35 hits, took 25.2392 CPU seconds, 25.3

I wasn't sure how to make it print the actual results rather than just the count, or whether those warnings appeared because I have too many rows. Thanks in advance for any help with this.

On Wed, Oct 23, 2013 at 2:37 PM, John <[email protected]> wrote:

> Thanks for your interest in FastBit. Given the types of data and the type
> of query, FastBit would be the perfect tool. Do you have a sense of how
> many rows you would have? If you have more than 100 million, you will
> likely need to break them into multiple partitions.
>
> -- John Wu
>
>
> On Oct 23, 2013, at 9:08 AM, Mohan Embar <[email protected]> wrote:
>
> > Hello,
> >
> > I'm working on a project where we need to query massive amounts of log
> > data (stored in MySQL) and was wondering if you could help me evaluate
> > the suitability of FastBit for this.
> >
> > The relevant columns are:
> >
> > contact id: (unsigned int)
> > item id: (unsigned int)
> > date: (unsigned int)
> > type: (numeric value from 0-30)
> >
> > I want to be able to answer questions like "give me all contacts who
> > have type X and type Y, but not type Z", etc.
> >
> > I think FastBit is well-suited for this, but the issue is that new log
> > entries are continuously being added, which would preclude FastBit from
> > being able to grow these indexes in realtime. Log entries aren't being
> > removed, however.
> >
> > Would FastBit be appropriate for this approach? If not, how would you
> > suggest that I reason about comparing the following alternatives:
> >
> > - Use a hybrid FastBit / MySQL approach where I submit a query to the
> > known log entries in FastBit, then the same query against the remainder
> > of the MySQL records which haven't yet been added to FastBit (which
> > would be comparatively small)
> >
> > - Use another approach (Precog)
> >
> > Thanks in advance!
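The "type X and type Y, but not type Z" question in the quoted message reduces to set algebra over per-type contact sets. A minimal Python sketch of that logic, using made-up `(cid, type)` rows rather than FastBit itself (in FastBit, each per-type set would come from one bitmap-indexed WHERE clause):

```python
# Hypothetical sample rows standing in for the (cid, type) columns.
rows = [
    (101, 3), (101, 7), (102, 3),
    (103, 3), (103, 7), (103, 9),
]

def contacts_with_type(rows, t):
    """Contact ids that have at least one log entry of type t."""
    return {cid for cid, typ in rows if typ == t}

# Contacts having type 3 AND type 7 but NOT type 9:
hits = (contacts_with_type(rows, 3) & contacts_with_type(rows, 7)) \
       - contacts_with_type(rows, 9)
print(sorted(hits))  # -> [101]
```

The three clauses map to bitmap intersection and subtraction, which is exactly the operation bitmap indexes make cheap; the set names and sample data here are illustrative only.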
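The hybrid FastBit / MySQL alternative described above amounts to running the same predicate against the bulk-loaded rows and against the small tail of rows not yet exported, then merging. A Python sketch of that merge logic, where both query functions are hypothetical stubs standing in for the real FastBit and MySQL calls:

```python
# Stand-in data: rows already bulk-loaded vs. rows still only in MySQL.
# Each row is (cid, iid, date, type); the values are made up.
archived = [(246973, 7, 20131001, 3)]   # rows already in the FastBit partition
recent   = [(246973, 7, 20131023, 5)]   # rows not yet exported from MySQL

def query_archived(pred):
    """Stub for a FastBit query over the bulk-loaded rows."""
    return [r for r in archived if pred(r)]

def query_recent(pred):
    """Stub for the same predicate run against MySQL's recent tail."""
    return [r for r in recent if pred(r)]

# Same predicate on both sides, e.g. "cid=246973 and iid=7":
pred = lambda r: r[0] == 246973 and r[1] == 7
hits = query_archived(pred) + query_recent(pred)
print(len(hits))  # -> 2
```

Because log entries are append-only and never removed, the two row sets are disjoint, so a simple concatenation needs no deduplication; that property is what makes this hybrid split workable.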
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
