Mohan,
You might also consider splitting the data on the type column if that works for 
your use cases.
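
As a rough illustration of splitting on the type column, here is a minimal Python sketch. It assumes the CSV has no header row, that fields contain no quoted commas (true for the all-integer columns described in this thread), and that type is the fourth column (index 3). The function name and the type_<value>.csv file names are hypothetical, not part of FastBit.

```python
import os

def split_csv_by_column(src_path, out_dir, col_index):
    """Copy each line of src_path into out_dir/type_<value>.csv,
    where <value> is the field found at col_index on that line.
    Returns a dict mapping each value to the number of rows written."""
    os.makedirs(out_dir, exist_ok=True)
    handles = {}   # column value -> open output file
    counts = {}    # column value -> rows written
    try:
        with open(src_path, "r", newline="") as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                key = line.split(",")[col_index].strip()
                if key not in handles:
                    path = os.path.join(out_dir, f"type_{key}.csv")
                    handles[key] = open(path, "w", newline="")
                handles[key].write(line)  # line is copied unchanged
                counts[key] = counts.get(key, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

With type values in the range 0-30, this keeps at most 31 output files open at once. Each resulting file could then be imported into its own directory.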
Andrew

On Oct 24, 2013, at 2:05 AM, "John" <[email protected]> wrote:

Hi, Mohan,

Are you using cygwin?  The pthread library seems to have some problems under 
cygwin.  As far as I know, the warning messages are harmless in this case.  If 
you are not using cygwin, then please give us a few more details.

If you have 378M rows in one data partition, then it is likely that you are 
spilling virtual memory to disk.  You should consider separating them into 4 to 
10 different partitions.  Currently ardea.cpp is not able to separate a single 
CSV file into multiple data partitions, so you will have to split your CSV file 
somehow before calling ardea.
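
One possible way to do that split is sketched below in Python. It assumes one record per line with no header row. The function name, the partNNN.csv naming, and the rows-per-file value are hypothetical choices for illustration, not anything FastBit provides.

```python
import os

def split_csv_by_rows(src_path, out_dir, rows_per_part):
    """Write consecutive runs of rows_per_part lines from src_path
    into out_dir/part000.csv, part001.csv, and so on.
    Returns the list of files written, in order."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    with open(src_path, "r", newline="") as src:
        for i, line in enumerate(src):
            if i % rows_per_part == 0:
                # Start a new output file every rows_per_part lines.
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"part{len(paths):03d}.csv")
                out = open(path, "w", newline="")
                paths.append(path)
            out.write(line)  # line is copied unchanged
    if out is not None:
        out.close()
    return paths
```

For 378M rows, rows_per_part=50_000_000 would yield 8 files, which falls inside the 4 to 10 range suggested above. Each file could then be passed to ardea with its own -d directory.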

-- John Wu

On Oct 23, 2013, at 2:42 PM, Mohan Embar <[email protected]> wrote:

Hi John,

Thanks for your quick reply!

I wasn't clear on how I would go about adding data. Wouldn't that require 
rebuilding the indexes each time, which would be an expensive operation?

I have 378M rows and I just imported them with:

../fastbit-ibis1.3.7/examples/ardea -d tmp -m "cid:int, iid:int, date:int, type:short" -t data.csv

Then I tried to do:

../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "where cid=246973 and iid=7"

...and I get a boatload of messages like this:

Constructed a part named tmp
query[QkrXsK8cBV-----0]::setWhereClause -- where "cid=246973 and iid=7"
Warning -- part[tmp]::gainWriteAccess -- pthread_rwlock_trywrlock(0x859c9d8) 
for freeRIDs returned 16 (Device or resource busy)
Warning -- part[tmp]::gainWriteAccess -- pthread_rwlock_trywrlock(0x859c9d8) 
for freeRIDs returned 16 (Device or resource busy)
(millions of times)
....

...before finally printing:

query[QkrXsK8cBV-----0]::evaluate -- time to compute the 35 hits: 25.2392 
sec(CPU), 25.3412 sec(elapsed).
query[QkrXsK8cBV-----0]::evaluate -- user root FROM tmp WHERE cid=246973 and 
iid=7 ==> 35 hits.
doQuery:: evaluate( FROM tmp WHERE cid=246973 and iid=7) produced 35 hits, took 
25.2392 CPU seconds, 25.3

I wasn't sure how to make it print the actual results rather than the count, or 
whether those warning messages appeared because I had too many rows.

Thanks in advance for any help with this.


On Wed, Oct 23, 2013 at 2:37 PM, John <[email protected]> wrote:
Thanks for your interest in FastBit.  Given the types of data and the type of 
query, FastBit would be the perfect tool.  Do you have a sense of how many rows 
you would have?  If you have more than 100 million, you will likely need to 
break them into multiple partitions.

-- John Wu

> On Oct 23, 2013, at 9:08 AM, Mohan Embar <[email protected]> wrote:
>
> Hello,
>
> I'm working on a project where we need to query massive amounts of log data 
> (stored in MySQL) and was wondering if you could help me evaluate the 
> suitability of FastBit for this.
>
> The relevant columns are:
>
> contact id: (unsigned int)
> item id: (unsigned int)
> date: (unsigned int)
> type: (numeric value from 0-30)
>
> I want to be able to answer questions like "give me all contacts who have 
> type X, type Y, but not type Z". etc.
>
> I think FastBit is well-suited for this, but the issue is that new log 
> entries are continuously being added, which would preclude FastBit being able 
> to grow these in realtime. Log entries aren't being removed however.
>
> Would FastBit be appropriate for this approach? If not, how would you suggest 
> that I reason about comparing the following alternatives:
>
> - Use a hybrid FastBit / MySQL approach where I submit a query to the known 
> log entries in FastBit, then the same query against the remainder of the 
> MySQL records which haven't yet been added to FastBit (which would be 
> comparatively small)
>
> - Use another approach (Precog)
>
> Thanks in advance!
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users