Re: [FastBit-users] Using FastBit to Query Massive Logs

Mohan Embar Thu, 24 Oct 2013 06:27:42 -0700

Hi John & Andrew,

Thanks for your replies. I'm using CentOS 5.9 (as a VM), not Cygwin.


For the query I mentioned:

../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "where cid=246973 and iid=7"

...I thought it would print the result set. Is that not the case for the
above query, or are the memory issues preventing it?



On Wed, Oct 23, 2013 at 11:04 PM, John <[email protected]> wrote:

> Hi, Mohan,
>
> Are you using Cygwin?  The pthread library seems to have some problems
> under Cygwin.  As far as I know, the warning messages are harmless in this
> case.  If you are not using Cygwin, then please give us a few more
> details.
>
> If you have 378M rows in one data partition, then it is likely that you
> are spilling virtual memory to disk.  You should consider separating them
> into 4-10 different partitions.  Currently ardea.cpp is not able to
> separate a single CSV file into multiple data partitions, so you will have
> to split your CSV file somehow before calling ardea.
>
> -- John Wu
>
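Splitting the CSV before calling ardea, as suggested above, can be sketched with standard shell tools. This is a minimal sketch under stated assumptions: the helper name `split_and_load`, the `ARDEA` override variable, the `chunk.*` / `partN` directory layout, and the assumption that data.csv has no header row are all hypothetical. Only the ardea flags (`-d`, `-m`, `-t`) and the column spec are taken from the command used in this thread. If the ardea binary is not found, the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch (hypothetical helper): split one large CSV into roughly equal chunks
# and load each chunk into its own FastBit partition directory with ardea.
# Assumes the CSV has no header row and uses the column spec from this thread.
split_and_load() {
    csv=$1      # input CSV file, e.g. data.csv
    nparts=$2   # number of partitions, e.g. 4-10
    outdir=$3   # parent directory that will hold chunk.* and part1, part2, ...
    # ARDEA is a hypothetical override; the default is the path used above
    ardea=${ARDEA:-../fastbit-ibis1.3.7/examples/ardea}

    total=$(wc -l < "$csv")
    per=$(( (total + nparts - 1) / nparts ))   # rows per chunk, rounded up
    [ "$per" -ge 1 ] || per=1

    mkdir -p "$outdir"
    # split -l names its output files chunk.aa, chunk.ab, ...
    split -l "$per" "$csv" "$outdir/chunk."

    i=0
    for chunk in "$outdir"/chunk.*; do
        i=$((i + 1))
        part="$outdir/part$i"
        if [ -x "$ardea" ]; then
            "$ardea" -d "$part" -m "cid:int, iid:int, date:int, type:short" -t "$chunk"
        else
            # Dry run: ardea not found, so only show the command
            echo "would run: $ardea -d $part -m \"cid:int, iid:int, date:int, type:short\" -t $chunk"
        fi
    done
}
```

For example, `split_and_load data.csv 8 tmp` would produce eight chunks and (if ardea is present) eight partition directories under tmp. Under the same assumptions, a later batch of log rows could be loaded into a new partition directory instead of re-running ardea on the existing ones. Whether ibis then searches every partition under a single `-d` should be checked against its usage message.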
> On Oct 23, 2013, at 2:42 PM, Mohan Embar <[email protected]> wrote:
>
> Hi John,
>
> Thanks for your quick reply!
>
> I wasn't clear on how I would go about adding data. Wouldn't that require
> rebuilding the indexes each time, which would be an expensive operation?
>
> I have 378M rows and I just imported them with:
>
> ../fastbit-ibis1.3.7/examples/ardea -d tmp -m "cid:int, iid:int, date:int, type:short" -t data.csv
>
> Then I tried to do:
>
> ../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "where cid=246973 and iid=7"
>
> ...and I get a boatload of messages like this:
>
> Constructed a part named tmp
> query[QkrXsK8cBV-----0]::setWhereClause -- where "cid=246973 and iid=7"
> Warning -- part[tmp]::gainWriteAccess -- pthread_rwlock_trywrlock(0x859c9d8) for freeRIDs returned 16 (Device or resource busy)
> Warning -- part[tmp]::gainWriteAccess -- pthread_rwlock_trywrlock(0x859c9d8) for freeRIDs returned 16 (Device or resource busy)
> (millions of times)
> ....
>
> ...before finally printing:
>
> query[QkrXsK8cBV-----0]::evaluate -- time to compute the 35 hits: 25.2392 sec(CPU), 25.3412 sec(elapsed).
> query[QkrXsK8cBV-----0]::evaluate -- user root FROM tmp WHERE cid=246973 and iid=7 ==> 35 hits.
> doQuery:: evaluate( FROM tmp WHERE cid=246973 and iid=7) produced 35 hits, took 25.2392 CPU seconds, 25.3
>
> I wasn't sure how to make it print the actual results rather than the
> count, or whether those warning messages appeared because I had too many rows.
>
> Thanks in advance for any help with this.
>
>
> On Wed, Oct 23, 2013 at 2:37 PM, John <[email protected]> wrote:
>
>> Thanks for your interest in FastBit.  Given the types of data and the
>> type of query, FastBit would be the perfect tool.  Do you have a sense of
>> how many rows you would have?  If you have more than 100 million, you will
>> likely need to break them into multiple partitions.
>>
>> -- John Wu
>>
>> > On Oct 23, 2013, at 9:08 AM, Mohan Embar <[email protected]> wrote:
>> >
>> > Hello,
>> >
>> > I'm working on a project where we need to query massive amounts of log
>> > data (stored in MySQL) and was wondering if you could help me evaluate
>> > the suitability of FastBit for this.
>> >
>> > The relevant columns are:
>> >
>> > contact id: (unsigned int)
>> > item id: (unsigned int)
>> > date: (unsigned int)
>> > type: (numeric value from 0-30)
>> >
>> > I want to be able to answer questions like "give me all contacts who
>> > have type X and type Y but not type Z," etc.
>> >
>> > I think FastBit is well-suited for this, but the issue is that new log
>> > entries are continuously being added, which would seem to prevent FastBit
>> > from growing its data and indexes in real time. Log entries aren't being
>> > removed, however.
>> >
>> > Would FastBit be appropriate for this? If not, how would you suggest I
>> > compare the following alternatives:
>> >
>> > - Use a hybrid FastBit / MySQL approach where I run a query against the
>> > log entries already loaded into FastBit, then run the same query against
>> > the remaining MySQL records that haven't yet been added to FastBit
>> > (a comparatively small set)
>> >
>> > - Use another approach (Precog)
>> >
>> > Thanks in advance!
>> > _______________________________________________
>> > FastBit-users mailing list
>> > [email protected]
>> > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
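The "type X and type Y but not type Z" question, combined with the hybrid FastBit/MySQL option, can be sketched with awk, sort and comm. This is not FastBit's query language; it is a minimal sketch assuming (hypothetically) that the rows already in FastBit and the newer MySQL-only rows have each been exported to CSV files with columns cid,iid,date,type, and the function names are made up for illustration. What it illustrates is that each per-type contact set has to be built over both sources before intersecting and subtracting. Running the full query on each source separately and taking the union of the answers would miss contacts whose type X rows are in one source and type Y rows in the other, and could keep contacts whose type Z rows exist only in the other source.

```shell
#!/bin/sh
# Sketch: "contacts with type X and type Y but not type Z" over rows exported
# as CSV (cid,iid,date,type). Each FILE may come from a different source,
# e.g. rows already in FastBit and the newer MySQL-only rows (hypothetical
# export files; how they are produced is not shown here).

# cids_with_type TYPE FILE... -> sorted, de-duplicated contact ids with TYPE
cids_with_type() {
    t=$1
    shift
    awk -F, -v t="$t" '$4 == t { print $1 }' "$@" | LC_ALL=C sort -u
}

# contacts_x_y_not_z X Y Z FILE... -> contact ids in (X and Y) minus Z
contacts_x_y_not_z() {
    x=$1 y=$2 z=$3
    shift 3
    work=$(mktemp -d)
    # Build each per-type set over ALL files first: a contact may have
    # type X in one source and type Y in the other.
    cids_with_type "$x" "$@" > "$work/x"
    cids_with_type "$y" "$@" > "$work/y"
    cids_with_type "$z" "$@" > "$work/z"
    # comm -12: ids in both sets; comm -23: ids only in the first input
    LC_ALL=C comm -12 "$work/x" "$work/y" | LC_ALL=C comm -23 - "$work/z"
    rm -rf "$work"
}
```

For example, `contacts_x_y_not_z 3 5 7 fastbit_rows.csv mysql_rows.csv` (file names hypothetical) prints the ids of contacts that have type 3 and type 5 but no type 7 in either file.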
>>
>
