Hi, Mohan,

Just tried FastBit on CentOS 6.4 in a VM; 'make more-check' did not show the same problem with pthread_rwlock_trywrlock.
What appears to be going on is that your data partition is too large: the data or index cannot be loaded into memory. This forces the query evaluation procedure to take an execution path that is likely extremely slow. You will need to break up your data into multiple directories/partitions.

John

On 10/24/13, 6:26 AM, Mohan Embar wrote:
> Hi John & Andrew,
>
> Thanks for your replies. I'm using CentOS 5.9 (as a VM), not Cygwin.
>
> For the query I mentioned:
>
> ../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "where cid=246973 and iid=7"
>
> ...I thought it would print a result set. Is this not the case for the
> above query? Or are the memory issues preventing this from happening?
>
> On Wed, Oct 23, 2013 at 11:04 PM, John <[email protected]> wrote:
>
> Hi, Mohan,
>
> Are you using Cygwin? The pthread library seems to have some problems
> under Cygwin. As far as I know, the warning messages are harmless in
> that case. If you are not using Cygwin, then please give us a few more
> details.
>
> If you have 378M rows in one data partition, then it is likely that
> you are spilling virtual memory to disk. You should consider
> separating them into 4-10 different partitions. Currently ardea.cpp is
> not able to separate a single CSV file into multiple data partitions,
> so you will have to split your CSV file somehow before calling ardea.
>
> -- John Wu
>
> On Oct 23, 2013, at 2:42 PM, Mohan Embar <[email protected]> wrote:
>
>> Hi John,
>>
>> Thanks for your quick reply!
>>
>> I wasn't clear on how I would go about adding data. Wouldn't that
>> require rebuilding the indexes each time, which would be an expensive
>> operation?
>> I have 378M rows and I just imported them with:
>>
>> ../fastbit-ibis1.3.7/examples/ardea -d tmp -m "cid:int, iid:int, date:int, type:short" -t data.csv
>>
>> Then I tried to do:
>>
>> ../fastbit-ibis1.3.7/examples/ibis -d tmp -v -q "where cid=246973 and iid=7"
>>
>> ...and I get a boatload of messages like this:
>>
>> Constructed a part named tmp
>> query[QkrXsK8cBV-----0]::setWhereClause -- where "cid=246973 and iid=7"
>> Warning -- part[tmp]::gainWriteAccess -- pthread_rwlock_trywrlock(0x859c9d8) for freeRIDs returned 16 (Device or resource busy)
>> (the warning repeats millions of times)
>> ....
>>
>> ...before finally printing:
>>
>> query[QkrXsK8cBV-----0]::evaluate -- time to compute the 35 hits: 25.2392 sec(CPU), 25.3412 sec(elapsed).
>> query[QkrXsK8cBV-----0]::evaluate -- user root FROM tmp WHERE cid=246973 and iid=7 ==> 35 hits.
>> doQuery:: evaluate( FROM tmp WHERE cid=246973 and iid=7) produced 35 hits, took 25.2392 CPU seconds, 25.3
>>
>> I wasn't sure how to make it print the actual results rather than the
>> count, or whether that warning appeared because I had too many rows.
>>
>> Thanks in advance for any help with this.
>>
>> On Wed, Oct 23, 2013 at 2:37 PM, John <[email protected]> wrote:
>>
>> Thanks for your interest in FastBit. Given the types of data and the
>> type of query, FastBit would be the perfect tool. Do you have a sense
>> of how many rows you would have? If you have more than 100 million,
>> you will likely need to break them into multiple partitions.
>>
>> -- John Wu
>>
>> > On Oct 23, 2013, at 9:08 AM, Mohan Embar <[email protected]> wrote:
>> >
>> > Hello,
>> >
>> > I'm working on a project where we need to query massive amounts of
>> > log data (stored in MySQL), and I was wondering if you could help me
>> > evaluate the suitability of FastBit for this.
>> >
>> > The relevant columns are:
>> >
>> > contact id: (unsigned int)
>> > item id: (unsigned int)
>> > date: (unsigned int)
>> > type: (numeric value from 0-30)
>> >
>> > I want to be able to answer questions like "give me all contacts
>> > who have type X and type Y, but not type Z", etc.
>> >
>> > I think FastBit is well suited for this, but the issue is that new
>> > log entries are continuously being added, and I wasn't sure whether
>> > FastBit can grow the data in real time. Log entries aren't being
>> > removed, however.
>> >
>> > Would FastBit be appropriate for this? If not, how would you
>> > suggest that I reason about comparing the following alternatives:
>> >
>> > - Use a hybrid FastBit/MySQL approach where I submit a query to the
>> >   known log entries in FastBit, then the same query against the
>> >   remainder of the MySQL records which haven't yet been added to
>> >   FastBit (which would be comparatively small)
>> >
>> > - Use another approach (Precog)
>> >
>> > Thanks in advance!
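John's advice above (split the CSV before loading, then run ardea once per partition directory) can be sketched roughly as follows. The ardea path, flags, and column schema are taken from the thread; the sample data, chunk size, and the tmp1..tmp4 directory names are hypothetical, and the script only prints the ardea commands rather than running them, since ardea may not be installed:

```shell
# Sketch: split a large CSV into chunks, one per FastBit partition,
# then load each chunk with its own ardea invocation.

# Generate a small sample CSV (cid,iid,date,type) for illustration.
seq 1 1000 | awk '{printf "%d,%d,%d,%d\n", $1, $1%100, $1%365, $1%31}' > data.csv

# Split into 4 chunks of 250 rows each: data_aa, data_ab, data_ac, data_ad.
split -l 250 data.csv data_

# Print one ardea command per chunk, each targeting its own
# partition directory (tmp1, tmp2, ...).
i=1
for f in data_a?; do
  echo "../fastbit-ibis1.3.7/examples/ardea -d tmp$i" \
       "-m \"cid:int, iid:int, date:int, type:short\" -t $f"
  i=$((i+1))
done
```

With FastBit built, replacing the echo with the actual command would populate partitions tmp1 through tmp4, each small enough to fit its data and index in memory, and each queryable with the same `ibis -d tmpN ...` invocation shown earlier in the thread.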
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
