Hi, Sean,

I presume that you would want to do some operations on the selected
rows, and hopefully, your operations do not involve median or
percentiles.  In this case, you can partition the data into smaller
pieces, say 50M or 100M rows per partition, so that the amount of memory
required by FastBit to complete the operations stays relatively modest.
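
To make the partition sizes concrete, here is a rough sketch in Python.
It only does the row-range arithmetic and does not call FastBit.  The
helper name partition_ranges is made up for illustration, and the
8-byte size for a long is an assumption.  The byte count covers only
the raw values of the two selected columns, not whatever FastBit
actually allocates.

```python
def partition_ranges(total_rows, rows_per_partition):
    """Yield half-open (start, end) row ranges that cover total_rows."""
    for start in range(0, total_rows, rows_per_partition):
        yield start, min(start + rows_per_partition, total_rows)


TOTAL_ROWS = 250_000_000          # dataset size quoted in the question
ROWS_PER_PARTITION = 50_000_000   # lower end of the suggested 50M-100M range
SELECTED_COLUMNS = 2              # id1 and id2
BYTES_PER_LONG = 8                # assumption: each long is 8 bytes

ranges = list(partition_ranges(TOTAL_ROWS, ROWS_PER_PARTITION))

# Raw size of the selected values for one full partition (not FastBit's
# real memory footprint, which is not modeled here).
bytes_per_partition = ROWS_PER_PARTITION * SELECTED_COLUMNS * BYTES_PER_LONG

for start, end in ranges:
    print(f"rows [{start}, {end})")
print(f"{len(ranges)} partitions, {bytes_per_partition} raw bytes each")
```

With 100M rows per partition the same arithmetic gives three
partitions, the last one holding the remaining 50M rows.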

In the particular example, 'ibis -d . -q "select id1, id2"', I suspect
that the majority of the execution time is spent on writing out the
two IDs to /dev/null.  Try 'ibis -d . -q "select id1, id2, count(*)"'
to see if it makes any difference.
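
For a sense of scale, the timing you reported can be turned into a
per-row cost.  The Python sketch below uses only the numbers from your
message (250M rows, and the real/user/sys times printed by time).  It
measures nothing itself, and a high user-time share is merely
consistent with CPU-bound work such as producing the output, not proof
of it.

```python
ROWS = 250_000_000      # rows in the plain select, from the question
REAL_SECONDS = 48.010   # "real" from the reported timing
USER_SECONDS = 44.265   # "user" from the reported timing
SYS_SECONDS = 3.761     # "sys" from the reported timing

# Average wall-clock cost of one output row, in nanoseconds.
ns_per_row = REAL_SECONDS / ROWS * 1e9

# Rows produced per second of wall-clock time.
rows_per_second = ROWS / REAL_SECONDS

# Share of the CPU time that was spent in user space.
cpu_fraction_user = USER_SECONDS / (USER_SECONDS + SYS_SECONDS)

print(f"{ns_per_row:.1f} ns per row")
print(f"{rows_per_second:,.0f} rows per second")
print(f"{cpu_fraction_user:.1%} of CPU time in user space")
```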

John



On 3/21/14, 7:05 PM, Sean McNamara wrote:
> Hi John-
> 
> I had a question about large result sizes.  For our queries it seems
> that past a point, the size of the result begins to have a large
> impact on completion time.  If our queries do heavy event filtering,
> fastbit is crazy insane fast!  Some of our other queries aren't able
> to filter down as much though, and may return 200M rows from a 250M
> dataset (the dataset has many columns, but we only select out 2 long
> columns, all other columns are for filtering).
> 
> 
> So I suppose my question is: is there a way to iterate faster over
> larger results? (we are currently using the table interface /w a
> cursor).  For example  doing a plain select on a pair of longs /w 250M
> rows in ibis takes almost a minute:
> 
> time ibis -d . -q "select id1, id2" > /dev/null 2>&1
> 
> real    0m48.010s
> user    0m44.265s
> sys     0m3.761s
> 
> 
> Do you have any insight into places we could look to trim down the
> time /w larger results?
> 
> Thanks!
> 
> Sean
> 
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 