Hello,

I'm exploring the use of FastBit, using the mensa interface to run queries 
on a table that is constructed from many partitions.
The queries use an aggregation function (either distinct or sum) on one or 
two columns (e.g. sum(col_1), distinct(col_2)), so the final result set 
contains 1 row with 1 or 2 columns.

I am working on a large data set of about 160 million records in 90 
partitions (each partition has ~1.8 million records).

I've noticed that for the sum column, FastBit allocates memory for one double 
per hit (the hits being the rows that match the where criteria), and releases 
this memory after the sum has been calculated.

I was wondering why this is needed, since the sum aggregation function could 
iterate directly over the cached column, guided by the hits array (by the way, 
I was performing the sum on a byte/short column), and accumulate into an array 
of results sized by the number of groups (in my case 1, as no column was 
retrieved without an aggregation function).
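
To illustrate what I mean, here is a minimal sketch in Python. It is not 
FastBit code: the names sum_selected, column and hits are made up for the 
example, and the hit mask is a plain list rather than a compressed bitmap. It 
only shows the idea of summing the selected values with a single accumulator, 
without first copying them into an array of doubles.

```python
from array import array

def sum_selected(column, hits):
    """Sum column[i] for every i where hits[i] is true.

    Uses one running accumulator, so the extra memory does not grow
    with the number of hits.
    """
    total = 0.0
    for value, hit in zip(column, hits):
        if hit:
            total += value
    return total

# Hypothetical data: a small signed-byte column and a hit mask.
column = array('b', [3, -1, 7, 4, 2])
hits = [True, False, True, True, False]

total = sum_selected(column, hits)
print(total)  # 14.0
```

With a group by, the single accumulator would become one accumulator per 
group, which is still far smaller than one double per hit.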


When the SUM was performed on all of the 160 million records, about 1.2GB of 
memory was allocated and later released.
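
That figure is consistent with one 8-byte double per record. A quick check of 
the arithmetic, assuming 8 bytes per double and 1 byte per value for a byte 
column (the variable names are just for illustration):

```python
records = 160_000_000          # rows selected by the query
bytes_per_double = 8           # assumed size of a double
bytes_per_byte_value = 1       # size of one value in a byte column

as_doubles = records * bytes_per_double
as_bytes = records * bytes_per_byte_value

print(as_doubles)              # 1280000000 bytes
print(as_doubles / 2**30)      # roughly 1.19 GiB
print(as_bytes / 2**20)        # roughly 152.6 MiB for the raw byte column
```

So the temporary buffer is about 8 times larger than the byte column it is 
computed from.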

Am I missing something?
Is there a workaround for this use case (one that avoids allocating this 
memory)?

Thanks in advance,
Avi



      