Hello,
I'm exploring FastBit, using the mensa interface to run queries on a table that
is built from many partitions.
The queries apply an aggregation function (either distinct or sum) to one or
two columns (e.g. sum(col_1), distinct(col_2)), so the final result set
contains 1 row with 1 or 2 columns.
I was working on a large data set of about 160 million records in 90
partitions (each partition has ~1.8 million records).
I've noticed that for the sum column, FastBit allocates an array of doubles
with one element per hit (i.e., per row selected by the where clause), and
releases this memory once the sum has been calculated.
I was wondering why this is needed, since the sum aggregation could iterate
directly over the cached column, reading only the rows selected by the hits
array, and accumulate into a result array sized by the number of groups (in my
case 1, since no column was selected without an aggregation function).
By the way, I was performing the sum on a byte/short column.
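A minimal C++ sketch of the accumulation described above, assuming the selected rows are given as one flag per row (a simplified stand-in for FastBit's compressed hit bitvector) and a hypothetical per-row group id. `sumByGroup` and `sumSelected` are illustrative names, not part of FastBit or its mensa interface. The point is that the extra memory is one double per group rather than one double per hit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not FastBit's API). Sums a column over the rows selected
// by a query, accumulating directly into one double per group instead of first
// copying every selected value into a per-hit array of doubles.
//   col     : cached column values, e.g. int8_t or int16_t, one per row
//   hits    : one flag per row, true if the row satisfies the where clause
//             (a simple stand-in for FastBit's compressed hit bitvector)
//   groupOf : group id of each row (hypothetical; all zeros with no group-by)
//   nGroups : number of groups (1 when there is no group-by)
template <typename T>
std::vector<double> sumByGroup(const std::vector<T>& col,
                               const std::vector<bool>& hits,
                               const std::vector<std::uint32_t>& groupOf,
                               std::size_t nGroups) {
    assert(hits.size() == col.size() && groupOf.size() == col.size());
    std::vector<double> sums(nGroups, 0.0);  // extra memory: nGroups doubles
    for (std::size_t row = 0; row < col.size(); ++row) {
        if (!hits[row]) continue;            // skip rows not selected
        assert(groupOf[row] < nGroups);
        sums[groupOf[row]] += static_cast<double>(col[row]);
    }
    return sums;
}

// No group-by (the case in this question): a single running total,
// so the extra memory is constant regardless of the number of hits.
template <typename T>
double sumSelected(const std::vector<T>& col, const std::vector<bool>& hits) {
    assert(hits.size() == col.size());
    double total = 0.0;
    for (std::size_t row = 0; row < col.size(); ++row)
        if (hits[row]) total += static_cast<double>(col[row]);
    return total;
}
```

Each byte/short value is widened to double only as it is added, so no per-hit buffer is needed; whether FastBit's internals allow this for every aggregation path is exactly the open question here.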
When the SUM was performed on all 160 million records, the amount of memory
that was allocated and later released was ~1.2 GB.
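The observed figure is consistent with one 8-byte double per selected row, assuming all ~160 million records are hits and "GB" here means GiB (2^30 bytes):

```latex
160 \times 10^{6}\ \text{hits} \times 8\ \tfrac{\text{bytes}}{\text{hit}}
  = 1.28 \times 10^{9}\ \text{bytes} \approx 1.19\ \text{GiB}
```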
Am I missing something?
Is there a workaround for this use case (one that avoids allocating this
memory)?
Thanks in advance,
Avi
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users