Hi, Avi,

You are absolutely right that FastBit could have implemented something smarter for these aggregation functions. We will probably get around to implementing something new in a few months. In the meantime, if you have some ideas, feel free to give them a try. We would love to have contributions that speed up the computation of aggregation functions.
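For reference, the approach described in the quoted message (walking the cached column directly and keeping one accumulator per group, rather than first copying every hit into an array of doubles) could be sketched roughly as follows. This is only an illustration under stated assumptions: `sumSelected` and `tempBytes` are hypothetical names, not FastBit functions, and a plain `std::vector<bool>` stands in for FastBit's compressed hit bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the FastBit API: sum the selected
// entries of a column in a single pass, without widening the hits into
// a temporary array of doubles first.
template <typename T>
double sumSelected(const std::vector<T>& column,
                   const std::vector<bool>& hits) {
    // Assumption: one hit flag per row of the column.
    assert(column.size() == hits.size());
    double total = 0.0;  // the only extra state: one accumulator per group
    for (std::size_t i = 0; i < column.size(); ++i) {
        if (hits[i]) {
            total += static_cast<double>(column[i]);
        }
    }
    return total;
}

// Size in bytes of a temporary buffer holding one double per hit.
constexpr std::uint64_t tempBytes(std::uint64_t nHits) {
    return nHits * sizeof(double);
}
```

With 8-byte doubles, `tempBytes(160000000)` comes to 1,280,000,000 bytes, which is consistent with the roughly 1.2GB reported for a sum over all 160 million records. The single-pass version needs only the accumulator, whatever the width of the source column.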
Thanks.

John

On 2/15/2011 6:39 AM, Avi Haleva wrote:
> Hello,
> I'm exploring the use of FastBit, using the mensa interface to generate
> queries on a table that is constructed from many partitions.
> The queries will use an aggregation function (either distinct or sum)
> on one or two columns (e.g. sum(col_1), distinct(col_2)), so the
> final result set will contain 1 row with 1 or 2 columns.
> I was working on a large data set of about 160 million records using
> 90 partitions (each partition has ~1.8 million records).
> I've noticed that for the sum column, FastBit allocates memory for the
> number of hits the sum is working on (based on the where criteria) x
> double, and after the calculation of the sum this memory is released.
> I was wondering why this is needed, as the sum aggregation function
> can iterate over the actual column that is cached, based on the hits
> array (by the way, I used it to perform the sum on a byte/short column),
> and use an array of results based on the groupby size (in my case 1, as
> no column was retrieved without an aggregation function).
> When the SUM was performed on all of the 160 million records, the
> amount of memory that was allocated and released later was ~1.2GB.
> Am I missing something?
> Is there a workaround for this use case (avoiding the need to allocate
> this memory)?
> Thanks in advance,
> Avi
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
