Thanks John, I agree partitioning will help us and be much faster, but given our total footprint we want to see what is the maximum amount of data we can have on one node. We will try the maxBytes approach and let you know how it goes.
Just want to clarify: when you say partition, do you mean keeping the partitions on different boxes, or is it OK if the partitions are on the same box?

thanks,
Kishore G

On Wed, Mar 27, 2013 at 11:27 PM, K. John Wu <[email protected]> wrote:
> Hi, Kishore,
>
> The easiest way to evaluate sum(v1) with a subset of v1 values is to
> read the values into memory and then perform the summation. One could
> attempt to use the bitmap index for v1; however, unless the index size
> is really small, simply reading the values of v1 is faster.
>
> Given that FastBit most likely has to read the values of 200 columns,
> the maximum memory required is to read every value into memory.
>
> It is definitely possible to use memory map and let the OS handle the
> paging, but no matter how it is done, it is still paging. You can tell
> FastBit to pretend there is more than 16GB of memory by setting
> parameter maxBytes to a larger value. The bottom line is that you will
> be paging, and paging is never fast.
>
> My belief is that you can do a lot better by partitioning the data
> and avoiding the paging. It should be worth your time to give it a try.
>
> John
>
> On 3/27/13 9:49 PM, kishore g wrote:
> > Hi John
> >
> > Thanks for the quick response. Why should they be in memory? I thought
> > the bitmap indexes would be used first to do the intersection. Once
> > it applies the predicates it would then scan the metric data and
> > compute the aggregation. All data would be mmapped and the kernel
> > would deal with swapping pages in/out.
> >
> > I expected that perf would suffer with less memory, but did not
> > anticipate that functionality would be impacted.
> >
> > Thanks
> > Kishore G
> >
> > On Mar 27, 2013 8:14 PM, "K. John Wu" <[email protected]> wrote:
> > > Hi, Aditya,
> > >
> > > Looks like you have run out of memory. I would suggest that you
> > > break up the data set into multiple partitions, where each
> > > partition has a subset of the rows. For a conservative estimate,
> > > you can use the following:
> > >
> > > 8 bytes (per value, or whatever the size of your actual data is)
> > > * 200 columns * N rows
> > >
> > > If you have 16GB of memory, then you can have at most 10 million
> > > rows in memory at any given time. In this case, you might want to
> > > limit your data partition size to be 5 million rows.
> > >
> > > Good luck.
> > >
> > > John
> > >
> > > On 3/27/13 5:05 PM, Aditya Ramesh wrote:
> > > > Hi,
> > > >
> > > > When I try running a query on a large dataset using the ibis
> > > > command line tool, the query executes successfully. However,
> > > > when I tried using the JNI implementation from
> > > > https://bitbucket.org/olafW/fastbit4java/src, the query does not
> > > > succeed and it outputs a bunch of messages of the form:
> > > > Error -- fileManager::storage failed to malloc 1,423,308 bytes of
> > > > storage on retry
> > > > Error -- fileManager::storage failed to malloc 2,846,616 bytes of
> > > > storage on retry
> > > > Error -- bundles::ctor received an exception, start cleaning up
> > > >
> > > > Eventually, after some time, the program crashes completely with
> > > > the message:
> > > > #
> > > > # There is insufficient memory for the Java Runtime Environment
> > > > to continue.
> > > > # Native memory allocation (malloc) failed to allocate 32744
> > > > bytes for ChunkPool::allocate
> > > > # An error report file with more information is saved as:
> > > > # /export/home/eng/aramesh/LixCluster/hs_err_pid23388.log
> > > >
> > > > My JVM args are "-server -Xms16384m -Xmx16384m -XX:PermSize=128M
> > > > -XX:MaxPermSize=128M -XX:NewSize=768m -XX:MaxNewSize=768m
> > > > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85
> > > > -XX:+AlwaysPreTouch -XX:+PrintGCApplicationStoppedTime
> > > > -XX:+PrintGCTimeStamps -XX:+UseCompressedOops
> > > > -XX:+ParallelRefProcEnabled -XX:+PrintGCDetails
> > > > -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
> > > > -Xloggc:LixLogs/gc.log -XX:ErrorFile=logs/hs_err.log
> > > > -Djava.awt.headless=true -Dcom.sun.management.jmxremote
> > > > -XX:+HeapDumpOnOutOfMemoryError"
> > > >
> > > > The query is of the following form: SUM(m1), SUM(m2), ....
> > > > SUM(m100) where a = 'a1' and b = 'b1' (i.e., only one row should
> > > > be returned, although there can be a large number of columns
> > > > (from 60-200) to retrieve). The query also consists only of
> > > > conjunctions.
> > > >
> > > > Is there any way of resolving this problem with JNI?
> > > >
> > > > Thanks,
> > > > Aditya
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
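
For reference, the conservative estimate quoted in the thread (8 bytes * 200 columns * N rows) can be checked with a few lines of Python. This is only a sketch of that arithmetic, not anything FastBit provides: the function names are made up for illustration, 16GB is taken as 16 * 10^9 bytes, and the 0.5 headroom factor is an assumption chosen to match the suggested 5 million rows per partition.

```python
def max_rows_in_memory(memory_bytes, n_columns, bytes_per_value=8):
    """Largest N such that bytes_per_value * n_columns * N fits in memory_bytes."""
    return memory_bytes // (bytes_per_value * n_columns)


def suggested_partition_rows(memory_bytes, n_columns, bytes_per_value=8, headroom=0.5):
    """Scale the upper bound down by a headroom factor (assumed, not a FastBit rule)."""
    return int(max_rows_in_memory(memory_bytes, n_columns, bytes_per_value) * headroom)


MEMORY_BYTES = 16 * 10**9   # 16GB, decimal gigabytes (assumption)
N_COLUMNS = 200             # worst case from the thread

limit = max_rows_in_memory(MEMORY_BYTES, N_COLUMNS)
partition = suggested_partition_rows(MEMORY_BYTES, N_COLUMNS)

print("upper bound on rows held in memory:", limit)
print("suggested rows per partition:", partition)
```

With these inputs the upper bound is 10 million rows and the suggested partition size is 5 million rows, matching the figures in the thread. If the values are narrower than 8 bytes, or fewer columns are read per query, pass those numbers in and the bound grows accordingly.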
