Yes.

John
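As a rough illustration of why partitions kept on the same node avoid the memory problem: SUM, MIN and MAX can be combined from per-partition partial results, so only one partition's values need to be in memory at a time. The sketch below is plain Python, not FastBit code; the function name aggregate_partitions and the small lists standing in for data partitions are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def aggregate_partitions(
    partitions: Iterable[Sequence[float]],
) -> Tuple[float, float, float]:
    """Return (sum, min, max) over all partitions, visiting each one once."""
    total = 0.0
    lowest = float("inf")
    highest = float("-inf")
    for values in partitions:
        if not values:
            continue  # an empty partition contributes nothing
        # Fold this partition's partial results into the running results.
        total += sum(values)
        lowest = min(lowest, min(values))
        highest = max(highest, max(values))
    return total, lowest, highest


# Stand-in partitions; in practice an iterator would load one at a time.
parts = [[1.0, 2.0, 3.0], [10.0, -4.0], [7.5]]
print(aggregate_partitions(parts))
```

Because the partial results are merged as each partition is visited, peak memory is bounded by the largest single partition rather than by the whole data set.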
On 3/28/13 7:42 AM, kishore g wrote:
> That means if we create data partitions and keep them on the same node,
> we won't hit the memory problem we are seeing now?
>
> thanks,
> Kishore G
>
>
> On Thu, Mar 28, 2013 at 7:34 AM, K. John Wu <[email protected]> wrote:
>
> You can have different data partitions on the same computer node. For
> simple operators like SUM, MIN and MAX, FastBit could do the
> operations with only one pass through the data. Of course, using more
> computer nodes will reduce the overall execution time.
>
> john
>
>
> On 3/28/13 7:31 AM, kishore g wrote:
> > Thanks John, I agree partitioning will help us and be much faster, but
> > given our total footprint we want to see what is the max data we can
> > have on one node. We will try the maxBytes approach and let you know
> > how it goes.
> >
> > Just want to clarify: when you say partition, do you mean keep the
> > partitions on different boxes, or is it OK if the partitions are on the
> > same box?
> >
> > thanks,
> > Kishore G
> >
> >
> > On Wed, Mar 27, 2013 at 11:27 PM, K. John Wu <[email protected]> wrote:
> >
> > Hi, Kishore,
> >
> > The easiest way to evaluate sum(v1) with a subset of v1 values is to
> > read the values into memory and then perform the summation. One could
> > attempt to use the bitmap index for v1; however, unless the index size
> > is really small, simply reading the values of v1 is faster.
> >
> > Given that FastBit most likely has to read the values of 200 columns,
> > the maximum memory required is what it takes to read every value into
> > memory.
> >
> > It is definitely possible to use memory map and let the OS handle the
> > paging; no matter how it is done, it is still paging. You can tell
> > FastBit to pretend there is more than 16GB of memory by setting
> > parameter maxBytes to a larger value. The bottom line is that you will
> > be paging, and paging is never fast.
> >
> > My belief is that you can do a lot better by partitioning the data
> > and avoiding the paging. It should be worth your time to give it
> > a try.
> >
> > John
> >
> >
> > On 3/27/13 9:49 PM, kishore g wrote:
> > > Hi John
> > >
> > > Thanks for the quick response. Why should they be in memory? I thought
> > > the bitmap indexes would be used first to do the intersection. Once
> > > it applies the predicates it would then scan the metric data and
> > > compute the aggregation. All data would be mmapped and the kernel
> > > would deal with swapping pages in/out.
> > >
> > > I expected that perf would suffer if memory is low; I did not
> > > anticipate that functionality would be impacted.
> > >
> > > Thanks
> > > Kishore G
> > >
> > > On Mar 27, 2013 8:14 PM, "K. John Wu" <[email protected]> wrote:
> > >
> > > Hi, Aditya,
> > >
> > > Looks like you have run out of memory. I would suggest that you break
> > > up the data set into multiple partitions, where each partition has a
> > > subset of the rows. For a conservative estimate, you can use the
> > > following:
> > >
> > > 8 bytes (per value, or whatever size your actual data is) * 200 columns
> > > * N rows
> > >
> > > If you have 16GB of memory, then you can have at most 10 million rows
> > > in memory at any given time. In this case, you might want to limit
> > > your data partition size to be 5 million rows.
> > >
> > > Good luck.
> > >
> > > John
> > >
> > >
> > > On 3/27/13 5:05 PM, Aditya Ramesh wrote:
> > > > Hi,
> > > >
> > > > When I try running a query on a large dataset using the ibis command
> > > > line tool, the query executes successfully. However, when I tried
> > > > using the JNI implementation from
> > > > https://bitbucket.org/olafW/fastbit4java/src, the query does not
> > > > succeed and it outputs a bunch of messages of the form:
> > > > Error -- fileManager::storage failed to malloc 1,423,308 bytes of storage on retry
> > > > Error -- fileManager::storage failed to malloc 2,846,616 bytes of storage on retry
> > > > Error -- bundles::ctor received an exception, start cleaning up
> > > >
> > > > Eventually, after some time, the program crashes completely with
> > > > the message:
> > > > #
> > > > # There is insufficient memory for the Java Runtime Environment to continue.
> > > > # Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate
> > > > # An error report file with more information is saved as:
> > > > # /export/home/eng/aramesh/LixCluster/hs_err_pid23388.log
> > > >
> > > > My JVM args are "-server -Xms16384m -Xmx16384m -XX:PermSize=128M
> > > > -XX:MaxPermSize=128M -XX:NewSize=768m -XX:MaxNewSize=768m
> > > > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85
> > > > -XX:+AlwaysPreTouch -XX:+PrintGCApplicationStoppedTime
> > > > -XX:+PrintGCTimeStamps -XX:+UseCompressedOops
> > > > -XX:+ParallelRefProcEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> > > > -XX:+PrintTenuringDistribution -Xloggc:LixLogs/gc.log
> > > > -XX:ErrorFile=logs/hs_err.log -Djava.awt.headless=true
> > > > -Dcom.sun.management.jmxremote -XX:+HeapDumpOnOutOfMemoryError"
> > > >
> > > > The query is of the following form: SUM(m1), SUM(m2), .... SUM(m100)
> > > > where a = 'a1' and b = 'b1' (i.e., only one row should be returned,
> > > > although there can be a large number of columns (from 60-200) to
> > > > retrieve). The query also consists only of conjunctions.
> > > >
> > > > Is there any way of resolving this problem with JNI?
> > > >
> > > > Thanks,
> > > > Aditya

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
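
The conservative estimate quoted in the thread (8 bytes per value * 200 columns * N rows) can be worked through as below. This is only a sketch of the arithmetic in Python: the function names are hypothetical, 16GB is taken to mean 16 * 2**30 bytes, and the one-half safety factor is an assumption chosen to match the suggested 5-million-row partitions.

```python
def max_rows_in_memory(memory_bytes: int, columns: int, bytes_per_value: int = 8) -> int:
    """Largest N such that bytes_per_value * columns * N fits in memory_bytes."""
    return memory_bytes // (columns * bytes_per_value)


def partition_rows(
    memory_bytes: int,
    columns: int,
    bytes_per_value: int = 8,
    safety_factor: float = 0.5,  # assumed headroom, not a FastBit setting
) -> int:
    """Suggested rows per partition, leaving headroom below the hard limit."""
    limit = max_rows_in_memory(memory_bytes, columns, bytes_per_value)
    return int(limit * safety_factor)


memory = 16 * 2**30  # 16GB, interpreted as GiB
print(max_rows_in_memory(memory, 200))  # upper bound on rows held at once
print(partition_rows(memory, 200))      # rows per partition with headroom
```

With these inputs the upper bound comes out a little above 10 million rows and the per-partition figure a little above 5 million, in line with the numbers given in the thread.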
