Re: [FastBit-users] Problem with Fasbit and JNI integration

kishore g Thu, 28 Mar 2013 07:42:47 -0700

That means if we create data partitions and keep them on the same node, we
won't hit the memory problem we are seeing now?


thanks,
Kishore G


On Thu, Mar 28, 2013 at 7:34 AM, K. John Wu <[email protected]> wrote:

> You can have different data partitions on the same computer node.  For
> simple operators like SUM, MIN and MAX, FastBit could do the
> operations with only one pass through the data.  Of course, using more
> computer nodes will reduce the overall execution time.
>
> john
>
>
> On 3/28/13 7:31 AM, kishore g wrote:
> > Thanks John, I agree partitioning will help us and be much faster, but
> > given our total footprint we want to see what the maximum amount of data
> > we can have on one node is. We will try the maxBytes approach and let you
> > know how it goes.
> >
> > Just to clarify: when you say partition, do you mean keeping the
> > partitions on different boxes, or is it OK if the partitions are on the
> > same box?
> >
> > thanks,
> > Kishore G
> >
> >
> > On Wed, Mar 27, 2013 at 11:27 PM, K. John Wu <[email protected]> wrote:
> >
> >     Hi, Kishore,
> >
> >     The easiest way to evaluate sum(v1) with a subset of v1 values is to
> >     read the values into memory and then perform the summation.  One
> >     could attempt to use the bitmap index for v1; however, unless the
> >     index size is really small, simply reading the values of v1 is faster.
> >
> >     Given that FastBit most likely has to read the values of 200 columns,
> >     the maximum memory required is what it takes to read every value into
> >     memory.
> >
> >     It is definitely possible to use memory mapping and let the OS handle
> >     the paging, but no matter how it is done, it is still paging.  You can
> >     tell FastBit to pretend there is more than 16GB of memory by setting
> >     the parameter maxBytes to a larger value.  The bottom line is that you
> >     will be paging, and paging is never fast.
> >
> >     My belief is that you can do a lot better by partitioning the data
> >     and avoiding the paging.  It should be worth your time to give it
> >     a try.
> >
> >     John
> >
> >
> >     On 3/27/13 9:49 PM, kishore g wrote:
> >     > Hi John
> >     >
> >     > Thanks for the quick response. Why should they be in memory? I
> >     > thought the bitmap indexes would be used first to do the
> >     > intersection. Once it applies the predicates, it would then scan the
> >     > metric data and compute the aggregation. All data would be mmapped
> >     > and the kernel would deal with swapping pages in and out.
> >     >
> >     > I expected that performance would suffer if memory were short; I did
> >     > not anticipate that functionality would be impacted.
> >     >
> >     > Thanks
> >     > Kishore G
> >     >
> >     > On Mar 27, 2013 8:14 PM, "K. John Wu" <[email protected]> wrote:
> >     >
> >     >     Hi, Aditya,
> >     >
> >     >     Looks like you have run out of memory.  I would suggest that you
> >     >     break up the data set into multiple partitions, where each
> >     >     partition has a subset of the rows.  For a conservative estimate,
> >     >     you can use the following:
> >     >
> >     >     8 bytes (per value, or whatever size your actual data is) *
> >     >     200 columns * N rows
> >     >
> >     >     If you have 16GB of memory, then you can have at most 10 million
> >     >     rows in memory at any given time.  In this case, you might want
> >     >     to limit your data partition size to 5 million rows.
> >     >
> >     >     Good luck.
> >     >
> >     >     John
> >     >
> >     >
> >     >     On 3/27/13 5:05 PM, Aditya Ramesh wrote:
> >     >     > Hi,
> >     >     >
> >     >     > When I try running a query on a large dataset using the ibis
> >     >     > command line tool, the query executes successfully. However,
> >     >     > when I tried using the JNI implementation from
> >     >     > https://bitbucket.org/olafW/fastbit4java/src, the query does
> >     >     > not succeed and it outputs a bunch of messages of the form:
> >     >     > Error -- fileManager::storage failed to malloc 1,423,308 bytes of storage on retry
> >     >     > Error -- fileManager::storage failed to malloc 2,846,616 bytes of storage on retry
> >     >     > Error -- bundles::ctor received an exception, start cleaning up
> >     >     >
> >     >     > Eventually, after some time, the program crashes completely
> >     >     > with the message:
> >     >     > #
> >     >     > # There is insufficient memory for the Java Runtime Environment to continue.
> >     >     > # Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate
> >     >     > # An error report file with more information is saved as:
> >     >     > # /export/home/eng/aramesh/LixCluster/hs_err_pid23388.log
> >     >     >
> >     >     > My JVM args are "-server -Xms16384m -Xmx16384m -XX:PermSize=128M
> >     >     > -XX:MaxPermSize=128M -XX:NewSize=768m -XX:MaxNewSize=768m
> >     >     > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85
> >     >     > -XX:+AlwaysPreTouch -XX:+PrintGCApplicationStoppedTime
> >     >     > -XX:+PrintGCTimeStamps -XX:+UseCompressedOops
> >     >     > -XX:+ParallelRefProcEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >     >     > -XX:+PrintTenuringDistribution -Xloggc:LixLogs/gc.log
> >     >     > -XX:ErrorFile=logs/hs_err.log -Djava.awt.headless=true
> >     >     > -Dcom.sun.management.jmxremote -XX:+HeapDumpOnOutOfMemoryError"
> >     >     >
> >     >     > The query is of the following form: SUM(m1), SUM(m2), ....
> >     >     > SUM(m100) where a = 'a1' and b = 'b1' (i.e., only one row should
> >     >     > be returned, although there can be a large number of columns
> >     >     > (from 60 to 200) to retrieve). The query also consists only of
> >     >     > conjunctions.
> >     >     >
> >     >     > Is there any way of resolving this problem with JNI?
> >     >     >
> >     >     > Thanks,
> >     >     > Aditya
> >     >     > _______________________________________________
> >     >     > FastBit-users mailing list
> >     >     > [email protected]
> >     >     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> >     >     >
> >     >
> >     >
> >     >
> >     >
> >
> >
>
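John's sizing rule from earlier in the thread (8 bytes per value * 200
columns * N rows) can be checked with a few lines of Python. This is only a
sketch of the arithmetic, not anything FastBit provides: the function name is
made up for illustration, "16GB" is assumed to mean decimal gigabytes, and
halving the row count is just one way to read the suggestion to stay at 5
million rows per partition.

```python
def max_rows_in_memory(memory_bytes, num_columns, bytes_per_value=8):
    """Upper bound on the rows whose values for all columns fit in memory_bytes."""
    return memory_bytes // (num_columns * bytes_per_value)


GB = 10**9  # assumption: decimal gigabytes; the thread does not say GB vs GiB

# 16GB of memory, 200 columns, 8 bytes per value, as in the thread.
rows = max_rows_in_memory(16 * GB, num_columns=200)

# Keep each partition at half of that bound to leave headroom.
partition_rows = rows // 2

print(rows, partition_rows)
```

With narrower columns (for example 4-byte values) the same call with
bytes_per_value=4 doubles the bound, which is why the estimate above is
described as conservative.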

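The point that SUM, MIN and MAX need only one pass, and that partitions can
sit on the same node, can be illustrated with a small Python sketch. It does
not use FastBit or its JNI bindings; partitions are modeled as plain lists of
dicts, and the names Partial, aggregate_partition and aggregate are invented
for this example. The idea is that each partition is scanned once and only
its small partial result is kept, so at most one partition's data has to be
in memory at a time.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Running SUM/MIN/MAX/COUNT for one column."""
    total: float = 0
    lo: float = float("inf")
    hi: float = float("-inf")
    count: int = 0

    def add(self, value):
        self.total += value
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)
        self.count += 1

    def merge(self, other):
        self.total += other.total
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)
        self.count += other.count


def aggregate_partition(rows, predicate, column):
    """Single pass over one partition, keeping only the running aggregates."""
    partial = Partial()
    for row in rows:
        if predicate(row):
            partial.add(row[column])
    return partial


def aggregate(partitions, predicate, column):
    """Combine per-partition results; partitions may be a lazy iterable."""
    result = Partial()
    for rows in partitions:
        result.merge(aggregate_partition(rows, predicate, column))
    return result


# Two toy partitions standing in for data partitions on the same node.
partitions = [
    [{"a": "a1", "b": "b1", "m1": 3}, {"a": "a2", "b": "b1", "m1": 10}],
    [{"a": "a1", "b": "b1", "m1": 5}, {"a": "a1", "b": "b2", "m1": 7}],
]

result = aggregate(
    partitions, lambda r: r["a"] == "a1" and r["b"] == "b1", "m1"
)
print(result.total, result.lo, result.hi, result.count)
```

Passing a generator that loads one partition at a time keeps the memory
footprint bounded by the largest partition rather than by the whole data set.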
