Re: [FastBit-users] Problem with Fasbit and JNI integration

K. John Wu Thu, 28 Mar 2013 07:50:22 -0700

Yes.

John
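
The point discussed below, that simple operators such as SUM, MIN and MAX need only one pass and that partitions may sit on the same node, can be illustrated with a minimal sketch. This is plain Python, not the FastBit API; the names Partial, scan_partition and combine are hypothetical, and the lists stand in for the selected values of one column in each partition. Only one partition's values need to be in memory at a time, and the per-partition results are merged at the end.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Partial:
    """Running SUM, MIN and MAX for one partition (hypothetical helper)."""
    total: float
    lo: float
    hi: float


def scan_partition(values: Iterable[float]) -> Partial:
    """Compute SUM, MIN and MAX in a single pass over one partition."""
    total = 0.0
    lo = float("inf")
    hi = float("-inf")
    for v in values:
        total += v
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return Partial(total, lo, hi)


def combine(parts: List[Partial]) -> Partial:
    """Merge per-partition results into one overall answer."""
    return Partial(
        total=sum(p.total for p in parts),
        lo=min(p.lo for p in parts),
        hi=max(p.hi for p in parts),
    )


# Three made-up partitions of one metric column, processed one at a time.
partitions = [[1.0, 2.0, 3.0], [10.0, -4.0], [0.5]]
result = combine([scan_partition(p) for p in partitions])
print(result)
```

Because each partition is scanned independently, the same merge step applies whether the partitions live on one node or on several.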


On 3/28/13 7:42 AM, kishore g wrote:
> Does that mean that if we create data partitions and keep them on the
> same node, we won't hit the memory problem we are seeing now?
> 
> thanks,
> Kishore G
> 
> 
> On Thu, Mar 28, 2013 at 7:34 AM, K. John Wu wrote:
> 
>     You can have different data partitions on the same computer node.  For
>     simple operators like SUM, MIN and MAX, FastBit can do the
>     operations with only one pass through the data.  Of course, using more
>     computer nodes will reduce the overall execution time.
> 
>     john
> 
> 
>     On 3/28/13 7:31 AM, kishore g wrote:
>     > Thanks John, I agree partitioning will help us and be much faster,
>     > but given our total footprint we want to see what is the max data we
>     > can have on one node. We will try the maxBytes approach and let you
>     > know how it goes.
>     >
>     > Just want to clarify: when you say partition, do you mean keeping the
>     > partitions on different boxes, or is it OK if the partitions are on
>     > the same box?
>     >
>     > thanks,
>     > Kishore G
>     >
>     >
>     > On Wed, Mar 27, 2013 at 11:27 PM, K. John Wu wrote:
>     >
>     >     Hi, Kishore,
>     >
>     >     The easiest way to evaluate sum(v1) with a subset of v1 values is
>     >     to read the values into memory and then perform the summation.
>     >     One could attempt to use the bitmap index for v1; however, unless
>     >     the index size is really small, simply reading the values of v1
>     >     is faster.
>     >
>     >     Given that FastBit most likely has to read the values of 200
>     >     columns, the maximum memory required is what it takes to hold
>     >     every value of those columns in memory.
>     >
>     >     It is definitely possible to use memory maps and let the OS
>     >     handle the paging, but no matter how it is done, it is still
>     >     paging.  You can tell FastBit to pretend there is more than 16GB
>     >     of memory by setting the parameter maxBytes to a larger value.
>     >     The bottom line is that you will be paging, and paging is never
>     >     fast.
>     >
>     >     My belief is that you can do a lot better by partitioning the
>     >     data and avoiding the paging.  It should be worth your time to
>     >     give it a try.
>     >
>     >     John
>     >
>     >
>     >     On 3/27/13 9:49 PM, kishore g wrote:
>     >     > Hi John
>     >     >
>     >     > Thanks for the quick response. Why should they be in memory? I
>     >     > thought the bitmap indexes would be used first to do the
>     >     > intersection. Once the predicates are applied, it would then
>     >     > scan the metric data and compute the aggregation. All data
>     >     > would be mmapped and the kernel would deal with swapping pages
>     >     > in/out.
>     >     >
>     >     > I expected that perf would suffer if memory is short; I did not
>     >     > anticipate that functionality would be impacted.
>     >     >
>     >     > Thanks
>     >     > Kishore G
>     >     >
>     >     > On Mar 27, 2013 8:14 PM, "K. John Wu" wrote:
>     >     >
>     >     >     Hi, Aditya,
>     >     >
>     >     >     Looks like you have run out of memory.  I would suggest
>     >     >     that you break up the data set into multiple partitions,
>     >     >     where each partition has a subset of the rows.  For a
>     >     >     conservative estimate, you can use the following:
>     >     >
>     >     >     8 bytes (per value, or whatever size your actual data is)
>     >     >     * 200 columns * N rows
>     >     >
>     >     >     If you have 16GB of memory, then you can have at most 10
>     >     >     million rows in memory at any given time.  In this case,
>     >     >     you might want to limit your data partition size to 5
>     >     >     million rows.
>     >     >
>     >     >     Good luck.
>     >     >
>     >     >     John
>     >     >
>     >     >
>     >     >     On 3/27/13 5:05 PM, Aditya Ramesh wrote:
>     >     >     > Hi,
>     >     >     >
>     >     >     > When I try running a query on a large dataset using the
>     >     >     > ibis command line tool, the query executes successfully.
>     >     >     > However, when I tried using the JNI implementation from
>     >     >     > https://bitbucket.org/olafW/fastbit4java/src, the query
>     >     >     > does not succeed and it outputs a bunch of messages of
>     >     >     > the form:
>     >     >     > Error -- fileManager::storage failed to malloc 1,423,308 bytes of storage on retry
>     >     >     > Error -- fileManager::storage failed to malloc 2,846,616 bytes of storage on retry
>     >     >     > Error -- bundles::ctor received an exception, start cleaning up
>     >     >     >
>     >     >     > Eventually, after some time, the program crashes
>     >     >     > completely with the message:
>     >     >     > #
>     >     >     > # There is insufficient memory for the Java Runtime Environment to continue.
>     >     >     > # Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate
>     >     >     > # An error report file with more information is saved as:
>     >     >     > # /export/home/eng/aramesh/LixCluster/hs_err_pid23388.log
>     >     >     >
>     >     >     > My JVM args are "-server -Xms16384m -Xmx16384m
>     >     >     > -XX:PermSize=128M -XX:MaxPermSize=128M -XX:NewSize=768m
>     >     >     > -XX:MaxNewSize=768m -XX:+UseConcMarkSweepGC
>     >     >     > -XX:CMSInitiatingOccupancyFraction=85
>     >     >     > -XX:+AlwaysPreTouch -XX:+PrintGCApplicationStoppedTime
>     >     >     > -XX:+PrintGCTimeStamps -XX:+UseCompressedOops
>     >     >     > -XX:+ParallelRefProcEnabled -XX:+PrintGCDetails
>     >     >     > -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
>     >     >     > -Xloggc:LixLogs/gc.log -XX:ErrorFile=logs/hs_err.log
>     >     >     > -Djava.awt.headless=true -Dcom.sun.management.jmxremote
>     >     >     > -XX:+HeapDumpOnOutOfMemoryError"
>     >     >     >
>     >     >     > The query is of the following form: SUM(m1), SUM(m2),
>     >     >     > ..., SUM(m100) where a = 'a1' and b = 'b1' (i.e., only
>     >     >     > one row should be returned, although there can be a
>     >     >     > large number of columns, from 60 to 200, to retrieve).
>     >     >     > The query also consists only of conjunctions.
>     >     >     >
>     >     >     > Is there any way of resolving this problem with JNI?
>     >     >     >
>     >     >     > Thanks,
>     >     >     > Aditya
>     >     >     > _______________________________________________
>     >     >     > FastBit-users mailing list
>     >     >     > [email protected]
>     >     >     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
> 
> 
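
The conservative estimate quoted earlier in this thread (8 bytes * 200 columns * N rows against 16GB of memory) can be worked through as a short calculation. This is a sketch only: the function name max_rows_in_memory is hypothetical, 16GB is taken as 16 * 10^9 bytes to match the round figure in the thread, and halving the result follows the suggestion to limit partitions to 5 million rows.

```python
def max_rows_in_memory(memory_bytes: int, columns: int, bytes_per_value: int = 8) -> int:
    """Upper bound on rows that fit if every value of every column is loaded."""
    bytes_per_row = columns * bytes_per_value
    return memory_bytes // bytes_per_row


# Assumption: 16GB read as 16 * 10**9 bytes, as in the thread's round numbers.
memory_bytes = 16 * 10**9
columns = 200

rows = max_rows_in_memory(memory_bytes, columns)

# Leave headroom by using half of the upper bound as the partition size.
partition_rows = rows // 2

print(f"at most {rows:,} rows in memory; suggested partition size {partition_rows:,} rows")
```

With narrower value types (for example 4-byte values) the same function gives a proportionally larger row budget.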
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
