Re: Suspicious direct memory consumption when running queries concurrently

Jacques Nadeau Fri, 31 Jul 2015 21:51:07 -0700

I mean a repro for the memory leak, not the jmap issue.
On Jul 31, 2015 9:50 PM, "Jacques Nadeau" <jacq...@dremio.com> wrote:


> Can you give me a single node repro?
> On Jul 31, 2015 9:20 PM, "Abdel Hakim Deneche" <adene...@maprtech.com>
> wrote:
>
>> I tried getting a jmap dump multiple times without success; each time, it
>> crashes the JVM with the following exception:
>>
>> > Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
>> > ...
>> > Exception in thread "main" java.io.IOException: Premature EOF
>> >         at sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
>> >         at sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
>> >         at sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
>> >         at sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
>> >         at sun.tools.jmap.JMap.dump(JMap.java:242)
>> >         at sun.tools.jmap.JMap.main(JMap.java:140)
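In the trace, jmap's attach connection closes partway through the reply (`Premature EOF` in `HotSpotVirtualMachine.readInt`). That is consistent with the target JVM dying during the dump. One possible alternative is to have the process dump its own heap through the `com.sun.management.HotSpotDiagnosticMXBean` API. This is a minimal sketch. It assumes you can run code inside the drillbit JVM, and it works on HotSpot-based JVMs only. The class `InProcessHeapDump` and the file name are hypothetical, and the approach may still hit the same underlying crash.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: asks the running JVM to write its own heap dump through the
// HotSpot diagnostic MXBean, instead of attaching to it from outside with jmap.
final class InProcessHeapDump {
    private InProcessHeapDump() {}

    /** Writes an .hprof file of live objects into a fresh temporary directory and returns its path. */
    static Path dumpLiveHeapToTempDir() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            // dumpHeap fails if the target file already exists, so use a new directory.
            Path target = dir.resolve("drillbit.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(target.toString(), true); // true = dump only reachable (live) objects
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```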
>>
>>
>> On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau <jacq...@dremio.com>
>> wrote:
>>
>> > An allocate -> release cycle that happens entirely on the same thread goes
>> > into a per-thread cache.
>> >
>> > A bunch of Netty arena settings are configurable. The big issue, I
>> > believe, is that the limits are soft limits implemented by the
>> > allocation-time release mechanism. As such, if you allocate a bunch of
>> > memory and then release it all, that won't necessarily trigger any actual
>> > chunk releases.
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> > On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <adene...@maprtech.com>
>> > wrote:
>> >
>> > > @Jacques, my understanding is that chunks are not owned by a specific
>> > > thread; instead, they belong to a specific memory arena, which in turn is
>> > > only accessed by specific threads. Do you want me to find which threads
>> > > are associated with the arena where we have hanging chunks?
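One way to answer the "which threads share an arena" question is to record each thread's arena binding when the thread first allocates. The sketch below assumes a least-used policy: each new thread goes to the arena with the fewest bound threads, with ties going to the lowest index. Whether Drill's Netty version uses exactly this rule still needs to be verified. `ToyArenaBinder` and the thread names in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping: binds each thread to the arena with the fewest bound threads
// (ties go to the lowest index) and can list the threads that share a given arena.
final class ToyArenaBinder {
    private final int[] threadsPerArena;
    private final Map<String, Integer> arenaByThread = new LinkedHashMap<>();

    ToyArenaBinder(int arenas) {
        this.threadsPerArena = new int[arenas];
    }

    /** Returns the arena for a thread, binding it on first use. */
    int arenaFor(String threadName) {
        Integer existing = arenaByThread.get(threadName);
        if (existing != null) {
            return existing;
        }
        int best = 0;
        for (int i = 1; i < threadsPerArena.length; i++) {
            if (threadsPerArena[i] < threadsPerArena[best]) {
                best = i;
            }
        }
        threadsPerArena[best]++;
        arenaByThread.put(threadName, best);
        return best;
    }

    /** Lists the threads bound to the given arena, in the order they were bound. */
    List<String> threadsOn(int arena) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : arenaByThread.entrySet()) {
            if (e.getValue() == arena) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

For example, binding `rpc-1`, `rpc-2`, `work-1`, `work-2`, and `rpc-3` (in that order) to two arenas puts `rpc-1`, `work-1`, and `rpc-3` on arena 0.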
>> > >
>> > >
>> > > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau <jacq...@dremio.com>
>> > > wrote:
>> > >
>> > > > It sounds like you're saying that we're caching too many unused
>> > > > chunks. Hanifi and I previously discussed implementing a separate
>> > > > flushing mechanism to release unallocated chunks that are hanging
>> > > > around. The main questions are why so many chunks are hanging around
>> > > > and which threads they are associated with. A jmap dump and analysis
>> > > > should allow you to determine which thread owns the excess chunks. My
>> > > > guess would be the RPC pool, since those threads are long-lasting (as
>> > > > opposed to the WorkManager pool, which is contracting).
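Below is a minimal sketch of the kind of separate flushing pass mentioned above. It assumes an arena can enumerate its chunks and knows how many bytes each chunk has in use. `ToyArena` and `Chunk` are hypothetical types. The 16 MiB chunk size is the figure quoted in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical flushing pass: walk an arena's chunks and release any chunk that has
// no live allocations. These are illustrative types, not Netty's PoolArena/PoolChunk.
final class ToyArena {
    static final long CHUNK_SIZE = 16L * 1024 * 1024; // 16 MiB, the chunk size quoted in this thread

    static final class Chunk {
        long usedBytes; // bytes currently handed out from this chunk
        Chunk(long usedBytes) { this.usedBytes = usedBytes; }
    }

    final List<Chunk> chunks = new ArrayList<>();

    /** Removes fully unused chunks and returns how many bytes of chunk memory that frees. */
    long flushUnusedChunks() {
        long freed = 0;
        for (Iterator<Chunk> it = chunks.iterator(); it.hasNext(); ) {
            Chunk c = it.next();
            if (c.usedBytes == 0) {
                it.remove(); // a real allocator would release the chunk's native memory here
                freed += CHUNK_SIZE;
            }
        }
        return freed;
    }
}
```

A pass like this reclaims only chunks with zero live bytes. A chunk that still holds even a few small allocations stays reserved.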
>> > > >
>> > > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <adene...@maprtech.com>
>> > > > wrote:
>> > > >
>> > > > > When running a set of queries (mostly window function queries)
>> > > > > concurrently on a single drillbit with 8GB of max direct memory, we
>> > > > > are seeing a continuous increase in direct memory allocation.
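For a rough cross-check of this growth outside Drill's own accounting, the JVM exposes its direct buffer pool through `java.lang.management.BufferPoolMXBean`. This is a sketch with one important limit: it reports only direct memory that `java.nio` tracks (for example, memory from `ByteBuffer.allocateDirect`). Memory that an allocator obtains some other way may not appear. `DirectMemorySampler` is a hypothetical name.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read the JVM's "direct" buffer pool statistics through JMX. Only direct memory
// accounted for by java.nio shows up here, so treat the value as an approximate signal.
final class DirectMemorySampler {
    private DirectMemorySampler() {}

    /** Total capacity in bytes of the direct buffers the JVM is tracking, or -1 if not found. */
    static long directCapacityBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }
}
```

Sampling this value between iterations and comparing it with the allocator's own numbers would show whether the two agree.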
>> > > > >
>> > > > > We repeat the following steps multiple times:
>> > > > > - we launch an "iteration" of tests that runs all queries in a random
>> > > > >   order, 10 queries at a time
>> > > > > - after the iteration finishes, we wait for a couple of minutes to give
>> > > > >   Drill time to release the memory held by the finishing fragments
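The test procedure above can be sketched as a small Java harness. Everything here is hypothetical. `runQuery` stands in for however the test framework submits a query, and a fixed pool of `concurrency` threads is one way to keep 10 queries in flight.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical harness: each iteration shuffles the queries, runs them with at most
// `concurrency` in flight, waits for all of them, then idles for `pauseMillis`.
final class IterationHarness {
    private IterationHarness() {}

    static void runIterations(List<String> queries, int iterations, int concurrency,
                              long pauseMillis, Consumer<String> runQuery, long seed) {
        Random random = new Random(seed);
        try {
            for (int i = 0; i < iterations; i++) {
                List<String> order = new ArrayList<>(queries);
                Collections.shuffle(order, random); // new random order every iteration
                ExecutorService pool = Executors.newFixedThreadPool(concurrency);
                for (String q : order) {
                    // An exception thrown by runQuery is captured in the discarded Future.
                    pool.submit(() -> runQuery.accept(q));
                }
                pool.shutdown();
                if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                    throw new IllegalStateException("iteration " + i + " did not finish");
                }
                Thread.sleep(pauseMillis); // give the server time to release memory
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```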
>> > > > >
>> > > > > Using Drill's memory logger ("drill.allocator"), we were able to get
>> > > > > snapshots of how memory was used internally by Netty. We focused only
>> > > > > on the number of allocated chunks: if we multiply this number by 16MB
>> > > > > (Netty's chunk size), we get approximately the same value reported by
>> > > > > Drill's direct memory allocation.
>> > > > > Here is a graph that shows the evolution of the number of allocated
>> > > > > chunks over a 500-iteration run (I'm working on improving the plots):
>> > > > >
>> > > > > http://bit.ly/1JL6Kp3
>> > > > >
>> > > > > In this specific case, after the first iteration Drill was allocating
>> > > > > ~2GB of direct memory, and this number kept rising after each
>> > > > > iteration, up to ~6GB. We suspect this caused one of our previous runs
>> > > > > to crash the JVM.
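The chunk-count estimate above is simple arithmetic. This sketch assumes 16 MiB chunks, as stated in the thread. That size comes from 8192-byte pages with `maxOrder` 11, which were Netty's pooled-allocator defaults at the time; other settings give a different size. Treating ~2GB and ~6GB as GiB, those figures correspond to about 128 and 384 chunks. `ChunkMath` is a hypothetical helper.

```java
// Back-of-the-envelope helpers for the "chunk count x 16MB" estimate.
// Assumes pageSize 8192 and maxOrder 11, giving 8192 << 11 = 16 MiB chunks.
final class ChunkMath {
    static final long PAGE_SIZE = 8192;
    static final int MAX_ORDER = 11;
    static final long CHUNK_SIZE = PAGE_SIZE << MAX_ORDER; // 16,777,216 bytes

    private ChunkMath() {}

    /** Direct memory held by the given number of chunks. */
    static long bytesFor(long chunks) {
        return chunks * CHUNK_SIZE;
    }

    /** Chunks needed to account for the given number of bytes, rounded up. */
    static long chunksFor(long bytes) {
        return (bytes + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }
}
```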
>> > > > >
>> > > > > If we only focus on the log lines between iterations (when Drill's
>> > > > > memory usage is below 10MB), then every allocated chunk is at most 2%
>> > > > > used. At some point we end up with 288 nearly empty chunks, yet the
>> > > > > next iteration still causes more chunks to be allocated!!!
>> > > > >
>> > > > > Is this expected?
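Here are the numbers for the nearly empty chunks. At 16 MiB each, 288 chunks reserve 4,831,838,208 bytes (4.5 GiB). Because each chunk is at most 2% used, at most roughly 92 MiB of that is live. The sketch below performs that check. `ChunkUsageReport` is hypothetical. The 16 MiB size and the 2% threshold come from the message above, not from Drill itself.

```java
// Illustrative fragmentation check: how much chunk memory is reserved versus how much
// can be live, given per-chunk usage fractions. Chunk size is assumed to be 16 MiB.
final class ChunkUsageReport {
    static final long CHUNK_SIZE = 16L * 1024 * 1024;

    private ChunkUsageReport() {}

    /** Counts chunks whose usage fraction is at or below the threshold. */
    static int countNearlyEmpty(double[] usageFractions, double threshold) {
        int n = 0;
        for (double u : usageFractions) {
            if (u <= threshold) {
                n++;
            }
        }
        return n;
    }

    /** Bytes reserved by the given number of chunks, regardless of how full they are. */
    static long reservedBytes(int chunks) {
        return chunks * CHUNK_SIZE;
    }

    /** Upper bound on live bytes when every chunk is at most maxUsage full. */
    static long maxLiveBytes(int chunks, double maxUsage) {
        return (long) (reservedBytes(chunks) * maxUsage);
    }
}
```

With every chunk at most 2% full, reserved memory is at least 50 times the live memory.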
>> > > > >
>> > > > > PS: I am running more tests and will update this thread with more
>> > > > > information.
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Abdelhakim Deneche
>> > > > >
>> > > > > Software Engineer
>> > > > >
>> > > > >   <http://www.mapr.com/>
>> > > > >
>> > > > >
>> > > > > Now Available - Free Hadoop On-Demand Training
>> > > > > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>> > > > >
>> > > >
>> > >
>> > >
>> >
>>
>>
>
