On Mon, Jul 27, 2015 at 12:22 PM, Chris Westin <chriswesti...@gmail.com>
wrote:

> The graph does look like it's reached some kind of asymptote -- is that
> true, does it stop increasing at that point?
>

I will need to run the tests one more time to be sure, but if I remember
correctly, further runs actually went higher.


> If it is, there's still the question of why it's so high.
>

Yes, especially since the queries we are running use relatively small
datasets, I think 10k rows split among lots of (small?) parquet files.


> Parth: yes, we're handing these buffers off from the RPC thread that
> receives them to a worker thread that works on them. As Jacques mentions,
> our pools should be dying off, but it's the RPC pool that we may need to
> look at more closely. What are its characteristics? If Netty hangs on to
> buffers to recycle them, there's a pool per thread, which might explain the
> apparent asymptotic approach I see in this graph: we've finally reached the
> limit that can be cached per-thread for all the threads. Can we control the
> size of that pool, possibly reducing it, or at least making it smaller but
> more elastic (so it shrinks back down when the threads aren't in use)? That
> might be an easy experiment to do to see how the graph is affected.
>

Netty caches released buffers in a per-thread cache, and trims those caches
every 8192 allocations, so it makes sense to see cached buffers at the end
of each query in the RPC threads (all other threads are eventually
destroyed and their caches cleared). But those cached buffers should be
reused later on, so why do we need to allocate that many more new chunks?
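
One experiment along the lines Chris suggests: build the pooled allocator
with smaller per-thread cache sizes and see how the graph reacts. A rough
sketch, assuming Netty 4.0.x's PooledByteBufAllocator constructor that
exposes the cache sizes (the numbers below are arbitrary, and Drill's actual
allocator wiring may differ):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.PooledByteBufAllocator;

    public class SmallCacheAllocatorExperiment {
      public static void main(String[] args) {
        // Default chunk size is pageSize << maxOrder = 8192 << 11 = 16MB,
        // the figure we use to interpret the chunk counts.
        int pageSize = 8192;
        int maxOrder = 11;

        // Shrink the per-thread caches (defaults are 512/256/64 entries)
        // so long-lived RPC threads keep fewer recycled buffers around.
        PooledByteBufAllocator allocator = new PooledByteBufAllocator(
            true,      // preferDirect
            0,         // nHeapArena: no heap arenas for this experiment
            2,         // nDirectArena
            pageSize,
            maxOrder,
            64,        // tinyCacheSize   (default 512)
            32,        // smallCacheSize  (default 256)
            8);        // normalCacheSize (default 64)

        ByteBuf buf = allocator.directBuffer(1024);
        try {
          // ... use the buffer as a query would ...
        } finally {
          // released buffers go back to the (now smaller) thread cache
          buf.release();
        }
      }
    }

If Drill's allocator goes through Netty's default configuration, the same
defaults could probably be lowered without code changes via the
io.netty.allocator.tinyCacheSize / smallCacheSize / normalCacheSize /
cacheTrimInterval system properties, but I haven't verified how our custom
allocator picks those up.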

>
> On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
>
> > It sounds like your statement is that we're caching too many unused
> > chunks.  Hanifi and I previously discussed implementing a separate
> > flushing mechanism to release unallocated chunks that are hanging around.
> > The main question is: why are so many chunks hanging around, and which
> > threads are they associated with?  A jmap dump and analysis should allow
> > you to determine which thread owns the excess chunks.  My guess would be
> > the RPC pool since those threads are long-lasting (as opposed to the
> > WorkManager pool, which is contracting).
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <
> > adene...@maprtech.com> wrote:
> >
> > > When running a set of mostly window function queries concurrently on a
> > > single drillbit with an 8GB max direct memory, we are seeing a
> > > continuous increase in direct memory allocation.
> > >
> > > We repeat the following steps multiple times:
> > > - we launch an "iteration" of tests that runs all queries in a random
> > > order, 10 queries at a time
> > > - after the iteration finishes, we wait for a couple of minutes to give
> > > Drill time to release the memory held by the finishing fragments
> > >
> > > Using Drill's memory logger ("drill.allocator") we were able to get
> > > snapshots of how memory is used internally by Netty. We only focused on
> > > the number of allocated chunks; if we take this number and multiply it
> > > by 16MB (Netty's chunk size) we get approximately the same value
> > > reported by Drill's direct memory allocation.
> > > Here is a graph that shows the evolution of the number of allocated
> > > chunks over a 500-iteration run (I'm working on improving the plots):
> > >
> > > http://bit.ly/1JL6Kp3
> > >
> > > In this specific case, after the first iteration Drill was allocating
> > > ~2GB of direct memory, and this number kept rising after each iteration
> > > to ~6GB. We suspect this caused one of our previous runs to crash the
> > > JVM.
> > >
> > > If we only focus on the log lines between iterations (when Drill's
> > > memory usage is below 10MB), then all allocated chunks are at most 2%
> > > used. At some point we end up with 288 nearly empty chunks, yet the next
> > > iteration will cause more chunks to be allocated!!!
> > >
> > > Is this expected?
> > >
> > > PS: I am running more tests and will update this thread with more
> > > information.
> > >


