Re: ExternalSort doesn't properly account for sliced buffers

Hanifi GUNES Fri, 20 Nov 2015 16:12:48 -0800

Seems like I am missing some input here. Are we talking about changing the
behavior of DrillBuf#capacity()? If so, I would +1 the following:


*- It seems like capacity should return the length of the slice (since
I believe that fits with the general ByteBuf interface). If I
reset writerIndex(0), I should be able write to capacity() without issue.*

2015-11-20 16:03 GMT-08:00 Jacques Nadeau <[email protected]>:

> My gut is that any changes other than correct accounting will have little
> impact on real world situations. Do you think the change will make enough
> difference to be valuable?
>
> It seems like capacity should return the length of the slice (since I
> believe that fits with the general ByteBuf interface). If I reset
> writerIndex(0), I should be able write to capacity() without issue. Seems
> weird to return some other (mostly disconnect) value.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 20, 2015 at 3:28 PM, Abdel Hakim Deneche <
> [email protected]>
> wrote:
>
> > @Steven, I think DrillBuf.capacity() was changed at some point, I was
> > looking at the code and it seems to only return the size of the "slice"
> and
> > not the root buffer.
> >
> > While waiting for the new allocator and transfer of ownership, would it
> > make sense to remove the check for root buffers like this ?
> >
> >   private long getBufferSize(VectorAccessible batch) {
> > >     long size = 0;
> > >     for (VectorWrapper<?> w : batch) {
> > >       DrillBuf[] bufs = w.getValueVector().getBuffers(false);
> > >       for (DrillBuf buf : bufs) {
> > >         size += buf.capacity();
> > >       }
> > >     }
> > >     return size;
> > >   }
> >
> >
> >
> > On Fri, Nov 20, 2015 at 2:50 PM, Hanifi GUNES <[email protected]>
> > wrote:
> >
> > > Problem with the above code is that not all vectors operate on root
> > buffers
> > > rendering the accounting above inaccurate. In fact your example is one
> > > perfect instance where vectors would point to non-root buffers for sure
> > > because of the slicing taking place at RecordBatchLoader#load [1]
> > >
> > >
> > > 1:
> > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117
> > >
> > > 2015-11-20 11:41 GMT-08:00 Steven Phillips <[email protected]>:
> > >
> > > > I think it is because we can't actually properly account for sliced
> > > > buffers. I don't remember for sure, but I think it might be because
> > > calling
> > > > buf.capacity() on a sliced buffer returns the the capacity of root
> > > buffer,
> > > > not the size of the slice. That may not be correct, but I think it
> was
> > > > something like that. Whatever it is, I am pretty sure it was giving
> > wrong
> > > > results when they are sliced buffers.
> > > >
> > > > I think we need to get the new allocator, along with proper transfer
> of
> > > > ownership in order to do this correctly. Then we can just query the
> > > > allocator rather than trying to track it separately.
> > > >
> > > > On Fri, Nov 20, 2015 at 11:25 AM, Abdel Hakim Deneche <
> > > > [email protected]
> > > > > wrote:
> > > >
> > > > > I'm looking at the external sort code and it uses the following
> > method
> > > to
> > > > > compute the allocated size of a batch:
> > > > >
> > > > >   private long getBufferSize(VectorAccessible batch) {
> > > > > >     long size = 0;
> > > > > >     for (VectorWrapper<?> w : batch) {
> > > > > >       DrillBuf[] bufs = w.getValueVector().getBuffers(false);
> > > > > >       for (DrillBuf buf : bufs) {
> > > > > >         if (*buf.isRootBuffer()*) {
> > > > > >           size += buf.capacity();
> > > > > >         }
> > > > > >       }
> > > > > >     }
> > > > > >     return size;
> > > > > >   }
> > > > >
> > > > >
> > > > > This method only accounts for root buffers, but when we have a
> > receiver
> > > > > below the sort, most of (if not all) buffers are child buffers.
> This
> > > may
> > > > > delay spilling, and increase the memory usage of the drillbit. If
> my
> > > > > computations are correct, for a single query, one drillbit can
> > allocate
> > > > up
> > > > > to 40GB without spilling once to disk.
> > > > >
> > > > > Is there a specific reason we only account for root buffers ?
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   <http://www.mapr.com/>
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>

Re: ExternalSort doesn't properly account for sliced buffers

Reply via email to