Seems like I am missing some input here. Are we talking about changing the behavior of DrillBuf#capacity()? If so, I would +1 the following:
*- It seems like capacity should return the length of the slice (since I believe that fits with the general ByteBuf interface). If I reset writerIndex(0), I should be able write to capacity() without issue.* 2015-11-20 16:03 GMT-08:00 Jacques Nadeau <[email protected]>: > My gut is that any changes other than correct accounting will have little > impact on real world situations. Do you think the change will make enough > difference to be valuable? > > It seems like capacity should return the length of the slice (since I > believe that fits with the general ByteBuf interface). If I reset > writerIndex(0), I should be able write to capacity() without issue. Seems > weird to return some other (mostly disconnect) value. > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Nov 20, 2015 at 3:28 PM, Abdel Hakim Deneche < > [email protected]> > wrote: > > > @Steven, I think DrillBuf.capacity() was changed at some point, I was > > looking at the code and it seems to only return the size of the "slice" > and > > not the root buffer. > > > > While waiting for the new allocator and transfer of ownership, would it > > make sense to remove the check for root buffers like this ? > > > > private long getBufferSize(VectorAccessible batch) { > > > long size = 0; > > > for (VectorWrapper<?> w : batch) { > > > DrillBuf[] bufs = w.getValueVector().getBuffers(false); > > > for (DrillBuf buf : bufs) { > > > size += buf.capacity(); > > > } > > > } > > > return size; > > > } > > > > > > > > On Fri, Nov 20, 2015 at 2:50 PM, Hanifi GUNES <[email protected]> > > wrote: > > > > > Problem with the above code is that not all vectors operate on root > > buffers > > > rendering the accounting above inaccurate. In fact your example is one > > > perfect instance where vectors would point to non-root buffers for sure > > > because of the slicing taking place at RecordBatchLoader#load [1] > > > > > > > > > 1: > > > > > > > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L117 > > > > > > 2015-11-20 11:41 GMT-08:00 Steven Phillips <[email protected]>: > > > > > > > I think it is because we can't actually properly account for sliced > > > > buffers. I don't remember for sure, but I think it might be because > > > calling > > > > buf.capacity() on a sliced buffer returns the the capacity of root > > > buffer, > > > > not the size of the slice. That may not be correct, but I think it > was > > > > something like that. Whatever it is, I am pretty sure it was giving > > wrong > > > > results when they are sliced buffers. > > > > > > > > I think we need to get the new allocator, along with proper transfer > of > > > > ownership in order to do this correctly. Then we can just query the > > > > allocator rather than trying to track it separately. > > > > > > > > On Fri, Nov 20, 2015 at 11:25 AM, Abdel Hakim Deneche < > > > > [email protected] > > > > > wrote: > > > > > > > > > I'm looking at the external sort code and it uses the following > > method > > > to > > > > > compute the allocated size of a batch: > > > > > > > > > > private long getBufferSize(VectorAccessible batch) { > > > > > > long size = 0; > > > > > > for (VectorWrapper<?> w : batch) { > > > > > > DrillBuf[] bufs = w.getValueVector().getBuffers(false); > > > > > > for (DrillBuf buf : bufs) { > > > > > > if (*buf.isRootBuffer()*) { > > > > > > size += buf.capacity(); > > > > > > } > > > > > > } > > > > > > } > > > > > > return size; > > > > > > } > > > > > > > > > > > > > > > This method only accounts for root buffers, but when we have a > > receiver > > > > > below the sort, most of (if not all) buffers are child buffers. > This > > > may > > > > > delay spilling, and increase the memory usage of the drillbit. If > my > > > > > computations are correct, for a single query, one drillbit can > > allocate > > > > up > > > > > to 40GB without spilling once to disk. > > > > > > > > > > Is there a specific reason we only account for root buffers ? > > > > > > > > > > -- > > > > > > > > > > Abdelhakim Deneche > > > > > > > > > > Software Engineer > > > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > > < > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Abdelhakim Deneche > > > > Software Engineer > > > > <http://www.mapr.com/> > > > > > > Now Available - Free Hadoop On-Demand Training > > < > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > >
