Ok, thanks. I will add a comment to the JIRA and assign it to you ;)

On Tue, Dec 15, 2015 at 12:02 PM, Amit Hadke <[email protected]> wrote:
> Yup, that may be it. I'll add an option to not hold on to left side iterator
> batches.
>
> On Tue, Dec 15, 2015 at 11:56 AM, Abdel Hakim Deneche <[email protected]> wrote:
>
> > RecordIterator.mark() is only called for the right side of the merge join.
> > How about the left side, do we ever release the batches on the left side?
> > In 4190 the sort that runs out of memory is on the left side of the merge.
> >
> > On Tue, Dec 15, 2015 at 11:51 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >
> > > I see, it's in RecordIterator.mark()
> > >
> > > On Tue, Dec 15, 2015 at 11:50 AM, Abdel Hakim Deneche <[email protected]> wrote:
> > >
> > >> Amit,
> > >>
> > >> thanks for the prompt answer. Can you point me, in the code, to where
> > >> the purge is done?
> > >>
> > >> On Tue, Dec 15, 2015 at 11:42 AM, Amit Hadke <[email protected]> wrote:
> > >>
> > >>> Hi Hakim,
> > >>> RecordIterator will not hold all batches in memory. It holds batches
> > >>> from the last mark() operation.
> > >>> It will purge batches as the join moves along.
> > >>>
> > >>> The worst case is when there are lots of repeating values on the right
> > >>> side, which the iterator will hold in memory.
> > >>>
> > >>> ~ Amit.
> > >>>
> > >>> On Tue, Dec 15, 2015 at 11:23 AM, Abdel Hakim Deneche <[email protected]> wrote:
> > >>>
> > >>> > Amit,
> > >>> >
> > >>> > I am looking at DRILL-4190, where one of the sort operators is
> > >>> > hitting its allocator limit when it's sending data downstream. This
> > >>> > generally happens when a downstream operator is holding those
> > >>> > batches in memory (e.g. Window Operator).
> > >>> >
> > >>> > The same query runs fine on 1.2.0, which seems to suggest that the
> > >>> > recent changes to MergeJoinBatch "may" be causing the issue.
> > >>> >
> > >>> > It looks like RecordIterator is holding all incoming batches in a
> > >>> > TreeRangeMap, and if I'm not mistaken it doesn't release anything
> > >>> > until it's closed. Is this correct?
> > >>> >
> > >>> > I am not familiar with how merge join used to work before
> > >>> > RecordIterator. Was it also the case that we held all incoming
> > >>> > batches in memory?
> > >>> >
> > >>> > Thanks

--

Abdelhakim Deneche

Software Engineer

<http://www.mapr.com/>

Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
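The retention behaviour discussed in this thread can be illustrated with a small model. This is only a sketch under stated assumptions, not Drill's RecordIterator: the class MarkedBatchBuffer, its methods, and the retainSinceMark flag are hypothetical names invented for the illustration, and batches are modelled as plain strings rather than record batches. It assumes, as the thread suggests, that batches are released only when mark() is called, so a side of the join that never calls mark() keeps everything it has read.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of a batch buffer with mark-based purging.
 * Not Drill's RecordIterator; names and behaviour are assumptions.
 */
final class MarkedBatchBuffer {
    // Batches currently held in memory; the last element is the current batch.
    private final Deque<String> held = new ArrayDeque<>();
    // When false, models an option to not hold on to earlier batches at all.
    private final boolean retainSinceMark;

    MarkedBatchBuffer(boolean retainSinceMark) {
        this.retainSinceMark = retainSinceMark;
    }

    /** Accepts the next incoming batch and makes it the current one. */
    void advance(String batch) {
        if (!retainSinceMark) {
            held.clear(); // release everything read so far
        }
        held.addLast(batch);
    }

    /** Marks the current position: batches before the current one are released. */
    void mark() {
        while (held.size() > 1) {
            held.removeFirst();
        }
    }

    /** Number of batches still held in memory. */
    int heldCount() {
        return held.size();
    }

    public static void main(String[] args) {
        // Right side: mark() is called, so older batches are purged.
        MarkedBatchBuffer right = new MarkedBatchBuffer(true);
        right.advance("b0");
        right.advance("b1");
        right.advance("b2");
        right.mark();
        System.out.println("right side held after mark: " + right.heldCount());

        // Left side: mark() is never called, so every batch stays held.
        MarkedBatchBuffer left = new MarkedBatchBuffer(true);
        for (int i = 0; i < 5; i++) {
            left.advance("b" + i);
        }
        System.out.println("left side held without mark: " + left.heldCount());
    }
}
```

In this model, constructing the left-side buffer with retainSinceMark set to false keeps only the current batch, which is one way to picture the proposed option to not hold on to left side iterator batches.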
