Yup, that may be it. I'll add an option to not hold on to left-side iterator batches.
On Tue, Dec 15, 2015 at 11:56 AM, Abdel Hakim Deneche <[email protected]> wrote:

> RecordIterator.mark() is only called for the right side of the merge join.
> How about the left side, do we ever release the batches on the left side?
> In 4190 the sort that runs out of memory is on the left side of the merge.
>
> On Tue, Dec 15, 2015 at 11:51 AM, Abdel Hakim Deneche <[email protected]> wrote:
>
> > I see, it's in RecordIterator.mark()
> >
> > On Tue, Dec 15, 2015 at 11:50 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >
> >> Amit,
> >>
> >> thanks for the prompt answer. Can you point me to where in the code the
> >> purge is done?
> >>
> >> On Tue, Dec 15, 2015 at 11:42 AM, Amit Hadke <[email protected]> wrote:
> >>
> >>> Hi Hakim,
> >>> RecordIterator will not hold all batches in memory. It holds the batches
> >>> received since the last mark() operation, and it purges batches as the
> >>> join moves along.
> >>>
> >>> The worst case is when there are lots of repeating values on the right
> >>> side, which the iterator will hold in memory.
> >>>
> >>> ~ Amit.
> >>>
> >>> On Tue, Dec 15, 2015 at 11:23 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >>>
> >>> > Amit,
> >>> >
> >>> > I am looking at DRILL-4190, where one of the sort operators is hitting
> >>> > its allocator limit while sending data downstream. This generally
> >>> > happens when a downstream operator is holding those batches in memory
> >>> > (e.g. the Window operator).
> >>> >
> >>> > The same query runs fine on 1.2.0, which seems to suggest that the
> >>> > recent changes to MergeJoinBatch "may" be causing the issue.
> >>> >
> >>> > It looks like RecordIterator is holding all incoming batches in a
> >>> > TreeRangeMap and, if I'm not mistaken, it doesn't release anything
> >>> > until it's closed. Is this correct?
> >>> >
> >>> > I am not familiar with how merge join used to work before
> >>> > RecordIterator. Was it also the case that we held all incoming
> >>> > batches in memory?
> >>> >
> >>> > Thanks
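For anyone following along, here is a minimal, self-contained sketch of the mark()/purge behaviour Amit describes above. The class and method names are hypothetical and batches are modelled as plain record lists; Drill's real RecordIterator works on value vectors and releases buffers through the allocator, so this only illustrates the retention semantics, not the actual implementation.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a mark()/purge iterator. NOT Drill's RecordIterator;
// it only models which buffered batches must stay in memory.
public class MarkPurgeIterator {
    // Buffered batches keyed by the absolute index of their first record.
    private final TreeMap<Long, List<String>> batches = new TreeMap<>();
    private long totalRecords = 0; // total records buffered so far
    private long position = 0;     // next record the consumer will read
    private long mark = 0;         // earliest record the consumer may rewind to

    /** Buffer an incoming batch (modelled here as a list of records). */
    void addBatch(List<String> batch) {
        batches.put(totalRecords, batch);
        totalRecords += batch.size();
    }

    /** Read the next record out of the buffered batches. */
    String next() {
        var entry = batches.floorEntry(position);
        String record = entry.getValue().get((int) (position - entry.getKey()));
        position++;
        return record;
    }

    /** Remember the current position and drop batches that end before it. */
    void mark() {
        mark = position;
        // A batch can be released once it ends at or before the mark,
        // because the consumer can no longer rewind into it.
        batches.entrySet().removeIf(e -> e.getKey() + e.getValue().size() <= mark);
    }

    /** Rewind to the last mark; those batches are still retained. */
    void reset() {
        position = mark;
    }

    int retainedBatches() {
        return batches.size();
    }

    public static void main(String[] args) {
        MarkPurgeIterator it = new MarkPurgeIterator();
        it.addBatch(List.of("a", "b"));
        it.addBatch(List.of("c", "d"));
        it.next(); it.next(); it.next();          // consume "a", "b", "c"
        System.out.println(it.retainedBatches()); // 2 -- nothing purged yet
        it.mark();                                // mark at record 3
        System.out.println(it.retainedBatches()); // 1 -- first batch released
    }
}
```

The point of the sketch is that batches are only released when the mark advances past them, which is why a long run of repeating values on the right side (where the mark cannot move) keeps everything since the last mark pinned in memory.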
