RecordIterator.mark() is only called for the right side of the merge join. What about the left side: do we ever release the batches on the left side? In DRILL-4190, the sort that runs out of memory is on the left side of the merge.
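For reference, here is a minimal sketch of the mark-and-purge behavior described in the quoted thread below: batches are held from the last mark() and released as the iterator moves past them. This is not Drill's RecordIterator. The class names (`Batch`, `MarkingBatchBuffer`), the use of `java.util.TreeMap` instead of `TreeRangeMap`, and the row-index bookkeeping are all assumptions made for illustration.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for a record batch: a row count and a "released" flag. */
final class Batch {
    final int rowCount;
    boolean released = false;

    Batch(int rowCount) { this.rowCount = rowCount; }

    void release() { released = true; }
}

/** Hypothetical buffer that only purges batches when mark() is called. */
final class MarkingBatchBuffer {
    // Maps the absolute index of a batch's first row to the batch.
    private final TreeMap<Long, Batch> batches = new TreeMap<>();
    private long totalRows = 0;      // rows appended so far
    private long position = 0;       // current absolute row index
    private long markedPosition = 0; // row index saved by the last mark()

    void append(Batch batch) {
        batches.put(totalRows, batch);
        totalRows += batch.rowCount;
    }

    /** Moves forward without releasing anything. */
    void advance(long rows) {
        position = Math.min(position + rows, totalRows);
    }

    /** Saves the current position and releases every batch that starts before the batch holding it. */
    void mark() {
        markedPosition = position;
        Long firstKeptKey = batches.floorKey(markedPosition);
        if (firstKeptKey == null) {
            return; // nothing buffered yet
        }
        // headMap is a live view of all batches starting strictly before firstKeptKey.
        SortedMap<Long, Batch> purgeable = batches.headMap(firstKeptKey);
        purgeable.values().forEach(Batch::release);
        purgeable.clear();
    }

    /** Rewinds to the last marked position; possible only because those batches were kept. */
    void reset() {
        position = markedPosition;
    }

    int heldBatches() {
        return batches.size();
    }
}
```

In this sketch, a side of the join that never calls mark() never releases anything, so its buffered batches grow until the buffer is closed. That is the situation the question about the left side is asking about.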
On Tue, Dec 15, 2015 at 11:51 AM, Abdel Hakim Deneche <[email protected]> wrote:

> I see, it's in RecordIterator.mark()
>
> On Tue, Dec 15, 2015 at 11:50 AM, Abdel Hakim Deneche <[email protected]> wrote:
>
>> Amit,
>>
>> Thanks for the prompt answer. Can you point me, in the code, to where the
>> purge is done?
>>
>> On Tue, Dec 15, 2015 at 11:42 AM, Amit Hadke <[email protected]> wrote:
>>
>>> Hi Hakim,
>>> RecordIterator will not hold all batches in memory. It holds batches from
>>> the last mark() operation.
>>> It will purge batches as the join moves along.
>>>
>>> The worst case is when there are lots of repeating values on the right side,
>>> which the iterator will hold in memory.
>>>
>>> ~ Amit.
>>>
>>> On Tue, Dec 15, 2015 at 11:23 AM, Abdel Hakim Deneche <[email protected]> wrote:
>>>
>>> > Amit,
>>> >
>>> > I am looking at DRILL-4190, where one of the sort operators is hitting its
>>> > allocator limit when it's sending data downstream. This generally happens
>>> > when a downstream operator is holding those batches in memory (e.g. Window
>>> > Operator).
>>> >
>>> > The same query runs fine on 1.2.0, which seems to suggest that the
>>> > recent changes to MergeJoinBatch "may" be causing the issue.
>>> >
>>> > It looks like RecordIterator is holding all incoming batches in a
>>> > TreeRangeMap and, if I'm not mistaken, it doesn't release anything until it's
>>> > closed. Is this correct?
>>> >
>>> > I am not familiar with how merge join used to work before RecordIterator.
>>> > Was it also the case that we held all incoming batches in memory?
>>> >
>>> > Thanks
>>> >
>>> > --
>>> > Abdelhakim Deneche
>>> > Software Engineer
>>> > <http://www.mapr.com/>
>>> >
>>> > Now Available - Free Hadoop On-Demand Training
>>> > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
