Yup, that may be it. I'll add an option to not hold on to left-side iterator
batches.
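
As a rough illustration only (this is not Drill's actual RecordIterator, and the
enableMarkAndReset flag name is just an assumption for the sketch), the idea of
such an option could look like this in Java: when the flag is off, batches are
handed downstream without being retained, which is what "not holding on to
left-side iterator batches" amounts to.

import java.util.ArrayDeque;
import java.util.Deque;

/*
 * Minimal sketch of an iterator that normally retains every batch seen since
 * the last mark() so the caller can rewind, but can be built with
 * enableMarkAndReset = false so batches are passed through without being held.
 * Names and structure are illustrative, not Drill's real implementation.
 */
public class BatchRetainingIterator<T> {
  private final boolean enableMarkAndReset;             // false => never retain past batches
  private final Deque<T> retained = new ArrayDeque<>(); // batches held since the last mark()
  private final Deque<T> upstream;                       // stand-in for the incoming batch stream

  public BatchRetainingIterator(Deque<T> upstream, boolean enableMarkAndReset) {
    this.upstream = upstream;
    this.enableMarkAndReset = enableMarkAndReset;
  }

  /* Remember the current position; anything retained before it can be freed. */
  public void mark() {
    retained.clear();            // real code would release the batch buffers here
  }

  /* Return the next batch, retaining it only when mark/reset support is on. */
  public T next() {
    T batch = upstream.poll();
    if (batch != null && enableMarkAndReset) {
      retained.addLast(batch);   // kept until the next mark() or close()
    }
    // with enableMarkAndReset == false (e.g. the left side of the merge join),
    // the batch goes straight to the caller and nothing accumulates here
    return batch;
  }

  public void close() {
    retained.clear();            // release anything still held
  }
}

In this sketch the right side of the join would keep the flag on so the batches
since the last mark() stay available (a full version would also need a reset()
that replays "retained"), while the left side could turn it off since it is
only read forward.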

On Tue, Dec 15, 2015 at 11:56 AM, Abdel Hakim Deneche <[email protected]
> wrote:

> RecordIterator.mark() is only called for the right side of the merge join.
> How about the left side, do we ever release the batches on the left side?
> In DRILL-4190 the sort that runs out of memory is on the left side of the merge.
>
> On Tue, Dec 15, 2015 at 11:51 AM, Abdel Hakim Deneche <
> [email protected]
> > wrote:
>
> > I see, it's in RecordIterator.mark()
> >
> > On Tue, Dec 15, 2015 at 11:50 AM, Abdel Hakim Deneche <
> > [email protected]> wrote:
> >
> >> Amit,
> >>
> >> Thanks for the prompt answer. Can you point me, in the code, where the
> >> purge is done?
> >>
> >>
> >>
> >> On Tue, Dec 15, 2015 at 11:42 AM, Amit Hadke <[email protected]>
> >> wrote:
> >>
> >>> Hi Hakim,
> >>> RecordIterator will not hold all batches in memory. It holds batches
> >>> from the last mark() operation.
> >>> It will purge batches as the join moves along.
> >>>
> >>> The worst case is when there are lots of repeating values on the right
> >>> side, which the iterator will hold in memory.
> >>>
> >>> ~ Amit.
> >>>
> >>> On Tue, Dec 15, 2015 at 11:23 AM, Abdel Hakim Deneche <
> >>> [email protected]
> >>> > wrote:
> >>>
> >>> > Amit,
> >>> >
> >>> > I am looking at DRILL-4190, where one of the sort operators is hitting
> >>> > its allocator limit when it's sending data downstream. This generally
> >>> > happens when a downstream operator is holding those batches in memory
> >>> > (e.g. the Window operator).
> >>> >
> >>> > The same query runs fine on 1.2.0, which seems to suggest that the
> >>> > recent changes to MergeJoinBatch "may" be causing the issue.
> >>> >
> >>> > It looks like RecordIterator is holding all incoming batches in a
> >>> > TreeRangeMap and, if I'm not mistaken, it doesn't release anything
> >>> > until it's closed. Is this correct?
> >>> >
> >>> > I am not familiar with how the merge join used to work before
> >>> > RecordIterator.
> >>> > Was it also the case that we held all incoming batches in memory?
> >>> >
> >>> > Thanks
> >>> >
