[ 
https://issues.apache.org/jira/browse/TEZ-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199423#comment-16199423
 ] 

Jacob Tolar commented on TEZ-3849:
----------------------------------

[~kshukla] --

"add hasNext to the original iterator" - yeah, that's the obvious solution but, 
as you said, it involves changes several other places. I thought this might be 
a little simpler. Some of the implementation changes are super straightforward 
but a couple would need a little scrutiny to make sure that hasNext() and 
next() play nice with each other (MergeQueue and PartitionFilter in particular).

* DefaultSorter.MRResultIterator
* PipelinedSorter.PartitionedRawKeyValueIterator
* PipelinedSorter.SpanIterator
* PipelinedSorter.PartitionFilter
* PipelinedSorter.SpanMerger
* TezMerger.MergeQueue
* TezMerger.EmptyIterator

If that's what you prefer let me know and I can send another patch.

Unfortunately "just reinitialize kvIter" doesn't work -- 
{{merger.filter(partition)}} just calls reset() on its inner iterator, which 
does nothing but reset the PartitionFilter partition number. The iterator 
PartitionFilter wraps will already have been advanced (and it may be marked 
dirty already, etc). I don't know this code well enough to say if that's a bug 
in reset() or not, I just java debugger'd my way here. :) See: 
https://github.com/apache/tez/blob/c82b2eadea9d75db4dd47a455c0b4d6f7dda2cc8/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java#L1234-L1276
 . 

> Combiner+PipelinedSorter silently drops records
> -----------------------------------------------
>
>                 Key: TEZ-3849
>                 URL: https://issues.apache.org/jira/browse/TEZ-3849
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jacob Tolar
>            Assignee: Jacob Tolar
>         Attachments: TEZ-3849.1.patch, TEZ-3849.2.patch
>
>
> This bug was introduced in 
> https://github.com/apache/tez/commit/a47e8fcbea5eeab5a7cf812271d329524cc02dba?diff=split
>  
> when combiner != null, the change in this commit passes kvIter with next() 
> having already been called. This ends up (silently) dropping the first record 
> in the partition. 
> Will submit PR and attach patch. [~jeagles], not sure if this is the way you 
> want to fix or not but it does fix my tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to