[
https://issues.apache.org/jira/browse/PIG-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253377#comment-15253377
]
Mohit Sabharwal commented on PIG-4876:
--------------------------------------
Thanks for the explanation, [~kexianda]. I left a comment about naming on Review Board (RB).
To summarize your explanation: since endOfAllInput is shared among all
operators in the plan, a preceding operator may set it to true, and that then
affects subsequent operators in the plan that may not yet have finished
processing all of their tuples. Is that correct?
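To make the summarized failure concrete, here is a minimal Java sketch. It does not use Pig classes: Plan, GroupByRunOperator, runUpstream, sharedFlag and perOperatorFlag are hypothetical stand-ins. It assumes an upstream operator that buffers all of its input and only emits it after marking its own input as ended, and a downstream operator that groups consecutive equal keys (roughly what CollectedGroup does on collected input) and flushes its open group whenever it believes input has ended. With one plan-wide flag, the downstream operator sees the flag already set while it is still receiving records, so it flushes after every record; with a flag owned by the downstream operator, the groups stay intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

public class SharedEndFlagSketch {

    /** Hypothetical stand-in for a plan holding one flag shared by all of its operators. */
    static final class Plan {
        boolean endOfAllInput = false;
    }

    /** Hypothetical CollectedGroup-like operator: groups runs of equal keys. */
    static final class GroupByRunOperator {
        private final BooleanSupplier inputEnded;
        private final List<List<String>> output = new ArrayList<>();
        private List<String> current = null;

        GroupByRunOperator(BooleanSupplier inputEnded) {
            this.inputEnded = inputEnded;
        }

        void accept(String key) {
            if (current != null && !current.get(0).equals(key)) {
                output.add(current);          // key changed: previous group is complete
                current = null;
            }
            if (current == null) {
                current = new ArrayList<>();
            }
            current.add(key);
            flushIfEnded();                   // check the end-of-input flag after every record
        }

        void flushIfEnded() {
            if (inputEnded.getAsBoolean() && current != null) {
                output.add(current);          // treat the open group as the last one
                current = null;
            }
        }

        List<List<String>> output() {
            return output;
        }
    }

    /** Hypothetical upstream operator: buffers everything, marks its input as ended, then emits. */
    static void runUpstream(List<String> input, Runnable markUpstreamEnded, GroupByRunOperator downstream) {
        List<String> buffered = new ArrayList<>(input);
        markUpstreamEnded.run();              // upstream's input is exhausted...
        for (String key : buffered) {         // ...but downstream has not seen these records yet
            downstream.accept(key);
        }
    }

    /** Both operators read the same plan-wide flag. */
    static List<List<String>> sharedFlag(List<String> input) {
        Plan plan = new Plan();
        GroupByRunOperator down = new GroupByRunOperator(() -> plan.endOfAllInput);
        runUpstream(input, () -> plan.endOfAllInput = true, down);
        return down.output();
    }

    /** Downstream owns its flag; it is set only after upstream has emitted everything. */
    static List<List<String>> perOperatorFlag(List<String> input) {
        boolean[] downstreamEnded = {false};
        GroupByRunOperator down = new GroupByRunOperator(() -> downstreamEnded[0]);
        runUpstream(input, () -> { /* only upstream's own flag would be set here */ }, down);
        downstreamEnded[0] = true;
        down.flushIfEnded();
        return down.output();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "a", "b");
        System.out.println("shared flag:       " + sharedFlag(input));
        System.out.println("per-operator flag: " + perOperatorFlag(input));
    }
}
```

On the input a, a, b the shared-flag variant yields [[a], [a], [b]] (the "a" group is split), while the per-operator variant yields [[a, a], [b]].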
One question:
- After the PIG-4542 patch (https://reviews.apache.org/r/34003), I see that
TestCollectedGroup was passing. What is different about how CollectedGroup is
used in PIG-4842 that causes it to fail now?
> OutputConsumeIterator can't handle the last buffered tuples for some Operators
> ------------------------------------------------------------------------------
>
> Key: PIG-4876
> URL: https://issues.apache.org/jira/browse/PIG-4876
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4876.patch
>
>
> Some operators, such as MergeCogroup, Stream and CollectedGroup, buffer
> several input records in order to assemble their result tuples, so the last
> result tuples remain buffered inside the operator. These operators need a
> flag that signals the end of input, so that they can flush and assemble their
> last tuples.
> Currently, the flag 'parentPlan.endOfAllInput' is used to flush the buffered
> tuples in MR mode, but it does not work with OutputConsumeIterator in Spark
> mode.
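The requirement in the description can be sketched as a pull-based iterator that, once its input iterator is exhausted, signals end of input to the operator it wraps and keeps pulling until the operator has nothing left. This is a hypothetical illustration, not Pig's OutputConsumeIterator: BufferingOperator, BatchOperator, FlushingIterator and run are invented names, and the end-of-input signal is assumed to be a per-operator flag rather than parentPlan.endOfAllInput. BatchOperator stands in for an operator like Stream or MergeCogroup that holds records until it can form a result, so its final short batch is only emitted after the flag is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FlushingIteratorSketch {

    /** Hypothetical operator contract: accepts records, may buffer them, emits results. */
    interface BufferingOperator<I, O> {
        void attach(I input);   // hand one input record to the operator
        O getNext();            // next ready result, or null if none is ready yet
        void setEndOfInput();   // per-operator signal: no more input will arrive
    }

    /** Hypothetical buffering operator emitting fixed-size batches; the last one may be short. */
    static final class BatchOperator implements BufferingOperator<Integer, List<Integer>> {
        private final int size;
        private final List<Integer> buffer = new ArrayList<>();
        private boolean endOfInput = false;

        BatchOperator(int size) { this.size = size; }

        @Override public void attach(Integer input) { buffer.add(input); }

        @Override public List<Integer> getNext() {
            // A full batch is always ready; a partial one only once input has ended.
            if (buffer.size() >= size || (endOfInput && !buffer.isEmpty())) {
                int n = Math.min(size, buffer.size());
                List<Integer> batch = new ArrayList<>(buffer.subList(0, n));
                buffer.subList(0, n).clear();
                return batch;
            }
            return null;
        }

        @Override public void setEndOfInput() { endOfInput = true; }
    }

    /** Hypothetical pull-based wrapper: feeds input lazily, then drains the operator. */
    static final class FlushingIterator<I, O> implements Iterator<O> {
        private final Iterator<I> input;
        private final BufferingOperator<I, O> op;
        private O next = null;
        private boolean endSignalled = false;

        FlushingIterator(Iterator<I> input, BufferingOperator<I, O> op) {
            this.input = input;
            this.op = op;
        }

        @Override public boolean hasNext() {
            while (next == null) {
                next = op.getNext();
                if (next != null) break;
                if (input.hasNext()) {
                    op.attach(input.next());
                } else if (!endSignalled) {
                    op.setEndOfInput();   // only this operator learns that its input ended
                    endSignalled = true;
                } else {
                    return false;         // input exhausted and operator fully drained
                }
            }
            return true;
        }

        @Override public O next() {
            if (!hasNext()) throw new NoSuchElementException();
            O result = next;
            next = null;
            return result;
        }
    }

    static List<List<Integer>> run(List<Integer> input, int batchSize) {
        Iterator<List<Integer>> it = new FlushingIterator<>(input.iterator(), new BatchOperator(batchSize));
        List<List<Integer>> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

For input 1..5 with a batch size of 2 the sketch returns [[1, 2], [3, 4], [5]]; without the end-of-input step the trailing 5 would stay in the operator's buffer and never be emitted.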
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)