[
https://issues.apache.org/jira/browse/PIG-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261683#comment-15261683
]
liyunzhang_intel commented on PIG-4876:
---------------------------------------
[~mohitsabharwal]: Have you decided which ways is better? 1. add a flag
endOfInput for the PhysicalOperators like POCollectedGroup, POMergeJoin,
POMergeCogroup to verify it reaches to the end for current operator like what
PIG-4876.patch did. 2. in xxxConverter#getNextResults(), we set
poCollectedGroup.getParentPlan().endOfAllInput as true only when
input.hasNext() is false like what PIG-4842 did:
{code}
@Override
protected Result getNextResult() throws ExecException {
// if endOfAllInput was set as true by the
predecessors, but input.hasNext() is true.
// it means that the predecessor has consumed all
of its input,
// but poCollectedGroup still hasn't consumed all
of its input.
// set endOfAllInput as false here, so that
POCollectedGroup.getNextTuple() can work correctly
if (poCollectedGroup.getParentPlan().endOfAllInput
&& input.hasNext()) {
poCollectedGroup.getParentPlan().endOfAllInput
= false;
}
return poCollectedGroup.getNextTuple();
}
{code}
Method1 is more code readable but modify the original code. Method2 does not
modify the code but may be tricky for other one to understand the
getParentPlan().endOfAllInput( i think beginOfInput() is more complex).
Please give us your suggestion.
> OutputConsumeIterator can't handle the last buffered tuples for some Operators
> ------------------------------------------------------------------------------
>
> Key: PIG-4876
> URL: https://issues.apache.org/jira/browse/PIG-4876
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4876.patch
>
>
> Some Operators, such as MergeCogroup, Stream, CollectedGroup etc buffer some
> input records to constitute the result tuples. The last result tuples are
> buffered in the operator. These Operators need a flag to indicate the end of
> input, so that they can flush and constitute their last tuples.
> Currently, the flag 'parentPlan.endOfAllInput' is targeted for flushing the
> buffered tuples in MR mode. But it does not work with OutputConsumeIterator
> in Spark mode.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)