[ 
https://issues.apache.org/jira/browse/PIG-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261683#comment-15261683
 ] 

liyunzhang_intel commented on PIG-4876:
---------------------------------------

[~mohitsabharwal]:  Have you decided which ways is better? 1. add a flag 
endOfInput for the PhysicalOperators like POCollectedGroup, POMergeJoin, 
POMergeCogroup to verify it reaches to the end for current operator like what 
PIG-4876.patch did.   2. in xxxConverter#getNextResults(), we set  
poCollectedGroup.getParentPlan().endOfAllInput  as true only when  
input.hasNext() is false like what PIG-4842 did:
{code}
   @Override
                        protected Result getNextResult() throws ExecException {

                            // if endOfAllInput was set as true by the 
predecessors, but input.hasNext() is true.
                            // it means that the predecessor has consumed all 
of its input,
                            // but poCollectedGroup still hasn't consumed all 
of its input.
                            // set endOfAllInput as false here, so that 
POCollectedGroup.getNextTuple() can work correctly

                            if (poCollectedGroup.getParentPlan().endOfAllInput 
&& input.hasNext()) {
                                poCollectedGroup.getParentPlan().endOfAllInput 
= false;
                            }

                            return poCollectedGroup.getNextTuple();
                        }
{code}

Method1 is more code readable but modify the original code. Method2 does not 
modify the code but may be tricky for other one to understand the 
getParentPlan().endOfAllInput( i think beginOfInput() is more complex).

Please give us your suggestion.



> OutputConsumeIterator can't handle the last buffered tuples for some Operators
> ------------------------------------------------------------------------------
>
>                 Key: PIG-4876
>                 URL: https://issues.apache.org/jira/browse/PIG-4876
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Xianda Ke
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4876.patch
>
>
> Some Operators, such as MergeCogroup, Stream, CollectedGroup etc buffer some 
> input records to constitute the result tuples. The last result tuples are 
> buffered in the operator.  These Operators need a flag to indicate the end of 
> input, so that they can flush and constitute their last tuples.
> Currently, the flag 'parentPlan.endOfAllInput' is targeted for flushing the 
> buffered tuples in MR mode.  But it does not work with OutputConsumeIterator 
> in Spark mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to