[ 
https://issues.apache.org/jira/browse/PIG-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539208#comment-14539208
 ] 

Mohit Sabharwal commented on PIG-4542:
--------------------------------------

Thanks, [~kellyzly]

1) Fixed parent reference in SparkPlan physical operators.
2) You're right, POStreamSpark was handling the last buffered records 
(potentially multiple), not necessarily just the last record.

> OutputConsumerIterator should flush buffered records
> ----------------------------------------------------
>
>                 Key: PIG-4542
>                 URL: https://issues.apache.org/jira/browse/PIG-4542
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4542.1.patch, PIG-4542.patch
>
>
> Certain operators may buffer the output. We need to flush the last set of 
> records from such operators, when we encounter the last input record, before 
> calling getNextTuple() for the last time.
> Currently, to flush the last set of records, we compute RDD.count() and 
> compare the count with a running counter to determine if we have reached the 
> last record. This is unnecessary and inefficient.
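
For illustration, here is a minimal sketch of the idea described above: detect the end of input with hasNext() on the input iterator, so no up-front count is needed, and flush the operator exactly once at that point. This is not the code in the attached patch. BufferingOperator and FlushingIterator are hypothetical names, and plain strings stand in for Pig tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an operator that buffers its output (not a Pig class).
class BufferingOperator {
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;

    BufferingOperator(int batchSize) {
        this.batchSize = batchSize;
    }

    // Accepts one input record; emits a batch only once the buffer is full.
    List<String> attach(String record) {
        buffer.add(record);
        if (buffer.size() < batchSize) {
            return new ArrayList<>();
        }
        return drain();
    }

    // Emits whatever is still buffered; meant to be called once at end of input.
    List<String> flush() {
        return drain();
    }

    private List<String> drain() {
        List<String> out = new ArrayList<>(buffer);
        buffer.clear();
        return out;
    }
}

// Hypothetical wrapper: uses input.hasNext() to find the end of input
// instead of comparing a running counter against a precomputed count.
class FlushingIterator implements Iterator<String> {
    private final Iterator<String> input;
    private final BufferingOperator op;
    private Iterator<String> pending = new ArrayList<String>().iterator();
    private boolean flushed = false;

    FlushingIterator(Iterator<String> input, BufferingOperator op) {
        this.input = input;
        this.op = op;
    }

    @Override
    public boolean hasNext() {
        while (!pending.hasNext()) {
            if (input.hasNext()) {
                // Feed the next input record; the operator may emit nothing yet.
                pending = op.attach(input.next()).iterator();
            } else if (!flushed) {
                // Input is exhausted: flush the remaining buffered records once.
                flushed = true;
                pending = op.flush().iterator();
            } else {
                return false;
            }
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.next();
    }
}
```

With a batch size of 2 and five input records, the fifth record is still buffered when the input runs out, and the single flush() call is what emits it.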



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
