[ 
https://issues.apache.org/jira/browse/PIG-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537488#comment-14537488
 ] 

liyunzhang_intel commented on PIG-4542:
---------------------------------------

[~mohitsabharwal]:
  Thanks for your patch, but I do not see the following change in 
PIG-4542.patch. Have you submitted the corrected patch?
1. Gets rid of the use of RDD.count() in CollectedGroupConverter and 
StreamConverter.


> OutputConsumerIterator should flush buffered records
> ----------------------------------------------------
>
>                 Key: PIG-4542
>                 URL: https://issues.apache.org/jira/browse/PIG-4542
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4542.patch
>
>
> Certain operators may buffer the output. We need to flush the last set of 
> records from such operators, when we encounter the last input record, before 
> calling getNextTuple() for the last time.
> Currently, to flush the last set of records, we compute RDD.count() and 
> compare the count with a running counter to determine if we have reached the 
> last record. This is unnecessary and inefficient.
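
For illustration only, here is a minimal sketch of how the last record could be detected without counting the input first, assuming the wrapper can simply ask the input iterator whether more records remain. BatchSummer, FlushingIterator and FlushDemo are hypothetical names made up for this sketch; they are not the classes in PIG-4542.patch and not Pig's or Spark's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical buffering operator: emits the sum of every batchSize inputs. */
class BatchSummer {
    private final int batchSize;
    private int sum = 0;
    private int buffered = 0;

    BatchSummer(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Buffers one input and returns any outputs that became ready. */
    List<Integer> process(int value) {
        sum += value;
        buffered++;
        List<Integer> out = new ArrayList<>();
        if (buffered == batchSize) {
            out.add(sum);
            sum = 0;
            buffered = 0;
        }
        return out;
    }

    /** Emits whatever is still buffered once the input is exhausted. */
    List<Integer> flush() {
        List<Integer> out = new ArrayList<>();
        if (buffered > 0) {
            out.add(sum);
            sum = 0;
            buffered = 0;
        }
        return out;
    }
}

/** Hypothetical wrapper: detects end of input via hasNext(), not a record count. */
class FlushingIterator implements Iterator<Integer> {
    private final Iterator<Integer> input;
    private final BatchSummer op;
    private final Deque<Integer> ready = new ArrayDeque<>();
    private boolean flushed = false;

    FlushingIterator(Iterator<Integer> input, BatchSummer op) {
        this.input = input;
        this.op = op;
    }

    @Override
    public boolean hasNext() {
        while (ready.isEmpty()) {
            if (input.hasNext()) {
                // Feed the next input record to the operator.
                ready.addAll(op.process(input.next()));
            } else if (!flushed) {
                // Input is exhausted: flush the buffered tail exactly once.
                flushed = true;
                ready.addAll(op.flush());
            } else {
                return false;
            }
        }
        return true;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return ready.poll();
    }
}

class FlushDemo {
    public static void main(String[] args) {
        Iterator<Integer> input = Arrays.asList(1, 2, 3, 4, 5).iterator();
        FlushingIterator it = new FlushingIterator(input, new BatchSummer(2));
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        System.out.println(out); // prints [3, 7, 5]
    }
}
```

The trailing 5 is emitted only because the wrapper flushes when the input iterator reports no more records, so no separate pass over the data is needed to learn its size.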



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
