[jira] [Commented] (PIG-4542) OutputConsumerIterator should flush buffered records

liyunzhang_intel (JIRA) Mon, 11 May 2015 17:18:12 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538945#comment-14538945
 ]


liyunzhang_intel commented on PIG-4542:
---------------------------------------

Thanks, [~mohitsabharwal]. Here are the questions I have about your patch and comment:
1. Does poCollectedGroup.getPlans().get(0) equal poCollectedGroup.parentPlan? (See Review Board.)
2. {quote}
Deletes POStreamSpark since it was just handling the last record.
{quote}
Can you explain why POStreamSpark handles the last record?

> OutputConsumerIterator should flush buffered records
> ----------------------------------------------------
>
>                 Key: PIG-4542
>                 URL: https://issues.apache.org/jira/browse/PIG-4542
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4542.1.patch, PIG-4542.patch
>
>
> Certain operators may buffer their output. We need to flush the last set of
> records from such operators when we encounter the last input record, before
> calling getNextTuple() for the last time.
> Currently, to flush the last set of records, we compute RDD.count() and
> compare the count with a running counter to determine whether we have reached
> the last record. This is unnecessary and inefficient.
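
The look-ahead alternative to the count-based check can be sketched in plain Java. A java.util.Iterator over the input already reports hasNext() == false right after the last record has been consumed, so the iterator wrapping the operator can flag end-of-input as soon as it attaches that record, with no RDD.count() pass. All names below (BufferingOperator, setEndOfInput, FlushingOutputIterator, PairBatcher, FlushDemo) are hypothetical stand-ins, not Pig's OutputConsumerIterator or physical-operator API, and the sketch assumes an operator's getNext() returns null when it has nothing ready.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical operator contract (not Pig's API): buffers input and only
// releases a partial buffer once it has been told the input is exhausted.
interface BufferingOperator<I, O> {
    void attachInput(I input);
    void setEndOfInput(boolean endOfInput);
    O getNext(); // null means "nothing ready yet"
}

// Hypothetical example operator: emits records in pairs; on end of input it
// flushes whatever partial pair is still buffered.
class PairBatcher implements BufferingOperator<Integer, List<Integer>> {
    private final List<Integer> buffer = new ArrayList<>();
    private boolean endOfInput = false;

    public void attachInput(Integer x) { buffer.add(x); }

    public void setEndOfInput(boolean e) { endOfInput = e; }

    public List<Integer> getNext() {
        if (buffer.size() >= 2 || (endOfInput && !buffer.isEmpty())) {
            int n = Math.min(2, buffer.size());
            List<Integer> batch = new ArrayList<>(buffer.subList(0, n));
            buffer.subList(0, n).clear(); // drop the emitted records
            return batch;
        }
        return null;
    }
}

// Detects the last input record via input.hasNext() instead of comparing a
// running counter against a precomputed count.
class FlushingOutputIterator<I, O> implements Iterator<O> {
    private final Iterator<I> input;
    private final BufferingOperator<I, O> op;
    private O lookahead; // next result to hand out, null if not computed
    private boolean done; // true once input and operator are both drained

    FlushingOutputIterator(Iterator<I> input, BufferingOperator<I, O> op) {
        this.input = input;
        this.op = op;
    }

    @Override
    public boolean hasNext() {
        if (lookahead == null && !done) advance();
        return lookahead != null;
    }

    @Override
    public O next() {
        if (!hasNext()) throw new NoSuchElementException();
        O result = lookahead;
        lookahead = null;
        return result;
    }

    private void advance() {
        while (lookahead == null) {
            O out = op.getNext(); // drain anything already ready
            if (out != null) { lookahead = out; return; }
            if (!input.hasNext()) { done = true; return; }
            op.attachInput(input.next());
            if (!input.hasNext()) {
                // That was the last input record: let the operator flush.
                op.setEndOfInput(true);
            }
        }
    }
}

public class FlushDemo {
    public static List<List<Integer>> run(List<Integer> records) {
        Iterator<List<Integer>> it =
            new FlushingOutputIterator<>(records.iterator(), new PairBatcher());
        List<List<Integer>> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5))); // prints [[1, 2], [3, 4], [5]]
    }
}
```

Without the setEndOfInput(true) call, the trailing record 5 would stay in PairBatcher's buffer and never be emitted. That lost tail is what the count-based check was guarding against, and the look-ahead avoids it without an extra pass over the data.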



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
