[
https://issues.apache.org/jira/browse/PIG-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036346#comment-14036346
]
Daniel Dai commented on PIG-4020:
---------------------------------
In short, processInput() in MR never results in the endOfAllInput flag being
set, but in Tez that is no longer true.
In MR, we run the pipeline once per key. For a given key, we keep pulling
from the bottom of the pipeline until we see an EOP. endOfAllInput is not set
during this process. In cleanup, we set the endOfAllInput flag and pull the
pipeline again.
In Tez, we run the pipeline once per task. During the process, we keep pulling
the input, and endOfAllInput will be set during our pulling.
So in MR, after processInput (POSplit:214), we don't need to check whether
endOfAllInput is set, but in Tez we do. If it is set, we need to keep pulling
until the pipeline is empty to finish processing the data, instead of simply
returning an EOP, which would lose the later part of the data.
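To illustrate the difference, here is a toy model of the Tez case. It is only
a sketch and not the code in the patch: the class EndOfAllInputSketch, its
methods, the checkFlag switch and the use of null as a stand-in for EOP are
all made up for this example. The only names taken from Pig are processInput
and endOfAllInput, and the versions here are simplified imitations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model only: hypothetical names, not Pig's real operator classes.
final class EndOfAllInputSketch {
    private final Iterator<String> input;
    // Tuples held back by a buffering sub-pipeline (e.g. a partial aggregation).
    private final Deque<String> buffered = new ArrayDeque<>();
    private boolean endOfAllInput = false;

    EndOfAllInputSketch(List<String> tuples) {
        this.input = tuples.iterator();
    }

    // Imitates processInput() under Tez: the pull that finds the input
    // exhausted is also the one that sets endOfAllInput.
    private String processInput() {
        if (input.hasNext()) {
            return input.next();
        }
        endOfAllInput = true;
        return null; // null plays the role of EOP
    }

    // The sub-pipeline only releases tuples once endOfAllInput is set.
    private String pullPipeline() {
        return endOfAllInput ? buffered.poll() : null;
    }

    String getNext(boolean checkFlag) {
        String t;
        while ((t = processInput()) != null) {
            buffered.add(t);
        }
        // At this point the input is exhausted and endOfAllInput is set.
        if (!checkFlag) {
            return null; // MR-style assumption: report EOP, buffered tuples are lost
        }
        return pullPipeline(); // keep draining until the pipeline is empty
    }

    // Runs the pipeline once per task and collects everything before the first EOP.
    static List<String> run(List<String> tuples, boolean checkFlag) {
        EndOfAllInputSketch op = new EndOfAllInputSketch(tuples);
        List<String> out = new ArrayList<>();
        String t;
        while ((t = op.getNext(checkFlag)) != null) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tuples = List.of("a", "b", "c");
        System.out.println("with check:    " + run(tuples, true));
        System.out.println("without check: " + run(tuples, false));
    }
}
```

In this model, running without the check returns nothing because the first EOP
ends the task while the tuples are still buffered. With the check, the same
input is drained completely.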
> Fix tez e2e tests MapPartialAgg_[2-4], StreamingPerformance_[6-7]
> -----------------------------------------------------------------
>
> Key: PIG-4020
> URL: https://issues.apache.org/jira/browse/PIG-4020
> Project: Pig
> Issue Type: Bug
> Components: tez
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4020-1.patch
>
>