[
https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326938#comment-14326938
]
Rajesh Balamohan commented on TEZ-2001:
---------------------------------------
Maybe later we can rely on higher-level apps to indicate whether spills are
deterministic or non-deterministic. But for the current scope, should we
assume non-deterministic spills? In that case, we would be forced to restart
the reducer.
This can have an adverse impact when speculation is turned on. With
speculation, additional attempts may be scheduled for upstream stragglers, and
their partial spills would also be sent to the reducers. In the older model
this was not a problem, because consumers started fetching only once the
complete data was available from the upstream task.
At the end of upstream processing, some of these task attempts will be in the
KILLED state. To maintain correctness, we need to discard the partial data
fetched from the KILLED attempts, and in most cases the reducers would have to
be restarted.
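A minimal sketch of the consumer-side bookkeeping this implies, assuming spills are non-deterministic and that each spill event identifies its task and attempt. `SpillEvent`, `PipelinedFetchTracker` and their methods are hypothetical names for illustration, not Tez APIs. The reducer sticks to the first attempt it fetches from for each task; if that attempt is later KILLED (for example because a speculative attempt finished first), its partial data is discarded and the reducer must restart.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SpillEvent:
    """Hypothetical event announcing one partial spill from an upstream attempt."""
    task_id: int
    attempt_id: int
    spill_id: int


@dataclass
class PipelinedFetchTracker:
    """Hypothetical reducer-side bookkeeping, assuming non-deterministic spills.

    Two attempts of the same task may emit different records in different
    spills, so partial data from one attempt cannot be completed with spills
    from another attempt.
    """
    # task_id -> attempt_id whose spills this reducer has started consuming
    consumed_attempt: Dict[int, int] = field(default_factory=dict)
    fetched: List[SpillEvent] = field(default_factory=list)

    def on_spill(self, event: SpillEvent) -> bool:
        """Fetch the spill unless another attempt of the same task was chosen first."""
        chosen = self.consumed_attempt.setdefault(event.task_id, event.attempt_id)
        if chosen != event.attempt_id:
            return False  # ignore spills from a competing (e.g. speculative) attempt
        self.fetched.append(event)
        return True

    def on_attempt_killed(self, task_id: int, attempt_id: int) -> bool:
        """Return True if the reducer must restart because consumed data is invalid."""
        if self.consumed_attempt.get(task_id) != attempt_id:
            return False  # nothing was fetched from this attempt; safe to ignore
        # Partial data from the KILLED attempt is already consumed and cannot be
        # completed by the surviving attempt: discard everything and start over.
        self.consumed_attempt.clear()
        self.fetched.clear()
        return True


tracker = PipelinedFetchTracker()
tracker.on_spill(SpillEvent(task_id=7, attempt_id=0, spill_id=0))  # straggler's spill
tracker.on_spill(SpillEvent(task_id=7, attempt_id=1, spill_id=0))  # speculative, ignored
must_restart = tracker.on_attempt_killed(task_id=7, attempt_id=0)  # straggler KILLED
```

Pinning one attempt per task keeps the sketch simple; the cost, as noted above, is that whenever the pinned attempt is the one that gets KILLED, the reducer has no choice but to restart.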
> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
> Key: TEZ-2001
> URL: https://issues.apache.org/jira/browse/TEZ-2001
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)