[ 
https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326938#comment-14326938
 ] 

Rajesh Balamohan commented on TEZ-2001:
---------------------------------------

Maybe later we can rely on higher-level apps to tell us whether their spills 
are deterministic or non-deterministic.  But for the current scope, should we 
assume non-deterministic spills? In that case, we would be forced to restart 
the reducer.

This can have an adverse impact when speculation is turned on. With 
speculation, some additional tasks (for stragglers) could have been scheduled 
upstream, and their partial spills will also be sent to the reducers.  In the 
older model, this was not a problem because consumers started fetching only 
once the complete data was available from the upstream task.

At the end of upstream processing, some of these tasks will be in the KILLED 
state.  To stay consistent, we need to discard the partial data fetched from 
these KILLED tasks, and the reducers would have to be restarted in most cases.
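
To make the concern concrete, here is a rough sketch of the bookkeeping a 
reducer might need. It is not Tez code: PartialSpillTracker, its methods and 
the Action enum are hypothetical names, and it assumes (my assumption, not 
something the patch does) that a restart is only needed when spills from the 
killed attempt were already merged into the reducer's input.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-reducer bookkeeping for partial spills; not part of Tez. */
class PartialSpillTracker {
    /** What the reducer has to do once an upstream attempt is reported KILLED. */
    enum Action { NOTHING, DISCARD_ONLY, RESTART_REDUCER }

    // "taskIndex_attemptNumber" -> ids of the spills fetched from that attempt.
    private final Map<String, Set<Integer>> fetched = new HashMap<>();
    // Attempts whose spills were already merged into the reducer's input.
    private final Set<String> merged = new HashSet<>();

    private static String key(int taskIndex, int attempt) {
        return taskIndex + "_" + attempt;
    }

    /** Record that one partial spill was fetched from an upstream attempt. */
    void recordFetch(int taskIndex, int attempt, int spillId) {
        fetched.computeIfAbsent(key(taskIndex, attempt), k -> new HashSet<>())
               .add(spillId);
    }

    /** Record that spills of this attempt were consumed by a merge. */
    void markMerged(int taskIndex, int attempt) {
        if (fetched.containsKey(key(taskIndex, attempt))) {
            merged.add(key(taskIndex, attempt));
        }
    }

    /** Drop everything fetched from a killed attempt and report the fallout. */
    Action onAttemptKilled(int taskIndex, int attempt) {
        String k = key(taskIndex, attempt);
        Set<Integer> spills = fetched.remove(k);
        if (spills == null) {
            return Action.NOTHING;          // nothing was fetched from it
        }
        // Assumption: merged data cannot be un-merged, so the reducer restarts.
        return merged.remove(k) ? Action.RESTART_REDUCER : Action.DISCARD_ONLY;
    }

    /** Number of spills currently held for an attempt (0 if none). */
    int fetchedSpillCount(int taskIndex, int attempt) {
        Set<Integer> spills = fetched.get(key(taskIndex, attempt));
        return spills == null ? 0 : spills.size();
    }
}
```

With speculation, attempts 0 and 1 of the same upstream task can both deliver 
spills; whichever one ends up KILLED goes through onAttemptKilled(), and the 
result tells the reducer whether dropping the fetched data is enough.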

> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
>                 Key: TEZ-2001
>                 URL: https://issues.apache.org/jira/browse/TEZ-2001
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
