[
https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328628#comment-14328628
]
Gopal V edited comment on TEZ-2001 at 2/20/15 6:56 AM:
-------------------------------------------------------
bq. Isn't straggler mitigation one of the primary motivations for this?
No, we aren't aiming this at stragglers - the issue is skewed data itself.
Running a skewed task again on a different node is probably not going to make
it any faster to complete.
Speculation is unlikely to be successful in scenarios with significant skew.
On a reliable mid-size cluster, turning speculation off might be a better win
for throughput, particularly when dealing with the middle reducer of an MRR DAG
(i.e., two reducers pulling data off the same shuffle handlers and eating up
bandwidth).
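A rough back-of-the-envelope sketch of why speculation helps a straggler but not a skewed task. This is a toy model, not Tez code: the function name, the record counts, the rates, and the launch delay are all made-up assumptions for illustration.

```python
def completion_time(records, rate, speculative_rate=None, speculate_at=0.0):
    """Return the time at which the first attempt of a task finishes.

    records          -- input size of the task (hypothetical unit: records)
    rate             -- records/second processed by the original attempt
    speculative_rate -- records/second of a speculative attempt, or None
    speculate_at     -- time at which the speculative attempt is launched
    """
    original = records / rate
    if speculative_rate is None:
        return original
    # The speculative attempt starts late and reprocesses the full input.
    speculative = speculate_at + records / speculative_rate
    return min(original, speculative)


# Straggler: normal-sized input on a slow node (10 records/s instead of 100).
straggler_plain = completion_time(1000, rate=10)
straggler_spec = completion_time(1000, rate=10,
                                 speculative_rate=100, speculate_at=20)

# Skew: 10x the input on a healthy node. A second attempt has the same work.
skew_plain = completion_time(10000, rate=100)
skew_spec = completion_time(10000, rate=100,
                            speculative_rate=100, speculate_at=20)

print(straggler_plain, straggler_spec)  # speculation wins on the straggler
print(skew_plain, skew_spec)            # no gain on the skewed task
```

In this model the skewed task gains nothing from the second attempt, and the model does not even charge for the extra shuffle traffic the duplicate attempt would generate.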
> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
> Key: TEZ-2001
> URL: https://issues.apache.org/jira/browse/TEZ-2001
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch
>
>