[ https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584198#action_12584198 ]
Runping Qi commented on HADOOP-3130:
------------------------------------
Speaking of failing the reducer because it failed to fetch map output, we need
to do some careful analysis here.
At least, we have to differentiate between the case of failing to fetch one map
output numerous times and the case of failing to
fetch a lot of different map outputs. In the first case, it is better to
re-execute the map.
In the second case, it may make sense to consider failing the reducer.
Also, we should differentiate between the early stage of shuffling (where the
reducer may have thousands of map outputs to fetch)
and the late stage, where only a few map outputs are left to fetch. In the
early stage, failing to connect to a few mappers does not matter, since the
reducer has plenty to do. In the late stage, failing the reducer is much more
costly than re-executing the maps.
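
To make the two distinctions above concrete, here is a minimal sketch of such a
decision rule. It is only an illustration, not the Hadoop implementation and not
what the attached patch does: the function name, the action names, and all three
thresholds are hypothetical values picked for the example.

```python
from enum import Enum


class Action(Enum):
    KEEP_FETCHING = "keep_fetching"
    REEXECUTE_MAPS = "reexecute_maps"
    FAIL_REDUCER = "fail_reducer"


# Hypothetical thresholds, chosen only for illustration.
MAX_FAILURES_PER_MAP = 3        # one map output failed this many times
MANY_DISTINCT_FRACTION = 0.10   # share of all maps with at least one failure
LATE_STAGE_FRACTION = 0.05      # share of map outputs still left to fetch


def decide(failures, total_maps, fetched):
    """Return (action, maps_to_reexecute) for one reducer.

    failures maps a map id to the number of failed fetch attempts.
    """
    remaining = total_maps - fetched
    late_stage = remaining <= LATE_STAGE_FRACTION * total_maps
    failed_maps = sorted(m for m, n in failures.items() if n > 0)

    if late_stage:
        # Almost everything is already fetched, so failing the reducer
        # would cost far more than re-running the few maps that are left.
        if failed_maps:
            return Action.REEXECUTE_MAPS, failed_maps
        return Action.KEEP_FETCHING, []

    if len(failed_maps) >= MANY_DISTINCT_FRACTION * total_maps:
        # Many different map outputs fail: the reducer side is suspect.
        return Action.FAIL_REDUCER, []

    stuck = sorted(m for m, n in failures.items() if n >= MAX_FAILURES_PER_MAP)
    if stuck:
        # The same map output fails again and again: re-execute that map.
        return Action.REEXECUTE_MAPS, stuck

    # Early stage with a few scattered failures: the reducer has plenty
    # of other outputs to fetch, so just carry on.
    return Action.KEEP_FETCHING, []
```

For example, decide({"m7": 5}, total_maps=1000, fetched=100) asks for map m7 to
be re-executed, while the same reducer with one failure each on 300 different
maps would be failed instead.
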
> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
> Key: HADOOP-3130
> URL: https://issues.apache.org/jira/browse/HADOOP-3130
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Runping Qi
> Attachments: HADOOP-3130.patch, shuffling.log
>
>
> I noticed that towards the end of shuffling, the map output fetcher of the
> reducer backs off too aggressively.
> I have attached a portion of one reduce log from my job.
> Notice that the last map output was not fetched in 2 minutes.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.