[
https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903682#action_12903682
]
Hong Tang commented on MAPREDUCE-2038:
--------------------------------------
bq. Yep, that's the basic idea. Implementing rack-combiners as a first class
concept would be neat, but the point above is that we can "fake" it if we have
locality for reducers, with a lot less work. I don't know if it would have a
huge performance improvement, but we could experiment with it easily given this
feature.
Makes sense to me.
> Making reduce tasks locality-aware
> ----------------------------------
>
> Key: MAPREDUCE-2038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into
> consideration when it launches reduce tasks. There are several cases where
> this can be sub-optimal.
> - The map output data for a particular reduce task are not distributed evenly
> across different racks. This could happen when the job does not have many
> maps, or when there is heavy skew in map output data.
> - A reduce task may need to access a side file (e.g. Pig fragmented join,
> or incremental merge of an unsorted smaller dataset with an already sorted
> large dataset). It'd be useful to place reduce tasks based on the location of
> the side files they need to access.
> This JIRA was created to solicit ideas on how we can improve reduce task
> placement.
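
To illustrate the first case in the description (map output for one reduce task concentrated on a few racks), here is a minimal sketch of how a scheduler might pick a preferred rack. This is not Hadoop code: the function `preferred_rack`, its parameters `map_outputs` and `host_to_rack`, and the host and rack names are all hypothetical, and the tie-breaking rule is an arbitrary choice made only to keep the result deterministic.

```python
from collections import defaultdict


def preferred_rack(map_outputs, host_to_rack):
    """Return the rack holding the most map output bytes for one reduce partition.

    map_outputs:  iterable of (host, nbytes) pairs, where nbytes is the size of
                  the map output on that host destined for this reduce task.
    host_to_rack: dict mapping a host name to its rack name (hypothetical
                  topology input; a real cluster would resolve this elsewhere).
    Returns None when there is no map output to consider.
    """
    bytes_per_rack = defaultdict(int)
    for host, nbytes in map_outputs:
        bytes_per_rack[host_to_rack[host]] += nbytes

    if not bytes_per_rack:
        return None

    # Iterate racks in sorted order so that ties resolve to the
    # lexicographically smallest rack name.
    return max(sorted(bytes_per_rack), key=bytes_per_rack.__getitem__)


# Skewed example: most of the data for this partition sits on one rack.
outputs = [("h1", 800), ("h2", 100), ("h3", 100)]
topology = {"h1": "/rack1", "h2": "/rack2", "h3": "/rack2"}
print(preferred_rack(outputs, topology))  # prints "/rack1"
```

A real policy would likely also weigh slot availability and fall back to any rack when the skew is small; the sketch only shows the byte-counting step.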