Making reduce tasks locality-aware
----------------------------------
Key: MAPREDUCE-2038
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Hong Tang
Currently, the Hadoop MapReduce framework does not take data locality into
consideration when it decides where to launch reduce tasks. There are several
cases where this can be sub-optimal:
- The map output data for a particular reduce task are not distributed evenly
across different racks. This could happen when the job does not have many maps,
or when there is heavy skew in map output data.
- A reduce task may need to access a side file (e.g. in a Pig fragmented join,
or an incremental merge of an unsorted smaller dataset with an already sorted
large dataset). It would be useful to place reduce tasks based on the location
of the side files they need to access.
This JIRA was created to solicit ideas on how to make reduce task placement
better in these cases.
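
To make the first case concrete, here is a minimal sketch of one possible
placement heuristic, not something Hadoop implements or that this issue
proposes: sum the map output destined for one reduce task per rack, and prefer
a rack only if it holds a clear majority of that data. The function name
preferred_rack, its parameters, and the 0.5 threshold are all hypothetical and
chosen for illustration; the per-host sizes and the host-to-rack mapping are
assumed to be available to the scheduler.

```python
from collections import defaultdict


def preferred_rack(map_output_bytes, host_to_rack, min_share=0.5):
    """Pick a rack for one reduce task, or None if no rack dominates.

    map_output_bytes: {host: bytes of map output for this reduce} (assumed input)
    host_to_rack:     {host: rack name} (assumed input)
    min_share:        hypothetical threshold; the fraction of the reduce's
                      input a rack must hold before we express a preference
    """
    # Aggregate the reduce's input size by rack.
    per_rack = defaultdict(int)
    for host, nbytes in map_output_bytes.items():
        per_rack[host_to_rack[host]] += nbytes

    total = sum(per_rack.values())
    if total == 0:
        return None  # no map output yet, so no basis for a preference

    # Take the rack holding the most data and check that it really dominates.
    rack, nbytes = max(per_rack.items(), key=lambda kv: kv[1])
    return rack if nbytes / total >= min_share else None


racks = {"h1": "/rack1", "h2": "/rack1", "h3": "/rack2", "h4": "/rack3"}

# Skewed output: /rack1 holds 900 of 1000 bytes, so it is preferred.
skewed = preferred_rack({"h1": 700, "h2": 200, "h3": 100}, racks)

# Even output across three racks: no rack reaches the threshold.
even = preferred_rack({"h1": 100, "h3": 100, "h4": 100}, racks)
```

Returning None when the data is spread evenly leaves the scheduler free to
fall back to its current behavior, so a heuristic like this would only
influence placement in the skewed cases described above.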