[ https://issues.apache.org/jira/browse/MAPREDUCE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins moved HADOOP-5340 to MAPREDUCE-2812:
------------------------------------------------
Affects Version/s: (was: 0.19.1)
Key: MAPREDUCE-2812 (was: HADOOP-5340)
Project: Hadoop Map/Reduce (was: Hadoop Common)
> Combiner that aggregates all the mappers from a machine
> -------------------------------------------------------
>
> Key: MAPREDUCE-2812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2812
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Nathan Marz
>
> From what I can tell, the Combiner only aggregates data from a single map
> task. It would be useful, especially for map-only jobs, to have a combiner
> that aggregates data from all the map tasks on a given machine. My use case
> is vertically partitioning a set of records that start out in the same
> files. Doing this in a map-only job creates far too many files (about 50
> per input split), while pumping all the data through a reducer adds a lot
> of unnecessary overhead. With the proposed feature, I would get
> 50 * (number of machines) files rather than 50 * (number of input splits)
> files for this use case.
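
To make the file-count argument above concrete, here is a minimal sketch of the arithmetic. The figure of 50 files per writer comes from the report; the split and machine counts are hypothetical values chosen only for illustration, and output_file_count is an illustrative helper, not part of any Hadoop API.

```python
def output_file_count(files_per_writer: int, writers: int) -> int:
    """Total output files when every writer emits files_per_writer files.

    A "writer" is whatever unit writes the final output: a map task in a
    map-only job, or one machine under the proposed per-machine combiner.
    """
    return files_per_writer * writers


FILES_PER_WRITER = 50  # from the report: about 50 files per input split
INPUT_SPLITS = 2000    # hypothetical number of input splits (map tasks)
MACHINES = 40          # hypothetical number of machines in the cluster

# Map-only job today: one set of files per input split.
map_only_files = output_file_count(FILES_PER_WRITER, INPUT_SPLITS)

# Proposed combiner: one set of files per machine.
per_machine_files = output_file_count(FILES_PER_WRITER, MACHINES)

print(f"map-only job:         {map_only_files} files")
print(f"per-machine combiner: {per_machine_files} files")
```

Under these assumed numbers the output shrinks from 100000 files to 2000. In general the reduction factor is the number of input splits divided by the number of machines, so the benefit grows with the number of map tasks each machine runs.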