Combiner that aggregates all the mappers from a machine
-------------------------------------------------------
Key: HADOOP-5340
URL: https://issues.apache.org/jira/browse/HADOOP-5340
Project: Hadoop Core
Issue Type: New Feature
Affects Versions: 0.19.1
Reporter: Nathan Marz
From what I can tell, the Combiner only aggregates data from a single map
task. It would be useful, especially for map-only jobs, to have a combiner
that aggregates data from all the map tasks on a given machine. My use case
is vertically partitioning a set of records that start out in the same
files. Doing this in a map-only job creates far too many files (about 50
per input split). Routing all the data through a reducer avoids that, but
adds a lot of unnecessary overhead. With the proposed feature, I would get
50 * (number of machines) files rather than 50 * (number of input splits)
files for this use case.
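To make the file-count arithmetic concrete, here is a toy model in plain Java. It is not Hadoop code and does not use any Hadoop API. MapTask, uniformTasks, filesWithoutMachineCombiner and filesWithMachineCombiner are hypothetical names. The model assumes every input split emits records for all 50 vertical partitions, and that each map task (today) or each machine (with the proposed combiner) writes one file per partition it touches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MachineCombinerSketch {

    /** Hypothetical model of one map task: the host it ran on and the partitions it emitted. */
    public static final class MapTask {
        final String machine;
        final Set<Integer> partitionsEmitted;

        MapTask(String machine, Set<Integer> partitionsEmitted) {
            this.machine = machine;
            this.partitionsEmitted = partitionsEmitted;
        }
    }

    /** Builds a uniform workload in which every split emits records for every partition (assumption). */
    public static List<MapTask> uniformTasks(int machines, int splitsPerMachine, int partitions) {
        Set<Integer> all = new HashSet<>();
        for (int p = 0; p < partitions; p++) {
            all.add(p);
        }
        List<MapTask> tasks = new ArrayList<>();
        for (int m = 0; m < machines; m++) {
            for (int s = 0; s < splitsPerMachine; s++) {
                tasks.add(new MapTask("host" + m, all));
            }
        }
        return tasks;
    }

    /** Map-only job as it works today: each task writes its own file per partition. */
    public static int filesWithoutMachineCombiner(List<MapTask> tasks) {
        int files = 0;
        for (MapTask t : tasks) {
            files += t.partitionsEmitted.size();
        }
        return files;
    }

    /** Proposed behaviour: output from all tasks on one machine is merged, one file per partition per machine. */
    public static int filesWithMachineCombiner(List<MapTask> tasks) {
        Map<String, Set<Integer>> perMachine = new HashMap<>();
        for (MapTask t : tasks) {
            // A fresh set per machine, so the shared input set is never mutated.
            perMachine.computeIfAbsent(t.machine, m -> new HashSet<>()).addAll(t.partitionsEmitted);
        }
        int files = 0;
        for (Set<Integer> parts : perMachine.values()) {
            files += parts.size();
        }
        return files;
    }

    public static void main(String[] args) {
        // 10 machines, 20 splits each, 50 vertical partitions.
        List<MapTask> tasks = uniformTasks(10, 20, 50);
        System.out.println(filesWithoutMachineCombiner(tasks)); // prints 10000 (50 * 200 splits)
        System.out.println(filesWithMachineCombiner(tasks));    // prints 500 (50 * 10 machines)
    }
}
```

The per-machine count depends only on how many distinct hosts ran map tasks, which is why the proposal scales with the cluster size rather than with the number of input splits.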