[ https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720426#comment-13720426 ]
Tsuyoshi OZAWA commented on MAPREDUCE-5153:
-------------------------------------------
Essentially, this discussion is "in-mapper combining vs. disk-based combining".
If a user program (including Scalding and Cascading jobs) does in-mapper
combining and emits its values based on memory usage, a similar effect can be
obtained, although only partially. In most cases, this partial approach is
enough to improve performance. What do you think?
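
To make the in-mapper approach concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: the class name, the `maxEntries` threshold (an assumed stand-in for a real memory-usage check), and the `emitted` list (a stand-in for the mapper's output collector) are all hypothetical. In a real Mapper, the final flush would go in `cleanup()`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of in-mapper combining; not Hadoop API.
class InMapperCombinerSketch {
    // Assumed proxy for a memory limit: flush once this many distinct keys are buffered.
    private final int maxEntries;
    private final Map<String, Long> partial = new HashMap<>();
    // Stand-in for the mapper's output collector.
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    InMapperCombinerSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Called once per input record: fold the value into the buffered partial sum.
    void map(String key, long value) {
        partial.merge(key, value, Long::sum);
        if (partial.size() >= maxEntries) {
            flush(); // buffer is "full": emit what we have and start over
        }
    }

    // Called at the end of the task to emit whatever is still buffered.
    void close() {
        flush();
    }

    private void flush() {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        partial.clear();
    }

    List<Map.Entry<String, Long>> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        InMapperCombinerSketch m = new InMapperCombinerSketch(2);
        for (String w : "a b a a c a".split(" ")) {
            m.map(w, 1L);
        }
        m.close();
        // Six input records become five output records; key "a" still
        // appears in more than one of them, so the combining is only partial.
        System.out.println(m.emitted());
    }
}
```

Because the buffer is flushed whenever it fills up, the same key can be emitted several times with different partial sums. That is the "partial" effect mentioned above: fewer records than the raw map output, but not one record per key.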
> Support for running combiners without reducers
> ----------------------------------------------
>
> Key: MAPREDUCE-5153
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Radim Kolar
>
> Scenario: workflow mapper -> sort -> combiner -> HDFS.
> No API change is needed: if the user sets a combiner class and reducers = 0,
> then run the combiner and send its output to HDFS.
> Popular libraries such as Scalding and Cascading offer this
> functionality, but they do so by caching the entire mapper output in memory.
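
For comparison, the workflow requested in the issue (mapper -> sort -> combiner -> output) can be sketched in plain Java as follows. This is only an illustration of the data flow, not Hadoop code: `Pair` and `sortAndCombine` are hypothetical names, the returned list stands in for the HDFS output, and the sketch sorts everything in memory, whereas a disk-based implementation would sort and combine bounded spills.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "mapper -> sort -> combiner -> output"; not Hadoop API.
class SortCombineSketch {
    // Minimal key/value record standing in for a map output pair.
    static final class Pair {
        final String key;
        final long value;

        Pair(String key, long value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // Sort the map output by key, then sum each run of equal keys (the "combiner").
    static List<Pair> sortAndCombine(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.comparing((Pair p) -> p.key));
        List<Pair> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).key;
            long sum = 0;
            // Consume the whole run of records that share this key.
            while (i < sorted.size() && sorted.get(i).key.equals(key)) {
                sum += sorted.get(i).value;
                i++;
            }
            out.add(new Pair(key, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = new ArrayList<>();
        for (String w : "a b a a c a".split(" ")) {
            mapOutput.add(new Pair(w, 1L));
        }
        System.out.println(sortAndCombine(mapOutput)); // prints [a=4, b=1, c=1]
    }
}
```

Since the records are sorted before combining, each key is emitted exactly once within the sorted run, in contrast to the partial results of a memory-bounded in-mapper buffer.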