[ https://issues.apache.org/jira/browse/MAPREDUCE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Bean resolved MAPREDUCE-5323.
----------------------------------
Resolution: Not A Problem
I misunderstood the config mapreduce.map.combine.minspills as the number of
spills required before the first combine. It is actually the minimum number of
spills required for the combiner to run again, a second or subsequent time,
during the merge.
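The distinction can be sketched as a small model. This is only an
illustration, assuming (as the resolution implies) that the spill path runs
the combiner whenever one is configured and that only the merge path consults
the minspills threshold. The class and method names are hypothetical and are
not Hadoop's own; the real spill path also has conditions this model ignores.

```java
// Simplified model of when a map task runs the combiner.
// Hypothetical names; this is NOT the code in Hadoop's MapTask.java.
class CombineDecision {

    // Spill phase (assumption): the minspills setting is not consulted,
    // so the combiner runs whenever one is configured.
    static boolean combinesOnSpill(boolean hasCombiner) {
        return hasCombiner;
    }

    // Merge phase: the negation of the condition quoted in the issue,
    // "combinerRunner == null || numSpills < minSpillsForCombine".
    static boolean combinesOnMerge(boolean hasCombiner, int numSpills,
                                   int minSpillsForCombine) {
        boolean writeDirectly = !hasCombiner || numSpills < minSpillsForCombine;
        return !writeDirectly;
    }

    public static void main(String[] args) {
        int minSpills = 3; // illustrative threshold
        for (int numSpills = 1; numSpills <= 4; numSpills++) {
            System.out.println("spills=" + numSpills
                + " combineOnSpill=" + combinesOnSpill(true)
                + " combineOnMerge=" + combinesOnMerge(true, numSpills, minSpills));
        }
    }
}
```

In this model the spill-time combine is unconditional, which matches the
observation that combiners "always run", while the merge-time combine only
happens once the spill count reaches the threshold.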
> Min Spills For Combine Ignored
> ------------------------------
>
> Key: MAPREDUCE-5323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5323
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Reporter: Jeff Bean
> Priority: Minor
>
> We've observed for some time that combiners always run when one is specified.
> However, there is a config called mapreduce.map.combine.minspills, which
> implies that the developer or administrator ought to be able to control
> when combiners are invoked.
> I spelunked into the code and found this gem in MapTask.java:
> if (combinerRunner == null || numSpills < minSpillsForCombine) {
>   Merger.writeFile(kvIter, writer, reporter, job);
> } else {
>   combineCollector.setWriter(writer);
>   combinerRunner.combine(kvIter, combineCollector);
> }
> That looks way buggy to me. If ( A || B ) is made false by A then B is never
> executed. I spelunked around the code some more and it looks like
> combinerRunner is never null except on reflection failure. So it looks like
> the intention is for minSpillsForCombine to be respected, but due to this
> logic error it's totally ignored.
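For reference on the short-circuit argument in the report: in Java, A || B
skips B only when A is true. When A is false, B is evaluated and decides the
result. The sketch below traces this for a condition of the same shape as the
one quoted from MapTask.java. The class, the counter, and the helper names are
hypothetical and exist only for this illustration.

```java
// Traces which operands of "A || B" Java actually evaluates.
// Hypothetical names; only the shape of the condition comes from the report.
class ShortCircuitTrace {

    // Counts how many times the right-hand operand was evaluated.
    static int rightOperandEvaluations = 0;

    // Stand-in for "numSpills < minSpillsForCombine".
    static boolean tooFewSpills(int numSpills, int minSpillsForCombine) {
        rightOperandEvaluations++;
        return numSpills < minSpillsForCombine;
    }

    // True means "write the merged output directly, skipping the combiner".
    static boolean skipCombiner(boolean combinerIsNull, int numSpills,
                                int minSpillsForCombine) {
        return combinerIsNull || tooFewSpills(numSpills, minSpillsForCombine);
    }

    public static void main(String[] args) {
        // Left operand false: the right operand is evaluated and decides.
        System.out.println(skipCombiner(false, 1, 3) + " " + rightOperandEvaluations);
        // Left operand true: the right operand is skipped.
        System.out.println(skipCombiner(true, 1, 3) + " " + rightOperandEvaluations);
    }
}
```

With a non-null combiner the left operand is false, so the spill-count
comparison is always evaluated in this condition.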