Jeff Bean created MAPREDUCE-5323:
------------------------------------
Summary: Min Spills For Combine Ignored
Key: MAPREDUCE-5323
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5323
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Reporter: Jeff Bean
Priority: Minor
We've observed for some time that combiners always run when specified. However,
there is a config property, mapreduce.map.combine.minspills, whose name implies
that the developer or administrator ought to be able to control when combiners
are invoked.
I spelunked into the code and found this gem in MapTask.java:
    if (combinerRunner == null || numSpills < minSpillsForCombine) {
      Merger.writeFile(kvIter, writer, reporter, job);
    } else {
      combineCollector.setWriter(writer);
      combinerRunner.combine(kvIter, combineCollector);
    }
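That looks buggy to me. My first reading was that in ( A || B ), once A settles
the result, B is never evaluated. Note, though, that Java's || skips B only
when A is true; when combinerRunner is non-null, the numSpills comparison is
still evaluated, so this expression alone may not explain the behavior. I
spelunked around the code some more and it looks like combinerRunner is never
null except on reflection failure. So the intention appears to be for
minSpillsForCombine to be respected, yet in practice it appears to be ignored.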
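
To sanity-check how that condition evaluates, here is a minimal standalone
sketch. It is not the actual MapTask code: the class MinSpillsCheck and the
method runsCombiner are hypothetical names, a plain Object stands in for the
combiner runner, and the threshold of 3 is chosen only for illustration.

```java
public class MinSpillsCheck {

    // Hypothetical stand-in for the branch quoted from MapTask.java.
    // Returns true when the else arm (the combiner path) would be taken.
    static boolean runsCombiner(Object combinerRunner, int numSpills,
                                int minSpillsForCombine) {
        if (combinerRunner == null || numSpills < minSpillsForCombine) {
            return false; // corresponds to the Merger.writeFile(...) path
        } else {
            return true;  // corresponds to the combinerRunner.combine(...) path
        }
    }

    public static void main(String[] args) {
        Object runner = new Object(); // any non-null placeholder
        int minSpills = 3;            // illustrative threshold (assumption)

        // Non-null runner: the right-hand operand of || is still evaluated.
        for (int numSpills = 1; numSpills <= 4; numSpills++) {
            System.out.println("numSpills=" + numSpills
                + " combine=" + runsCombiner(runner, numSpills, minSpills));
        }

        // Null runner: the left-hand operand is true, so || short-circuits.
        System.out.println("null runner combine="
            + runsCombiner(null, 10, minSpills));
    }
}
```

With a non-null runner, the sketch takes the combiner path only once numSpills
reaches the threshold. That suggests this particular expression does consult
the spill count, so the always-combine behavior we observed may come from a
different code path than this branch.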