Guaranteeing that combiner is called at least once
--------------------------------------------------
Key: HADOOP-3594
URL: https://issues.apache.org/jira/browse/HADOOP-3594
Project: Hadoop Core
Issue Type: Bug
Reporter: Olga Natkovich
Fix For: 0.18.0
In 0.18, Hadoop decides how many times to call the combiner on both the map and
reduce sides. The number of invocations can be anywhere from 0 to N.
While having multiple invocations can be useful, not invoking the combiner at
all can have serious consequences for a class of functions called algebraic
(http://classweb.gmu.edu/kersch/inft864/Readings/Shoshani/DataCube/DataCubeTechReport.pdf).
The main properties of such functions are that the intermediate and final
computations differ and that the first invocation transforms the data into a
different form. The most common example is the AVERAGE function, whose first
step turns raw values into (sum, count) pairs. While it is possible to work
around this issue by annotating each tuple, it would be much easier and faster
if Hadoop always guaranteed at least one invocation.
Not having this guarantee will break all sorts of existing combiners.
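A minimal sketch of the AVERAGE problem, written as plain Java (16 or later, for records) rather than against the Hadoop API; every class, method, and record name here is hypothetical, and Object stands in for a generic tuple type. In the naive scheme the combiner changes raw values into (sum, count) partials, so a reducer that assumes the combiner always ran fails when it is skipped. In the annotated scheme each tuple carries a RAW/PARTIAL tag, one possible form of the per-tuple annotation mentioned above, and the result no longer depends on how many times the combiner runs.

```java
import java.util.ArrayList;
import java.util.List;

public class AverageCombinerSketch {

    // ---- Naive scheme: the combiner changes the form of the data ----
    // Hypothetical: the mapper emits raw Double values, the combiner emits a
    // double[]{sum, count}, and the reducer assumes every value is a double[].
    static double[] naiveCombine(List<Object> rawValues) {
        double sum = 0;
        for (Object v : rawValues) sum += (Double) v; // only handles raw input
        return new double[] {sum, rawValues.size()};
    }

    static double naiveReduce(List<Object> values) {
        double sum = 0;
        double count = 0;
        for (Object v : values) {
            double[] partial = (double[]) v; // ClassCastException if the combiner was skipped
            sum += partial[0];
            count += partial[1];
        }
        return sum / count;
    }

    // ---- Annotated scheme: every tuple carries a RAW/PARTIAL tag ----
    enum Kind { RAW, PARTIAL }

    record Tagged(Kind kind, double value, long count) {
        static Tagged raw(double v) { return new Tagged(Kind.RAW, v, 0); } // count unused for RAW
        static Tagged partial(double sum, long count) { return new Tagged(Kind.PARTIAL, sum, count); }

        // Initial step of AVERAGE: a RAW value becomes a one-element partial.
        Tagged asPartial() { return kind == Kind.RAW ? partial(value, 1) : this; }
    }

    // Merges any mix of RAW and PARTIAL tuples into one PARTIAL tuple, so it
    // gives the same result whether it runs zero, one, or several times.
    static Tagged combine(List<Tagged> in) {
        double sum = 0;
        long count = 0;
        for (Tagged t : in) {
            Tagged p = t.asPartial();
            sum += p.value();
            count += p.count();
        }
        return Tagged.partial(sum, count);
    }

    // Reducer: the same merge, followed by the final step (sum / count).
    static double reduce(List<Tagged> in) {
        Tagged total = combine(in);
        return total.value() / total.count();
    }

    public static void main(String[] args) {
        List<Double> mapOutput = List.of(2.0, 4.0, 9.0);

        List<Object> raw = new ArrayList<>(mapOutput);
        try {
            naiveReduce(raw); // combiner skipped: reducer sees raw Doubles
        } catch (ClassCastException e) {
            System.out.println("naive, no combiner: ClassCastException");
        }
        List<Object> combined = List.of(naiveCombine(raw));
        System.out.println("naive, combiner once: " + naiveReduce(combined));

        List<Tagged> tagged = new ArrayList<>();
        for (double v : mapOutput) tagged.add(Tagged.raw(v));
        System.out.println("annotated, no combiner: " + reduce(tagged));
        System.out.println("annotated, combiner once: " + reduce(List.of(combine(tagged))));
    }
}
```

Running main prints:

naive, no combiner: ClassCastException
naive, combiner once: 5.0
annotated, no combiner: 5.0
annotated, combiner once: 5.0

The annotated version pays for a tag field on every tuple and a branch per value in both the combiner and the reducer, which is the kind of overhead the issue argues a guaranteed single invocation would avoid.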
--
This message is automatically generated by JIRA.