[ https://issues.apache.org/jira/browse/HADOOP-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606035#action_12606035 ]
Owen O'Malley commented on HADOOP-3594:
---------------------------------------
The functionality must be moved into either the map or the reduce if you want
it to be done exactly once.
> Guaranteeing that combiner is called at least once
> --------------------------------------------------
>
> Key: HADOOP-3594
> URL: https://issues.apache.org/jira/browse/HADOOP-3594
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Olga Natkovich
> Fix For: 0.18.0
>
>
> In 0.18, Hadoop decides how many times to call the combiner on both the map and
> reduce sides. The number of calls can be anywhere between 0 and N.
> While multiple invocations can be useful, not invoking the combiner at all
> can have serious consequences for a range of functions called algebraic
> (http://classweb.gmu.edu/kersch/inft864/Readings/Shoshani/DataCube/DataCubeTechReport.pdf).
> The main properties of such functions are that the intermediate and final
> computations are different and that the first invocation transforms the data
> into a different form. The most common example of this is the AVERAGE function.
> While it is possible to work around this issue by annotating each tuple, it
> seems that it would be much easier and faster if Hadoop always guaranteed at
> least a single invocation.
>
> Not having this guarantee will break all sorts of existing combiners.
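
To make the AVERAGE case concrete, here is a minimal Python sketch of the two designs discussed in this thread. It is a simulation only: every function name is hypothetical and none of it is Hadoop API. The "fragile" variant relies on the combiner to turn raw values into (sum, count) pairs, so it works only when the combiner runs exactly once per map output. The "safe" variant moves that transformation into the map step, as the comment above suggests, so the combiner can run any number of times, including zero.

```python
# Simulation only: all names are hypothetical, not Hadoop APIs.

def fragile_combine(values):
    """Combiner that changes the data's form: raw numbers in, one (sum, count) pair out."""
    values = list(values)
    return [(sum(values), len(values))]

def fragile_reduce(pairs):
    """Reducer that assumes the combiner already produced (sum, count) pairs."""
    pairs = list(pairs)
    return sum(s for s, _ in pairs) / sum(c for _, c in pairs)

def fragile_average(map_outputs, combiner_passes):
    """Run the fragile design with the combiner applied `combiner_passes` times per map output."""
    partials = []
    for chunk in map_outputs:
        records = list(chunk)
        for _ in range(combiner_passes):
            records = fragile_combine(records)
        partials.extend(records)
    return fragile_reduce(partials)

def safe_map(value):
    """Map step that already emits the intermediate form, a (sum, count) pair."""
    return (value, 1)

def safe_combine(pairs):
    """Combiner whose input and output have the same form, so repeating it is harmless."""
    pairs = list(pairs)
    return [(sum(s for s, _ in pairs), sum(c for _, c in pairs))]

def safe_reduce(pairs):
    """Reducer that merges (sum, count) pairs and performs the final division."""
    pairs = list(pairs)
    return sum(s for s, _ in pairs) / sum(c for _, c in pairs)

def safe_average(map_outputs, combiner_passes):
    """Run the safe design with the combiner applied `combiner_passes` times per map output."""
    partials = []
    for chunk in map_outputs:
        pairs = [safe_map(v) for v in chunk]
        for _ in range(combiner_passes):
            pairs = safe_combine(pairs)
        partials.extend(pairs)
    return safe_reduce(partials)

if __name__ == "__main__":
    chunks = [[1, 2, 3], [10]]  # two simulated map outputs
    for passes in (0, 1, 3):
        print(passes, safe_average(chunks, passes))
    print(1, fragile_average(chunks, 1))
```

In this sketch, calling fragile_average with zero or two combiner passes raises a TypeError, because the reducer or the combiner receives records in a form it does not expect. The safe variant returns the same average for any number of passes.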