[ https://issues.apache.org/jira/browse/HADOOP-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley resolved HADOOP-3594.
-----------------------------------
Resolution: Won't Fix
Fix Version/s: (was: 0.19.0)
Olga's concern was that HADOOP-3226 changed the semantics of combiners in an
incompatible way. I've updated the release note on HADOOP-3226 to both call out
the semantic change and point out the deprecated configuration attribute that
disables the additional calls to the combiner.
> Guaranteeing that combiner is called at least once
> --------------------------------------------------
>
> Key: HADOOP-3594
> URL: https://issues.apache.org/jira/browse/HADOOP-3594
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Olga Natkovich
>
> In 0.18, Hadoop decides how many times to call the combiner on both the map and
> reduce sides. The possible number is between 0 and N.
> While having multiple invocations can be useful, not invoking the combiner at all
> can have serious consequences for a range of functions called algebraic
> (http://classweb.gmu.edu/kersch/inft864/Readings/Shoshani/DataCube/DataCubeTechReport.pdf).
> The main property of such functions is that the intermediate and final
> computations are different and that the first invocation transforms the data
> to a different form. The most common example of this is the AVERAGE function.
> While it is possible to work around this issue by annotating each tuple, it
> seems that it would be much easier and faster if Hadoop always guaranteed at
> least a single invocation.
>
> Not having this guarantee will break all sorts of existing combiners.
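
To make the AVERAGE case concrete, here is a minimal sketch of the "annotate each tuple" workaround mentioned in the description. It is plain Python rather than Hadoop code, and every name in it (map_value, combine, reduce_average, run) is hypothetical. The assumption is that the mapper itself emits (sum, count) pairs, so the combiner's input and output have the same form and the result does not depend on whether the combiner runs zero, one, or many times.

```python
from typing import Iterable, List, Tuple

# A partial aggregate: (running sum, number of values it covers).
Partial = Tuple[float, int]


def map_value(x: float) -> Partial:
    """Hypothetical mapper: emit the annotated form directly,
    so no combiner pass is needed to transform the data."""
    return (x, 1)


def combine(partials: Iterable[Partial]) -> Partial:
    """Hypothetical combiner: merge partials into one partial.
    Input and output share a type, so it can be applied repeatedly."""
    total, count = 0.0, 0
    for s, c in partials:
        total += s
        count += c
    return (total, count)


def reduce_average(partials: Iterable[Partial]) -> float:
    """Hypothetical reducer: merge what is left, then finish the average."""
    total, count = combine(partials)
    return total / count


def run(values: List[float], combiner_passes: int) -> float:
    """Simulate a job in which the framework runs the combiner
    `combiner_passes` times (possibly zero) before the reduce."""
    partials = [map_value(v) for v in values]
    for _ in range(combiner_passes):
        # Merge adjacent pairs, standing in for a combiner run over spills.
        partials = [combine(partials[i:i + 2])
                    for i in range(0, len(partials), 2)]
    return reduce_average(partials)
```

Calling run(values, 0) and run(values, 3) on the same input should give the same average. A combiner that instead expects raw values and emits (sum, count) pairs only works if it is invoked exactly once, which is the situation the issue describes.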
--
This message is automatically generated by JIRA.