[ 
https://issues.apache.org/jira/browse/HADOOP-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-3594.
-----------------------------------

       Resolution: Won't Fix
    Fix Version/s:     (was: 0.19.0)

Olga's concern was that HADOOP-3226 changed the semantics of combiners in an 
incompatible way. I've updated the release note on HADOOP-3226 to both call out 
the semantic change and point out the deprecated configuration attribute that 
disables the additional calls to the combiner.

> Guaranteeing that combiner is called at least once
> --------------------------------------------------
>
>                 Key: HADOOP-3594
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3594
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>
> In 0.18, Hadoop decides how many times to call the combiner on both the map
> and reduce sides. The possible number is between 0 and N.
> While having multiple invocations can be useful, not invoking the combiner at
> all can have serious consequences for a range of functions called algebraic
> (http://classweb.gmu.edu/kersch/inft864/Readings/Shoshani/DataCube/DataCubeTechReport.pdf).
> The main properties of such functions are that the intermediate and final
> computations are different and that the first invocation transforms the data
> to a different form. The most common example of this is the AVERAGE function.
> While it is possible to work around this issue by annotating each tuple, it
> seems that it would be much easier and faster if Hadoop always guaranteed at
> least a single invocation.
>  
> Not having this guarantee will break all sorts of existing combiners.
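
To illustrate the "annotating each tuple" workaround mentioned in the description, here is a minimal sketch in plain Python rather than the Hadoop API. It assumes the map side emits every value already in the intermediate form, a (sum, count) pair, so the combiner only merges pairs and never has to perform the initial transformation. All names here (map_value, combine, reduce_average, run) are hypothetical and exist only for this illustration; the chunking in run is a stand-in for however the framework might group records between combiner passes.

```python
from typing import Iterable, List, Tuple

# Intermediate form of AVERAGE: a (sum, count) pair.
Partial = Tuple[float, int]


def map_value(x: float) -> Partial:
    """Map side: annotate each value so it is already a partial result."""
    return (x, 1)


def combine(partials: Iterable[Partial]) -> Partial:
    """Combiner: merge partial results. Input and output have the same form,
    so running this 0, 1, or N times does not change the final answer."""
    total, count = 0.0, 0
    for s, c in partials:
        total += s
        count += c
    return (total, count)


def reduce_average(partials: Iterable[Partial]) -> float:
    """Reduce side: merge whatever partials arrive, then finish the average.
    Assumes at least one value was emitted (count > 0)."""
    total, count = combine(partials)
    return total / count


def run(values: List[float], combiner_passes: int, chunk: int = 2) -> float:
    """Hypothetical driver: apply the combiner `combiner_passes` times over
    chunks of the map output, then reduce."""
    partials = [map_value(v) for v in values]
    for _ in range(combiner_passes):
        partials = [
            combine(partials[i:i + chunk])
            for i in range(0, len(partials), chunk)
        ]
    return reduce_average(partials)
```

With this shape, run(values, 0) and run(values, 3) return the same average, at the cost of carrying a count alongside every value from the map side onward.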

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
