[
https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490817#comment-14490817
]
Bikas Saha commented on TEZ-145:
--------------------------------
What I am describing is the concept of partial aggregation (see figure 7 in
http://research.microsoft.com/pubs/63785/eurosys07.pdf), in which applying a
combiner becomes a special case that may reduce the data further, depending
on the combine function. In the degenerate case the combine function is plain
concatenation, which simply turns a large number of small chunks into a
smaller number of large chunks within cheaper network domains (for example,
within a rack).
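
To make the distinction concrete, here is a minimal sketch of rack-level
partial aggregation. It is not Tez code: the names partial_aggregate,
sum_combiner, concat_combiner and the rack labels are hypothetical, and
chunks are assumed to be plain lists of (key, value) records. The same
aggregation step is run once with a reducing combiner and once with the
degenerate concatenation combiner.

```python
from collections import defaultdict


def partial_aggregate(chunks_by_rack, combine):
    """Apply `combine` to the chunks of each rack, yielding one chunk per rack.

    Hypothetical helper: a rack stands in for a "cheaper network domain".
    """
    return {rack: combine(chunks) for rack, chunks in chunks_by_rack.items()}


def sum_combiner(chunks):
    """Reducing combiner: sum the values per key, so fewer records come out."""
    totals = defaultdict(int)
    for chunk in chunks:
        for key, value in chunk:
            totals[key] += value
    return sorted(totals.items())


def concat_combiner(chunks):
    """Degenerate combiner: concatenate chunks, keeping every record."""
    return [record for chunk in chunks for record in chunk]


# Illustrative input: two small map-output chunks per rack (made-up data).
chunks_by_rack = {
    "rack1": [[("a", 1), ("b", 1)], [("a", 2)]],
    "rack2": [[("b", 3)], [("b", 4), ("c", 1)]],
}

summed = partial_aggregate(chunks_by_rack, sum_combiner)
concatenated = partial_aggregate(chunks_by_rack, concat_combiner)
```

In both runs each rack hands one chunk instead of two to the next level.
Only sum_combiner also lowers the record count; concat_combiner keeps all
records and just produces fewer, larger chunks.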
> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
> Key: TEZ-145
> URL: https://issues.apache.org/jira/browse/TEZ-145
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-145.2.patch, WIP-TEZ-145-001.patch
>
>
> For aggregate operators that can benefit by running in multi-level trees,
> support of being able to run a combiner in a non-local mode would allow
> performance efficiencies to be gained by running a combiner at a rack-level.