[ https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486728#comment-14486728 ]

Gopal V commented on TEZ-145:
-----------------------------

[~ozawa]: Got some time to test this out.

The current implementation just adds a new reducer stage, which is not 
entirely the same as a combiner stage.

The implementation we want to produce is not exactly a sort-merged reducer 
stage, since that only fixes the parallelism part of the reducer shuffle (which 
could be achieved by increasing the reducer count anyway).

The current implementation is a translation of an old MR task into an MRR task, 
but the edge connectivity is still shuffle + total-order merge on both edges 
(I see OrderedPartitionedKVEdgeConfig on both the M -> R and R -> R edges).

I will write a more detailed design document tomorrow and upload it here. It 
will expand on Bikas's earlier comment, and I will draw out the runtime 
expansion graphs to show a sort-preserving combiner instead of re-sorting data 
along the way (the combiner never mutates the keys or the output ordering).
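A minimal Java sketch of the sort-preserving combiner idea, assuming the upstream outputs arrive as runs that are already sorted by key and that the combine function is a sum. The class, record, and method names are hypothetical and are not Tez APIs. It does a k-way merge of the sorted runs and folds adjacent equal keys; because keys are never rewritten, the emitted records stay in key order, so a downstream stage could merge them without re-sorting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SortPreservingCombiner {

    // Hypothetical record type: a key with a long value (e.g. a count).
    public record KV(String key, long value) {}

    // Cursor over one already-sorted run, used by the k-way merge.
    private record Cursor(List<KV> run, int pos) {
        KV head() { return run.get(pos); }
    }

    // Merges runs that are each sorted by key, summing the values of equal keys.
    // The combiner never changes a key, so emitting one record per key in merge
    // order yields output that is still sorted: no re-sort is needed.
    public static List<KV> mergeAndCombine(List<List<KV>> sortedRuns) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head().key().compareTo(b.head().key()));
        for (List<KV> run : sortedRuns) {
            if (!run.isEmpty()) heap.add(new Cursor(run, 0));
        }
        List<KV> out = new ArrayList<>();
        String currentKey = null;
        long acc = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            KV kv = c.head();
            // A new key closes the previous group; emit its combined value.
            if (currentKey != null && !currentKey.equals(kv.key())) {
                out.add(new KV(currentKey, acc));
                acc = 0;
            }
            currentKey = kv.key();
            acc += kv.value();
            // Advance this run's cursor if it has more records.
            if (c.pos() + 1 < c.run().size()) {
                heap.add(new Cursor(c.run(), c.pos() + 1));
            }
        }
        if (currentKey != null) out.add(new KV(currentKey, acc));
        return out;
    }

    public static void main(String[] args) {
        List<List<KV>> runs = List.of(
            List.of(new KV("a", 1), new KV("c", 2)),
            List.of(new KV("a", 3), new KV("b", 1), new KV("c", 1)));
        System.out.println(mergeAndCombine(runs));
    }
}
```

Running `main` prints `[KV[key=a, value=4], KV[key=b, value=1], KV[key=c, value=3]]`: the combined output is still ordered by key, which is the property that lets the next edge avoid a total-order re-sort.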

> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
>                 Key: TEZ-145
>                 URL: https://issues.apache.org/jira/browse/TEZ-145
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Tsuyoshi Ozawa
>         Attachments: TEZ-145.2.patch, WIP-TEZ-145-001.patch
>
>
> For aggregate operators that can benefit by running in multi-level trees, 
> support of being able to run a combiner in a non-local mode would allow 
> performance efficiencies to be gained by running a combiner at a rack-level. 
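The rack-level, multi-level tree described in the issue can be illustrated with a minimal Java sketch, assuming sum-style aggregation. The rack names, types, and methods are hypothetical and are not Tez APIs. One combiner per rack folds the outputs of all maps on that rack, and the reducer then sees one partial aggregate per (rack, key) instead of one per (map, key).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RackLevelCombine {

    // Hypothetical map output: the rack that produced it and its (key -> count) pairs.
    public record MapOutput(String rack, Map<String, Long> counts) {}

    // Level 1: one combiner per rack sums the outputs of all maps on that rack.
    public static Map<String, Map<String, Long>> combinePerRack(List<MapOutput> outputs) {
        Map<String, Map<String, Long>> byRack = new TreeMap<>();
        for (MapOutput mo : outputs) {
            Map<String, Long> acc = byRack.computeIfAbsent(mo.rack(), r -> new TreeMap<>());
            mo.counts().forEach((k, v) -> acc.merge(k, v, Long::sum));
        }
        return byRack;
    }

    // Level 2: the reducer sums the per-rack partial aggregates.
    public static Map<String, Long> reduce(Iterable<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<MapOutput> outputs = List.of(
            new MapOutput("rack1", Map.of("a", 1L, "b", 2L)),
            new MapOutput("rack1", Map.of("a", 5L)),
            new MapOutput("rack2", Map.of("b", 1L)));
        Map<String, Map<String, Long>> perRack = combinePerRack(outputs);
        // Records sent to the reducer: one per (rack, key) after rack-level combining.
        long shuffled = perRack.values().stream().mapToInt(Map::size).sum();
        System.out.println(reduce(perRack.values()) + " shuffled=" + shuffled);
    }
}
```

Running `main` prints `{a=6, b=3} shuffled=3`; without the rack-level step the reducer would receive 4 records. The saving grows with the number of maps per rack that share keys.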



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)