[
https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486728#comment-14486728
]
Gopal V commented on TEZ-145:
-----------------------------
[~ozawa]: Got some time to test this out.
The current implementation just adds a new reducer stage, which is not
entirely the same as a combiner stage.
The implementation we want to produce is not exactly a sort-merged reducer
stage, since that only fixes the parallelism part of the reducer shuffle (which
could be achieved by increasing the reducer count anyway).
The current implementation is a translation of an old MR task into an MRR task,
but the edge connectivity is still shuffle + total-order merged for both edges
(I see OrderedPartitionedKVEdgeConfig for both M -> R -> R edges).
I will write a more detailed design document tomorrow and upload it here. It
will expand on Bikas's earlier comment, and I will draw out the runtime
expansion graphs to show a sort-preserving combiner that does not re-sort
data along the way (the combiner never mutates the keys or the output
ordering, so a re-sort is unnecessary).
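To illustrate the sort-preserving idea, here is a minimal sketch in plain
Python rather than Tez code. It assumes each upstream task emits (key, value)
pairs already sorted by key. The names combine_sorted_runs and map_outputs
are hypothetical and do not come from any patch on this issue. The inputs are
merged in one streaming pass and adjacent equal keys are combined, so the
output stays sorted and no re-sort is needed.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine_sorted_runs(runs, combine):
    """Merge key-sorted runs and combine the values of equal keys in one pass.

    runs:    iterable of iterables of (key, value) pairs, each sorted by key.
    combine: function that reduces an iterable of values to a single value.

    Keys are never mutated, so the merged order is also the output order.
    """
    # k-way streaming merge of the already-sorted inputs (no full sort).
    merged = heapq.merge(*runs, key=itemgetter(0))
    # Equal keys are adjacent after the merge, so groupby sees each key once.
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, combine(value for _, value in group)


# Hypothetical sorted outputs from three upstream map tasks.
map_outputs = [
    [("a", 1), ("c", 2), ("d", 1)],
    [("a", 3), ("b", 1), ("c", 1)],
    [("b", 4), ("d", 5)],
]

combined = list(combine_sorted_runs(map_outputs, sum))
print(combined)  # [('a', 4), ('b', 5), ('c', 3), ('d', 6)]
```

The combined output is itself a sorted run, so it could feed a downstream
merge directly. That is the property a total-order re-merge on the second
edge would not take advantage of.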
> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
> Key: TEZ-145
> URL: https://issues.apache.org/jira/browse/TEZ-145
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-145.2.patch, WIP-TEZ-145-001.patch
>
>
> For aggregate operators that can benefit by running in multi-level trees,
> support of being able to run a combiner in a non-local mode would allow
> performance efficiencies to be gained by running a combiner at a rack-level.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)