[ https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328674#comment-14328674 ]
Bikas Saha commented on TEZ-145:
--------------------------------
[~ozawa] Since DAG creation is a user-land operation, Tez cannot insert this
optimization by itself. This would most likely have to be a combination of two
pieces. The first is a CombinerProcessor that can be added to a Combiner vertex
in the DAG. Its inputs and outputs would have to be assigned based on the
producers (sorted etc). It would run the combine to reduce data and then output
(in the desired format, sorted etc) to the next vertex. Users would choose to
add this vertex when they expect significant data reduction. The second is
likely to be an associated VertexManager that can group input data to combiner
tasks, so that data is combined on a machine or on a rack dynamically at
runtime (instead of via TaskLocationHint). Different combiner tasks may have
different numbers of input edges.
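
To make the grouping idea concrete, here is a minimal sketch in plain Java. It
does not use any Tez API. The class CombinerGroupingSketch, the ProducerOutput
type, and the methods groupInputs and combine are all hypothetical names made
up for illustration, and the combine step assumes a simple sum-per-key
aggregate. It only shows how producer outputs might be grouped per machine or
per rack, giving each combiner task a different number of inputs, and how each
group could then be reduced and emitted in sorted key order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: none of these types exist in Tez.
public class CombinerGroupingSketch {

    /** Hypothetical record of one producer task's output and where it ran. */
    public static final class ProducerOutput {
        public final String host;
        public final String rack;
        public final Map<String, Long> partialCounts;

        public ProducerOutput(String host, String rack, Map<String, Long> partialCounts) {
            this.host = host;
            this.rack = rack;
            this.partialCounts = partialCounts;
        }
    }

    /**
     * Groups producer outputs by rack (byRack = true) or by host (byRack = false).
     * Each resulting group stands for the inputs of one combiner task, so groups
     * can have different sizes.
     */
    public static Map<String, List<ProducerOutput>> groupInputs(
            List<ProducerOutput> outputs, boolean byRack) {
        Map<String, List<ProducerOutput>> groups = new TreeMap<>();
        for (ProducerOutput out : outputs) {
            String location = byRack ? out.rack : out.host;
            groups.computeIfAbsent(location, k -> new ArrayList<>()).add(out);
        }
        return groups;
    }

    /**
     * Combines one group by summing values per key. The TreeMap keeps keys
     * sorted, standing in for a "sorted" output format to the next vertex.
     */
    public static Map<String, Long> combine(List<ProducerOutput> inputs) {
        Map<String, Long> combined = new TreeMap<>();
        for (ProducerOutput in : inputs) {
            for (Map.Entry<String, Long> e : in.partialCounts.entrySet()) {
                combined.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return combined;
    }
}
```

In a real implementation the grouping decision would presumably live in the
VertexManager and be made at runtime as producer tasks finish, while the
combine step would live in the CombinerProcessor; the sketch runs both eagerly
over an in-memory list only to keep it self-contained.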
> Support a combiner processor that can run non-local to map/reduce nodes
> -----------------------------------------------------------------------
>
> Key: TEZ-145
> URL: https://issues.apache.org/jira/browse/TEZ-145
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Tsuyoshi OZAWA
> Labels: gsoc, gsoc2015, hadoop, java, tez
>
> For aggregate operators that can benefit by running in multi-level trees,
> support of being able to run a combiner in a non-local mode would allow
> performance efficiencies to be gained by running a combiner at a rack-level.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)