[
https://issues.apache.org/jira/browse/TEZ-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246562#comment-15246562
]
Jonathan Eagles commented on TEZ-3217:
--------------------------------------
[~jlowe] and I are experimenting with a design for enhanced DMEs which could be
used to reduce the number of messages sent to the downstream task. [~sseth], is
there an existing effort along these lines? Our use case is to compensate for
the increased number of messages needed to be processed by an auto-reduce
parallelism "reducer"
> Optimize DataMovementEvent routing between AM and downstream vertex
> -------------------------------------------------------------------
>
> Key: TEZ-3217
> URL: https://issues.apache.org/jira/browse/TEZ-3217
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ming Ma
>
> Follow up on TEZ-3206 discussion, we might be able to optimize the DME
> routing from AM to downstream vertex, mostly for the auto-parallelism case.
> * DME's empty partition payload has all empty partitions from a specific
> mapper. At the reducer side, it only cares about the partitions it is
> responsible for, not partitions belong to other reducers. Perhaps we can
> optimize AM to send only the relevant empty partitions to that reducer.
> * Instead of sending one DME to a given reducer at a time, it can batch all
> DMEs belonging to a given (mapper, reducer) pair with the common empty
> partition payload, similar to CompositeDataMovementEvent.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)