[
https://issues.apache.org/jira/browse/TEZ-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246488#comment-15246488
]
Siddharth Seth commented on TEZ-3217:
-------------------------------------
I believe [~jeagles] was experimenting with sending only specific empty
partitions to the consumer. That can get very expensive on the AM side. Maybe
that cost will be acceptable once we have the AM process an event exactly once,
instead of once per consumer. Similarly for multiple partitions in a
CompositeEvent rather than separate events for each of them.
> Optimize DataMovementEvent routing between AM and downstream vertex
> -------------------------------------------------------------------
>
> Key: TEZ-3217
> URL: https://issues.apache.org/jira/browse/TEZ-3217
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ming Ma
>
> Follow up on TEZ-3206 discussion, we might be able to optimize the DME
> routing from AM to downstream vertex, mostly for the auto-parallelism case.
> * DME's empty partition payload has all empty partitions from a specific
> mapper. At the reducer side, it only cares about the partitions it is
> responsible for, not partitions belong to other reducers. Perhaps we can
> optimize AM to send only the relevant empty partitions to that reducer.
> * Instead of sending one DME to a given reducer at a time, it can batch all
> DMEs belonging to a given (mapper, reducer) pair with the common empty
> partition payload, similar to CompositeDataMovementEvent.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)