[ https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238664#comment-15238664 ]
Jonathan Eagles commented on TEZ-3206:
--------------------------------------
I have to look more closely at how the partition stats are routed, but the RPC
message size of DataMovementEvents (DMEs) routed to downstream tasks is highly
sensitive. Some typical jobs I have seen may send 100,000 DMEs or more to each
reducer. At 80KB per message, that is roughly 8GB of event payload per reducer,
which will OOM the task.
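
The arithmetic behind this concern can be sketched as below. This is only a
back-of-the-envelope estimate, not Tez code: the class and method names are
hypothetical, and it assumes the worst case where every DME carries the full
80KB payload (taking 1KB = 1024 bytes) and all of them are held in the
reducer's memory at once.

```java
public class DmeMemoryEstimate {

    // Total payload bytes if every event carries a payload of the given size.
    // Uses long arithmetic on purpose: the product here exceeds the int range.
    public static long totalBytes(long eventCount, long bytesPerEvent) {
        return Math.multiplyExact(eventCount, bytesPerEvent);
    }

    // Convert a byte count to GiB (2^30 bytes) for readability.
    public static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long events = 100_000L;          // DMEs per reducer, figure from the comment
        long bytesPerEvent = 80L * 1024; // 80KB per message, assuming 1KB = 1024 bytes

        long total = totalBytes(events, bytesPerEvent);
        System.out.printf("%d events x %d bytes = %d bytes (%.2f GiB)%n",
                events, bytesPerEvent, total, toGiB(total));
    }
}
```

Under these assumptions the payload alone comes to about 8.2 billion bytes per
reducer, far beyond a typical task heap, which is why the per-event size
matters more than the total number of events.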
> Have unordered partitioned KV output send partition stats via
> VertexManagerEvent
> ---------------------------------------------------------------------------------
>
> Key: TEZ-3206
> URL: https://issues.apache.org/jira/browse/TEZ-3206
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Ming Ma
>
> As part of the auto-parallelism feature, ordered partitioned KV output's
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent, but
> this isn't available for unordered partitioned KV output. Having
> {{UnorderedPartitionedKVWriter}} send partition stats will enable
> auto-parallelism support for unordered KV output or other custom data routing
> mechanisms that depend on partition size.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)