[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3206:
-------------------------
    Attachment: TEZ-3206.patch

Thanks [~sseth]. Yes, it is because Tez uses fewer bits to encode the size.

Here is a draft patch that has {{UnorderedPartitionedKVWriter}} send partition 
stats, in terms of compressed output size, via VertexManagerEvent. Because the 
compressed size is only available after a spill, and the stats update can be 
made from the spill-finish callback threads, the patch makes the global stats 
thread-safe.
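
To illustrate the thread-safety concern, here is a minimal sketch of one way to accumulate per-partition sizes from several spill threads. It is not taken from the patch: the class {{PartitionStatsAccumulator}} and its methods are hypothetical names, and the use of {{AtomicLongArray}} is an assumption about one reasonable approach.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical helper (not part of Tez): sums per-partition sizes across threads. */
public class PartitionStatsAccumulator {
  // One atomic counter per partition, so concurrent spills never lose updates.
  private final AtomicLongArray sizes;

  public PartitionStatsAccumulator(int numPartitions) {
    this.sizes = new AtomicLongArray(numPartitions);
  }

  /** Safe to call from any thread once a spill's compressed sizes are known. */
  public void addSpill(long[] compressedSizePerPartition) {
    for (int p = 0; p < compressedSizePerPartition.length; p++) {
      sizes.addAndGet(p, compressedSizePerPartition[p]);
    }
  }

  /** Copies the current totals, e.g. to build the event payload at close time. */
  public long[] snapshot() {
    long[] out = new long[sizes.length()];
    for (int p = 0; p < out.length; p++) {
      out[p] = sizes.get(p);
    }
    return out;
  }

  public static void main(String[] args) throws InterruptedException {
    PartitionStatsAccumulator acc = new PartitionStatsAccumulator(3);
    ExecutorService pool = Executors.newFixedThreadPool(4);
    // Four simulated spill threads, each reporting 1000 spills.
    for (int t = 0; t < 4; t++) {
      pool.submit(() -> {
        for (int i = 0; i < 1000; i++) {
          acc.addSpill(new long[] {1L, 2L, 3L});
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    long[] totals = acc.snapshot();
    System.out.println(totals[0] + " " + totals[1] + " " + totals[2]);
  }
}
```

Each partition's counter is atomic on its own, but {{snapshot}} is not atomic across partitions, so in this sketch it should be taken only after all spills have finished.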

Note that in the current protocol for the sorted partitioned case, 
{{VertexManagerEventPayloadProto.Builder}}'s {{setOutputSize}} takes the 
uncompressed size, but {{setPartitionStats}} takes the compressed size. Based on 
how {{ShuffleVertexManager}} consumes partition stats, it doesn't matter 
whether the size is compressed or not.

Maybe we should use the uncompressed size for partition stats? If so, the patch 
will be simpler, and I can file a separate JIRA to have the sorted partitioned 
case switch to sending the uncompressed size.

> Have unordered partitioned KV output send partition stats via 
> VertexManagerEvent 
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3206
>                 URL: https://issues.apache.org/jira/browse/TEZ-3206
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Ming Ma
>         Attachments: TEZ-3206.patch
>
>
> As part of the auto-parallelism feature, ordered partitioned KV output's 
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But 
> this isn't available for unordered partitioned output. Having 
> {{UnorderedPartitionedKVWriter}} send partition stats will enable the 
> auto-parallelism support for unordered KV or other custom data routing 
> mechanisms that depend on partition size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
