[ https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238571#comment-15238571 ]

Ming Ma commented on TEZ-3206:
------------------------------

The current implementation in ordered partitioned KV output uses a RoaringBitmap
to give a rough estimate of each partition's size. Is this optimization
necessary?
a) A VertexManagerEvent is sent only when the output spills or is closed, so
the frequency is relatively low.
b) If we drop the bitmap and instead use 4 bytes for each partition size, then
with 20k reducers the payload is 80KB, which is not large for an RPC.
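To make the size argument in (b) concrete, here is a minimal sketch of the
exact alternative: one 4-byte int per partition, serialized with
java.nio.ByteBuffer. The class and method names are hypothetical and not part
of the Tez API; the sketch only checks the arithmetic (20,000 x 4 bytes =
80,000 bytes) and shows that the receiver gets exact sizes back.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not Tez's API): encode exact per-partition sizes
 * as one 4-byte int each, as proposed in option (b).
 */
public class ExactPartitionStats {

    /** Payload size when every partition costs Integer.BYTES (4) bytes. */
    static int exactPayloadBytes(int numPartitions) {
        return numPartitions * Integer.BYTES;
    }

    /** Writes each partition size as a big-endian 4-byte int. */
    static byte[] encode(int[] partitionSizes) {
        ByteBuffer buf = ByteBuffer.allocate(exactPayloadBytes(partitionSizes.length));
        for (int size : partitionSizes) {
            buf.putInt(size);
        }
        return buf.array();
    }

    /** Reverses encode(): the receiver recovers the exact sizes. */
    static int[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int[] sizes = new int[payload.length / Integer.BYTES];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = buf.getInt();
        }
        return sizes;
    }

    public static void main(String[] args) {
        int reducers = 20_000;
        int[] sizes = new int[reducers];
        for (int i = 0; i < reducers; i++) {
            sizes[i] = i * 1024; // arbitrary example sizes in bytes
        }
        byte[] payload = encode(sizes);
        System.out.println(payload.length);                         // 80000
        System.out.println(decode(payload)[19_999] == 19_999 * 1024); // true
    }
}
```

One caveat of this sketch: a signed 4-byte int tops out at 2^31 - 1, so
partitions larger than about 2 GB would need coarser units (for example KB)
or 8-byte longs, which would double the payload.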

RoaringBitmap isn't accurate, but it seems good enough for auto-parallelism.
However, it doesn't work well for data routing that depends on more accurate
partition stats.
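The accuracy problem can be illustrated with a bucketed bitmap: each partition
sets a single bit marking which size range it falls in, so the receiver only
learns a range, not a size. This is a hypothetical sketch, not the actual
ordered-output code: java.util.BitSet stands in for RoaringBitmap, and the
bucket lower bounds (0, 1, 10, 100, 1000 MB) are assumptions chosen for
illustration.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: approximate partition sizes with one bit per
 * (partition, size bucket) pair. BitSet stands in for RoaringBitmap.
 */
public class BucketedPartitionStats {
    // Assumed bucket lower bounds in MB (illustrative only).
    static final long[] BUCKET_LOWER_MB = {0, 1, 10, 100, 1000};
    static final long MB = 1024L * 1024L;

    /** Returns the highest bucket whose lower bound is <= the size in MB. */
    static int bucketOf(long sizeBytes) {
        long mb = sizeBytes / MB;
        int bucket = 0;
        for (int b = 0; b < BUCKET_LOWER_MB.length; b++) {
            if (mb >= BUCKET_LOWER_MB[b]) {
                bucket = b;
            }
        }
        return bucket;
    }

    /** Sets bit (partition * numBuckets + bucket) for every partition. */
    static BitSet encode(long[] sizes) {
        BitSet bits = new BitSet(sizes.length * BUCKET_LOWER_MB.length);
        for (int p = 0; p < sizes.length; p++) {
            bits.set(p * BUCKET_LOWER_MB.length + bucketOf(sizes[p]));
        }
        return bits;
    }

    /** Receiver side: the best available estimate is the bucket's lower bound. */
    static long estimateBytes(BitSet bits, int partition) {
        int base = partition * BUCKET_LOWER_MB.length;
        for (int b = 0; b < BUCKET_LOWER_MB.length; b++) {
            if (bits.get(base + b)) {
                return BUCKET_LOWER_MB[b] * MB;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // 2 MB and 9 MB land in the same bucket; 500 KB looks empty.
        long[] sizes = {2 * MB, 9 * MB, 500 * 1024};
        BitSet bits = encode(sizes);
        for (int p = 0; p < sizes.length; p++) {
            System.out.println(sizes[p] + " -> " + estimateBytes(bits, p));
        }
    }
}
```

Running main prints "2097152 -> 1048576", "9437184 -> 1048576" and
"512000 -> 0": coarse ranges are enough to decide roughly how many reducers
to launch, but a router that balances data by partition size cannot tell a
2 MB partition from a 9 MB one, or a 500 KB partition from an empty one.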

> Have unordered partitioned KV output send partition stats via VertexManagerEvent
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3206
>                 URL: https://issues.apache.org/jira/browse/TEZ-3206
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Ming Ma
>
> As part of the auto-parallelism feature, ordered partitioned KV output's 
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But 
> this isn't available for unordered partitioned output. Having 
> {{UnorderedPartitionedKVWriter}} send partition stats will enable
> auto-parallelism for unordered KV output, as well as other custom data
> routing mechanisms that depend on partition size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
