[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246517#comment-15246517
 ] 

Siddharth Seth commented on TEZ-3206:
-------------------------------------

As has been pointed out here, the impact of 4 bytes per message is a lot higher 
on the AM. Since the AM stores all the events, it will end up requiring 
(number of sources) * (number of partitions) * 4 bytes.
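
To put a rough number on this, here is a minimal back-of-the-envelope sketch of 
that product. The class and method names are hypothetical (not Tez APIs), the 
task and partition counts are made-up inputs, and it counts only the raw 
4-byte entries, not per-event object overhead on the AM heap.

```java
public final class PartitionStatsFootprint {
    private PartitionStatsFootprint() {}

    /**
     * Raw payload the AM would retain if every source task reports one
     * fixed-width entry per partition: sources * partitions * bytesPerEntry.
     * Uses multiplyExact so an overflow fails loudly instead of wrapping.
     */
    public static long estimateBytes(long numSourceTasks, long numPartitions, long bytesPerEntry) {
        return Math.multiplyExact(Math.multiplyExact(numSourceTasks, numPartitions), bytesPerEntry);
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 10,000 source tasks, 1,000 partitions, 4 bytes each.
        long bytes = estimateBytes(10_000, 1_000, 4);
        System.out.println(bytes + " bytes (~" + bytes / (1024 * 1024) + " MiB)");
        // prints "40000000 bytes (~38 MiB)"
    }
}
```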

bq. RoaringBitmap isn't accurate, but it seems good enough for the 
auto-parallelism. But it doesn't work well for data routing that depends on 
more accurate partition stats.
[~mingma] - Is RoaringBitmap itself inaccurate, or is it the way we attempt to 
make use of fewer bits that is inherently lossy?
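
To illustrate the distinction being asked about, here is a small sketch. It 
uses {{java.util.BitSet}} as a stand-in for a bitmap (not the RoaringBitmap 
library itself), and the bucket boundaries are hypothetical, not the ones Tez 
uses. A bitmap records set membership exactly; mapping sizes onto a few 
buckets is where information would be lost.

```java
import java.util.BitSet;

public final class LossyStatsSketch {
    // Hypothetical bucket upper bounds in bytes: empty, <=1 MiB, <=10 MiB, <=100 MiB.
    static final long[] BUCKET_LIMITS = {0L, 1L << 20, 10L << 20, 100L << 20};

    /** Maps an exact size to a coarse bucket index; this step discards detail. */
    static int bucketOf(long sizeBytes) {
        for (int i = 0; i < BUCKET_LIMITS.length; i++) {
            if (sizeBytes <= BUCKET_LIMITS[i]) {
                return i;
            }
        }
        return BUCKET_LIMITS.length; // larger than every listed limit
    }

    /** Records which partitions are non-empty; the bitmap itself is exact. */
    static BitSet nonEmptyPartitions(long[] sizes) {
        BitSet bits = new BitSet(sizes.length);
        for (int p = 0; p < sizes.length; p++) {
            if (sizes[p] > 0) {
                bits.set(p);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        long[] sizes = {0L, 2L << 20, 9L << 20}; // 0 bytes, 2 MiB, 9 MiB
        System.out.println("non-empty partitions: " + nonEmptyPartitions(sizes));
        // 2 MiB and 9 MiB land in the same bucket, so the difference is not recoverable.
        System.out.println("buckets: " + bucketOf(sizes[1]) + " and " + bucketOf(sizes[2]));
    }
}
```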

> Have unordered partitioned KV output send partition stats via 
> VertexManagerEvent 
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3206
>                 URL: https://issues.apache.org/jira/browse/TEZ-3206
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Ming Ma
>
> As part of the auto-parallelism feature, ordered partitioned KV output's 
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But 
> this isn't available for unordered partitioned output. Having 
> {{UnorderedPartitionedKVWriter}} send partition stats will enable the 
> auto-parallelism support for unordered KV or other custom data routing 
> mechanisms that depend on partition size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)