[
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246517#comment-15246517
]
Siddharth Seth commented on TEZ-3206:
-------------------------------------
As has been pointed out here, the cost of 4 bytes per partition in each message is much higher
on the AM. The AM ends up needing all source tasks * 4 bytes * #numPartitions,
since it stores all the events.
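To make that scaling concrete, here is a back-of-the-envelope calculation. It is a minimal sketch that only multiplies out the formula above; the class name and the task and partition counts are illustrative assumptions, not figures from this issue.

```java
class AmEventMemoryEstimate {
    // The 4-byte-per-partition cost discussed in this issue.
    static final long BYTES_PER_PARTITION = 4L;

    // Bytes the AM retains if it keeps every source task's per-partition stats.
    static long estimateBytes(long numSourceTasks, long numPartitions) {
        return numSourceTasks * numPartitions * BYTES_PER_PARTITION;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(10_000, 1_000); // illustrative sizes only
        System.out.printf("%d bytes (~%.1f MiB)%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```

With 10,000 source tasks and 1,000 partitions this prints {{40000000 bytes (~38.1 MiB)}}. The cost grows with the product of the two counts, so a large job on both axes adds up quickly.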
bq. RoaringBitmap isn't accurate, but it seems good enough for
auto-parallelism. It doesn't work well, though, for data routing that depends on
more accurate partition stats.
[~mingma] - Is RoaringBitmap itself inaccurate, or is it the way we try to use
fewer bits that is inherently lossy?
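One way to see the distinction: a bitmap such as RoaringBitmap stores a set of integers exactly, so any loss has to come from how partition sizes are mapped onto bits. Below is a minimal sketch, assuming sizes are bucketed into hypothetical MB ranges (1000, 100, 10, 1, 0) with one bitmap per bucket. {{java.util.BitSet}} stands in for RoaringBitmap to avoid an external dependency. The class name and bucket boundaries are illustrative, not Tez's actual encoding.

```java
import java.util.Arrays;
import java.util.BitSet;

class BucketedPartitionStats {
    // Hypothetical bucket lower bounds in MB, largest first.
    static final long[] BUCKET_LOWER_BOUNDS_MB = {1000, 100, 10, 1, 0};

    // One exact bitmap per bucket: bit p is set if partition p falls in that bucket.
    static BitSet[] encode(long[] partitionSizesBytes) {
        BitSet[] buckets = new BitSet[BUCKET_LOWER_BOUNDS_MB.length];
        for (int b = 0; b < buckets.length; b++) {
            buckets[b] = new BitSet();
        }
        for (int p = 0; p < partitionSizesBytes.length; p++) {
            long mb = partitionSizesBytes[p] / (1024L * 1024L);
            for (int b = 0; b < BUCKET_LOWER_BOUNDS_MB.length; b++) {
                if (mb >= BUCKET_LOWER_BOUNDS_MB[b]) {
                    buckets[b].set(p); // the set membership itself is lossless
                    break;
                }
            }
        }
        return buckets;
    }

    // Decoding recovers only each bucket's lower bound; the bucketing is the lossy step.
    static long[] decodeLowerBoundsMb(BitSet[] buckets, int numPartitions) {
        long[] out = new long[numPartitions];
        for (int b = 0; b < buckets.length; b++) {
            for (int p = buckets[b].nextSetBit(0); p >= 0; p = buckets[b].nextSetBit(p + 1)) {
                out[p] = BUCKET_LOWER_BOUNDS_MB[b];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long[] sizes = {0, 5 * mib, 150 * mib, 999 * mib};
        System.out.println(Arrays.toString(decodeLowerBoundsMb(encode(sizes), sizes.length)));
    }
}
```

Partitions of 0, 5, 150 and 999 MiB decode to {{[0, 1, 100, 100]}}. The bitmaps record exactly which partition sits in which bucket, but a 150 MiB partition and a 999 MiB partition become indistinguishable. That coarseness may be tolerable for auto-parallelism. It is, however, the kind of loss that would hurt routing decisions needing precise sizes.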
> Have unordered partitioned KV output send partition stats via
> VertexManagerEvent
> ---------------------------------------------------------------------------------
>
> Key: TEZ-3206
> URL: https://issues.apache.org/jira/browse/TEZ-3206
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Ming Ma
>
> As part of the auto-parallelism feature, ordered partitioned KV output's
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But
> this isn't available for unordered partitioned output. Having
> {{UnorderedPartitionedKVWriter}} send partition stats will enable the
> auto-parallelism support for unordered KV or other custom data routing
> mechanisms that depend on partition size.
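The description implies two pieces. The writer tracks output size per partition, and the vertex manager sums those stats across source tasks to size the downstream vertex. Below is a minimal sketch under those assumptions. {{PartitionSizeCounter}}, {{aggregate}} and {{chooseParallelism}} are hypothetical names, not {{UnorderedPartitionedKVWriter}} or {{ShuffleVertexManager}} APIs. The greedy contiguous grouping is one simple rule, not necessarily the one {{ShuffleVertexManager}} uses.

```java
import java.util.Arrays;

class PartitionStatsSketch {
    // Hypothetical writer-side counter: bytes written to each partition by one task.
    static final class PartitionSizeCounter {
        private final long[] bytes;

        PartitionSizeCounter(int numPartitions) {
            bytes = new long[numPartitions];
        }

        void record(int partition, long serializedKvBytes) {
            bytes[partition] += serializedKvBytes;
        }

        // The per-partition sizes this task would report in its event.
        long[] snapshot() {
            return bytes.clone();
        }
    }

    // Hypothetical AM-side aggregation: sum per-partition sizes over all source tasks.
    static long[] aggregate(long[][] perTaskStats) {
        long[] total = new long[perTaskStats[0].length];
        for (long[] taskStats : perTaskStats) {
            for (int p = 0; p < total.length; p++) {
                total[p] += taskStats[p];
            }
        }
        return total;
    }

    // Hypothetical rule: greedily group contiguous partitions so each downstream
    // task reads about desiredBytesPerTask; empty partitions never open a new group.
    static int chooseParallelism(long[] totals, long desiredBytesPerTask) {
        int tasks = 1;
        long current = 0;
        for (long size : totals) {
            if (size > 0 && current > 0 && current + size > desiredBytesPerTask) {
                tasks++;
                current = 0;
            }
            current += size;
        }
        return tasks;
    }

    public static void main(String[] args) {
        long[][] writes = {{10, 0, 30, 5}, {20, 0, 10, 5}, {10, 0, 20, 10}};
        long[][] reported = new long[writes.length][];
        for (int t = 0; t < writes.length; t++) {
            PartitionSizeCounter counter = new PartitionSizeCounter(4);
            for (int p = 0; p < 4; p++) {
                counter.record(p, writes[t][p]);
            }
            reported[t] = counter.snapshot();
        }
        long[] totals = aggregate(reported);
        System.out.println(Arrays.toString(totals) + " -> " + chooseParallelism(totals, 50) + " tasks");
    }
}
```

This prints {{[40, 0, 60, 20] -> 3 tasks}}. In this sketch the AM only needs the running totals. Retaining every task's array instead is what makes memory grow as source tasks * partitions.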
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)