[ 
https://issues.apache.org/jira/browse/TEZ-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3216:
-------------------------
    Attachment: TEZ-3216-3.patch

Thanks [~sseth] and [~rajesh.balamohan] for the input. The updated patch should 
cover all the suggestions.

Regarding testing with a large job, I used TEZ-3209 with some highly skewed 
data we have.

* For 100 partitions, the VertexManagerEvent size is around 200 bytes; for 10k 
partitions, around 20 KB. The factor of 2 bytes per partition is likely due to 
protobuf encoding. When multiple source tasks send VertexManagerEvent at the 
same time, a ballpark estimate indicates the AM could handle around 
2 GB / 20 KB = 100k source tasks for 10k partitions if the AM heap size is 3 GB 
(1 GB for other AM housekeeping, 2 GB for the VertexManagerEvent queue).
* Given that both TEZ-3209 and ShuffleVertexManager discard each 
VertexManagerEvent once received and only store aggregated partition stats, 
this change doesn't affect sustained AM memory compared with the bitset 
approach. The actual test job used 10k partitions and 20k source tasks and was 
able to finish with the default AM heap size.
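
For reference, the ballpark arithmetic in the first bullet can be sketched as 
below. This is only an illustration, not code from the patch: the function and 
constant names are made up, decimal units are assumed (1 GB = 10^9 bytes), and 
the 2 bytes per partition figure is the observed value quoted above.

```python
# Rough capacity estimate for queued VertexManagerEvents in the AM.
# All names here are illustrative; none of them come from the Tez code base.

BYTES_PER_PARTITION = 2          # observed: ~200 bytes for 100 partitions
QUEUE_BUDGET_BYTES = 2 * 10**9   # assumed 2 GB of a 3 GB AM heap for the event queue


def event_size_bytes(num_partitions):
    """Approximate serialized size of one VertexManagerEvent."""
    return num_partitions * BYTES_PER_PARTITION


def max_queued_events(num_partitions, budget_bytes=QUEUE_BUDGET_BYTES):
    """How many events (one per source task) fit in the queue budget."""
    return budget_bytes // event_size_bytes(num_partitions)


print(event_size_bytes(100))       # 200
print(event_size_bytes(10_000))    # 20000
print(max_queued_events(10_000))   # 100000
```

This only bounds the transient queue; as noted in the second bullet, sustained 
memory is unaffected because events are discarded after aggregation.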

> Support for more precise partition stats in VertexManagerEvent
> --------------------------------------------------------------
>
>                 Key: TEZ-3216
>                 URL: https://issues.apache.org/jira/browse/TEZ-3216
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: TEZ-3216-2.patch, TEZ-3216-3.patch, TEZ-3216.patch
>
>
> Following up on the TEZ-3206 discussion: at least for some use cases, more 
> accurate partition stats would be useful for DataMovementEvent routing. Maybe 
> we can provide a config option to allow apps to choose the more accurate 
> partition stats over RoaringBitmap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
