[ 
https://issues.apache.org/jira/browse/TEZ-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329015#comment-15329015
 ] 

Siddharth Seth commented on TEZ-3216:
-------------------------------------

Given this jira is primarily related to adding detailed stats, I think it'll be 
better to move the fix for "tez.runtime.report.partition.stats" to a separate 
jira. That can be considered a bug fix, and detailed stats is a new feature. 
This is preferable, but only if it's simple to do. If both patches end up 
changing a lot - lets just commit it here.
Subsequent comments (which apply to the single patch / split patches)
Comments / Questions on the behaviour
- How do we want report_partition_stats and report_detailed_partition_stats to 
interact. Option 1. If we report only one of the two - this change becomes 
incompatible, since existing VMs are not setup to use detailed_stats. A follow 
up jira should exist to change the ShuffleVertexManager to work with these 
stats. Option 2: Send both stats. This seems like a bit of a waste. I'm in 
favor of sending only one of the two.
- Will report_detailed_partitioned_stats work only if report_stats is enabled ? 
(If so, UnorderedOutput / KVWriter should thrown an error on misconfiguration). 
i.e. report_detailed can only be enabled if report_stats is enabled. 
Alternately, the two can be set independently - which would behave like Option 
2 in the previous comment.
- Is an integer required to report stats in MB ? Would 16 bits be sufficient 
instead. This makes the implementation a little more complicated since protobuf 
doesn't seem to have anything less than int32. Primarily to reduce network 
traffic. Expecting these events not to be stored in the AM (and aggregated as 
soon as they are seen).

Comments on the patch
- This is a simple, but annoying change; the config keys - 
TEZ_RUNTIME_REPORT_DETAILED_PARTITION_STATS and 
TEZ_RUNTIME_REPORT_PARTITION_STATS need to be added to the 'confKeys' map on 
all the Outputs for them to actually work in a job. OrderedPartitionedKVOutput, 
UnorderedKVOutput, UnorderedPartitionKVOutput. Otherwise the parameter ends up 
being ignored when used. (This was mainly to limit the size of the configs)
- static long ceil(long a, long b) { <- This seems rather confusing. Can we use 
a "long << 20" to get the size in MB ? That's what is used in the sorters. 
Alternately divide by 1024*1024 (ceil does this anyway - not sure why the 
additional implementation is required)
- Nit: Make 1024*1024 a constant if it stays.

Given that this is a potentially incompatible change, once the 
ShuffleVertexManager jira is created - a note on the new property in 
TezRuntimeConfiguration indicating how the ShuffleManager breaks will be useful.






> Support for more precise partition stats in VertexManagerEvent
> --------------------------------------------------------------
>
>                 Key: TEZ-3216
>                 URL: https://issues.apache.org/jira/browse/TEZ-3216
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: TEZ-3216.patch
>
>
> Follow up on TEZ-3206 discussion, at least for some use cases, more accurate 
> partition stats will be useful for DataMovementEvent routing. Maybe we can 
> provide a config option to allow apps to choose the more accurate partition 
> stats over RoaringBitmap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to