[ 
https://issues.apache.org/jira/browse/TEZ-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240409#comment-14240409
 ] 

Siddharth Seth commented on TEZ-1610:
-------------------------------------

[~rajesh.balamohan] - my initial comment was primarily about renaming the 
counters added in the first patch and documenting their meaning. (Users 
shouldn't be misled into reading SHUFFLE_TIME_TAKEN and MERGE_TIME_TAKEN as 
covering only the shuffle or only a full merge.) These are more like 
checkpoints in the task, which I think is reasonable. Adding simple counters 
like FIRST_EVENT_RECEIVED and LAST_EVENT_RECEIVED will be useful, primarily for 
figuring out latency in event arrival / pre-launch overheads / stragglers at 
the source.
These would need to be relative timestamps from task start (which is the case 
in the first patch), so that they have at least some meaning at the Vertex / 
DAG level.
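
As a rough illustration of the relative-timestamp idea, here is a minimal 
sketch. The class and method names are hypothetical and come from neither the 
patch nor the Tez API; it only shows checkpoints being stored as millisecond 
offsets from task start instead of wall-clock times.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Tez: records event-arrival checkpoints
// as offsets (ms) from task start rather than as wall-clock times.
final class EventArrivalCheckpoints {
    private final long taskStartNanos;
    private long firstEventMs = -1; // -1 means no event has been seen yet
    private long lastEventMs = -1;

    EventArrivalCheckpoints(long taskStartNanos) {
        this.taskStartNanos = taskStartNanos;
    }

    // Call once per received event with a System.nanoTime()-style reading.
    void onEvent(long nowNanos) {
        long offsetMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - taskStartNanos);
        if (firstEventMs < 0) {
            firstEventMs = offsetMs; // only the first event sets this
        }
        lastEventMs = offsetMs;      // every event overwrites this
    }

    long firstEventReceived() { return firstEventMs; }
    long lastEventReceived() { return lastEventMs; }
}
```

Because each value is an offset from its own task's start, values from 
different tasks stay comparable (for example as a min or max across a Vertex), 
which absolute timestamps would not be.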

bq. SHUFFLE_TIME_AS_PERCENTAGE
Should this be part of a separate jira? Maybe something along the lines of 
"95% complete by X, last 5% took a long time" (or only 1 fetcher active at the 
end). I'm trying to figure out what we can interpret from a percentage or a 95% 
completion figure. It may not be possible to represent some of this as counters.
Similarly for SHUFFLE_LAST_ATTEMPT_ARRIVAL_PERCENTAGE - should it be the last 
event or the last burst of events?
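
To make the "95% complete by X" idea concrete, here is a small sketch. It 
assumes per-input completion offsets (ms from task start) are available as an 
array, which is an assumption and not something the patch provides; the class 
and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tez.
final class ShuffleTail {
    // Returns the offset (ms from task start) by which `percent` percent of
    // the inputs had completed. `percent` is expected to be in 1..100.
    static long completedBy(long[] completionOffsetsMs, int percent) {
        if (completionOffsetsMs.length == 0) {
            throw new IllegalArgumentException("no completion offsets");
        }
        long[] sorted = completionOffsetsMs.clone();
        Arrays.sort(sorted);
        // Integer ceiling of (n * percent / 100), converted to a 0-based index.
        int idx = (sorted.length * percent + 99) / 100 - 1;
        return sorted[Math.max(idx, 0)];
    }
}
```

Comparing completedBy(offsets, 95) with completedBy(offsets, 100) would show 
how long the last 5% of inputs took, but a single number like this still says 
nothing about why the tail was slow.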

Given the interleaving of fetch and merge, and the merge being inline with the 
final processing, I'm not sure what a good way to measure some of this is for 
generic jobs.




> additional task counters for fetchers
> -------------------------------------
>
>                 Key: TEZ-1610
>                 URL: https://issues.apache.org/jira/browse/TEZ-1610
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1610.1.patch, TEZ-1610.2.patch
>
>
> - ShuffleFinishTime (per source)
> - Merge time (depending on broadcast/scatter-gather shuffle)
> This would be helpful in determining when shuffle started/ended for different 
> sources in a task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
