[
https://issues.apache.org/jira/browse/TEZ-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240409#comment-14240409
]
Siddharth Seth commented on TEZ-1610:
-------------------------------------
[~rajesh.balamohan] - my initial comment was primarily to rename the counters
added in the first patch, and document their meaning. (Users shouldn't end up
reading SHUFFLE_TIME_TAKEN and MERGE_TIME_TAKEN as covering only the shuffle
or the full merge.) These are more like checkpoints in the task - which I
think is reasonable. Adding simple counters like FIRST_EVENT_RECEIVED and
LAST_EVENT_RECEIVED will be useful primarily for figuring out latency in event
arrival / pre-launch overheads / stragglers at the source.
These would need to be relative timestamps from task start (which is the case
in the first patch), so that they have at least some meaning at the Vertex /
DAG level.
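A minimal sketch of what "relative to task start" could look like for these two checkpoints. The class and method names are hypothetical and are not taken from the patch or from the Tez API; timestamps are assumed to be millisecond clock readings.

```java
// Hypothetical helper, not part of Tez: records two checkpoints as
// millisecond offsets from task start rather than as absolute timestamps.
final class RelativeCheckpoints {
    private final long taskStartMillis;
    private boolean sawEvent = false;
    private long firstEventReceived = -1; // -1 until the first event arrives
    private long lastEventReceived = -1;  // -1 until the first event arrives

    RelativeCheckpoints(long taskStartMillis) {
        this.taskStartMillis = taskStartMillis;
    }

    // Call on every event arrival; nowMillis is an absolute clock reading.
    void recordEvent(long nowMillis) {
        long offset = nowMillis - taskStartMillis;
        if (!sawEvent) {
            sawEvent = true;
            firstEventReceived = offset;
            lastEventReceived = offset;
        } else {
            lastEventReceived = Math.max(lastEventReceived, offset);
        }
    }

    long firstEventReceived() { return firstEventReceived; }
    long lastEventReceived() { return lastEventReceived; }
}
```

Because each value is an offset from its own task's start, values from different tasks can be compared or aggregated directly, which absolute wall-clock timestamps would not allow.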
bq. SHUFFLE_TIME_AS_PERCENTAGE
Should this be part of a separate jira? Maybe something along the lines of 95%
complete by X, last 5% took a long time (or only 1 fetcher active at the end).
I'm trying to figure out what we could interpret from a percentage or a 95%
completion figure. It may not be possible to represent some of this as counters.
Similarly for SHUFFLE_LAST_ATTEMPT_ARRIVAL_PERCENTAGE - should it be the last
event or the last burst of events?
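To make the "95% complete by X" idea concrete, here is a rough sketch that computes the offset by which a given percentage of fetches had finished. It assumes per-fetch completion offsets (milliseconds from task start) are available, which is itself an assumption, and the names are hypothetical rather than anything in the patch.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tez: nearest-rank percentile over
// per-fetch completion offsets (milliseconds from task start).
final class ShuffleCompletion {
    // Returns the smallest offset by which at least `percent`% of the
    // fetches had completed. `percent` must be in the range 1..100.
    static long completedBy(long[] completionOffsetsMillis, int percent) {
        if (completionOffsetsMillis.length == 0) {
            throw new IllegalArgumentException("no fetches recorded");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        long[] sorted = completionOffsetsMillis.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100, avoiding floating point.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }
}
```

Comparing completedBy(offsets, 95) with completedBy(offsets, 100) shows how long the last 5% took, but a single pair of numbers like this still says nothing about how many fetchers were active at the end.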
Given the interleaving of the fetch, merge and the merge being inline with the
final processing - I'm not sure what a good way to measure some of this info is
- for generic jobs.
> additional task counters for fetchers
> -------------------------------------
>
> Key: TEZ-1610
> URL: https://issues.apache.org/jira/browse/TEZ-1610
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-1610.1.patch, TEZ-1610.2.patch
>
>
> - ShuffleFinishTime (per source)
> - Merge time (depending on broadcast/scatter-gather shuffle)
> This would be helpful in determining when shuffle started/ended for different
> sources in a task.