[ https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702605#comment-14702605 ]

Rajesh Balamohan edited comment on TEZ-2690 at 8/19/15 10:04 AM:
-----------------------------------------------------------------

More comments on the analyzer:
- The SVG prints the vertex name and the task attempt info. But suppose a 
vertex has 10 tasks, and 9 of them lag the longest task in that vertex by 
only about 1 second. As of now, the analyzer would end up printing only the 
longest task in the vertex. Should the tooltip provide more info on how many 
tasks are comparable to the longest-running task in that vertex (or is there 
some other way of representing it)? Thoughts? Otherwise it might give the 
impression that only that attempt is slow in that vertex.

- Assume a vertex has 1009 tasks and, due to cluster capacity, they ran in 
multiple waves. The following code in SVGUtil would print the longest task with 
most of its time being spent on "Task Allocation overhead":
{noformat}
addRectStr(creationTimeInterval, allocationTimeInterval - creationTimeInterval,
    yOffset * STEP_GAP, STEP_GAP, ALLOCATION_OVERHEAD_COLOR, BORDER_COLOR,
    RECT_OPACITY, titleStr);
{noformat}

E.g., the vertex's task 000973 took 515 seconds, but its attempt took just 75 
seconds, so the difference would be accounted as allocation overhead. In such 
cases, should the reasoning be cluster capacity (i.e., the vertex runtime could 
have been improved with better capacity)? Please refer to 
https://issues.apache.org/jira/secure/attachment/12751227/dag_1439860407967_0030_1.svg
 for context on this.
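
To make both questions concrete, here is a rough sketch of the two numbers the tooltip could carry: how many tasks finish within some tolerance of the longest task, and what share of a task's time passed before its attempt started. Everything in it (the VertexTaskSummary and TaskTiming classes, the field names, the 1-second tolerance) is hypothetical and only for illustration; none of it is taken from the patch or from the Tez APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class VertexTaskSummary {

    /** Hypothetical per-task timing holder; not a Tez class. */
    static final class TaskTiming {
        final String taskId;
        final long creationMs;   // when the task was created
        final long allocationMs; // when a container was allocated and the attempt started
        final long finishMs;     // when the attempt finished

        TaskTiming(String taskId, long creationMs, long allocationMs, long finishMs) {
            this.taskId = taskId;
            this.creationMs = creationMs;
            this.allocationMs = allocationMs;
            this.finishMs = finishMs;
        }

        long totalMs() { return finishMs - creationMs; }
        long allocationWaitMs() { return allocationMs - creationMs; }
        long attemptMs() { return finishMs - allocationMs; }
    }

    /** Counts tasks whose total time is within toleranceMs of the longest task (the longest itself included). */
    static int countComparableToLongest(List<TaskTiming> tasks, long toleranceMs) {
        long longest = 0;
        for (TaskTiming t : tasks) {
            longest = Math.max(longest, t.totalMs());
        }
        int count = 0;
        for (TaskTiming t : tasks) {
            if (longest - t.totalMs() <= toleranceMs) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of the task's total time that elapsed before its attempt started. */
    static double allocationWaitFraction(TaskTiming t) {
        long total = t.totalMs();
        return total <= 0 ? 0.0 : (double) t.allocationWaitMs() / total;
    }

    public static void main(String[] args) {
        // First scenario: 10 tasks, 9 of which lag the longest by 1 second.
        List<TaskTiming> tasks = new ArrayList<>();
        tasks.add(new TaskTiming("longest", 0, 0, 60_000));
        for (int i = 0; i < 9; i++) {
            tasks.add(new TaskTiming("task_" + i, 0, 0, 59_000));
        }
        System.out.println(countComparableToLongest(tasks, 1_000) + " of " + tasks.size()
            + " tasks are within 1s of the longest");

        // Second scenario: task took 515 s in total, its attempt only 75 s.
        TaskTiming t973 = new TaskTiming("000973", 0, 440_000, 515_000);
        System.out.printf("task %s: waited %d ms, ran %d ms, wait fraction %.2f%n",
            t973.taskId, t973.allocationWaitMs(), t973.attemptMs(), allocationWaitFraction(t973));
    }
}
```

A high count in the first number would tell the reader that the vertex as a whole is slow rather than one attempt, and a high wait fraction across the later waves would point at cluster capacity rather than at the task itself.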


> Add critical path analyser
> --------------------------
>
>                 Key: TEZ-2690
>                 URL: https://issues.apache.org/jira/browse/TEZ-2690
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-2690.1.patch, criticalPath.jpg, 
> dag_1439860407967_0030_1.svg
>
>
> Use input and scheduling dependencies to create critical path for a DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
