[
https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702605#comment-14702605
]
Rajesh Balamohan edited comment on TEZ-2690 at 8/19/15 10:04 AM:
-----------------------------------------------------------------
More comments on analyzer
- The SVG prints the vertex name and the task attempt info. But consider a
vertex with 10 tasks, 9 of which lag the longest task in that vertex by only
about 1 second. As of now, the analyzer would end up printing only the longest
task in the vertex. Should the tooltip give more detail on how many tasks are
comparable to the longest-running task in that vertex (or should this be
represented some other way)? Thoughts? Otherwise it might lead to a perception
that only that attempt is slow in that vertex.
- Assume a vertex has 1009 tasks and, due to cluster capacity, they ran in
multiple waves. The following code in SVGUtil would draw the longest task with
most of its time being spent on "Task Allocation overhead":
{noformat}
addRectStr(creationTimeInterval, allocationTimeInterval - creationTimeInterval,
    yOffset * STEP_GAP, STEP_GAP, ALLOCATION_OVERHEAD_COLOR,
    BORDER_COLOR, RECT_OPACITY,
    titleStr);
{noformat}
E.g., the vertex's task 000973 took 515 seconds, but its attempt took just 75
seconds, so the remaining time would be accounted as allocation overhead. In
such cases, should the reasoning be cluster capacity (i.e., the vertex runtime
could have been improved with better capacity)? Please refer to
https://issues.apache.org/jira/secure/attachment/12751227/dag_1439860407967_0030_1.svg
for context on this.
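To make the two points above concrete, here is a rough sketch of the arithmetic involved. It is not taken from the patch: the class name, the method names, and the 1-second lag threshold are all hypothetical, and the runtimes in `main` are made-up numbers chosen to match the 10-task example, while 515 and 75 are the figures quoted above.

```java
// Hypothetical helper for illustration only; nothing here is part of the Tez analyzer.
final class TaskLagSketch {

    /**
     * Counts tasks whose runtime is within lagMillis of the longest runtime.
     * The longest task itself is included in the count.
     */
    static int countComparableToLongest(long[] taskRuntimesMillis, long lagMillis) {
        long longest = 0;
        for (long runtime : taskRuntimesMillis) {
            longest = Math.max(longest, runtime);
        }
        int count = 0;
        for (long runtime : taskRuntimesMillis) {
            if (longest - runtime <= lagMillis) {
                count++;
            }
        }
        return count;
    }

    /** Task time not covered by the attempt itself, i.e. what would be drawn as allocation overhead. */
    static long overheadSeconds(long taskSeconds, long attemptSeconds) {
        return taskSeconds - attemptSeconds;
    }

    public static void main(String[] args) {
        // Assumed data: one longest task plus nine tasks lagging it by 1 second.
        long[] runtimes = {60_000, 59_000, 59_000, 59_000, 59_000,
                           59_000, 59_000, 59_000, 59_000, 59_000};
        int comparable = countComparableToLongest(runtimes, 1_000);
        System.out.println("Other tasks comparable to the longest: " + (comparable - 1));

        // Figures from the comment: task 000973 took 515 s, its attempt 75 s.
        System.out.println("Allocation overhead (s): " + overheadSeconds(515, 75));
    }
}
```

With these inputs the tooltip could report 9 other comparable tasks, and the second example leaves 440 of the 515 seconds attributed to allocation overhead, which is the portion that might really be a cluster capacity effect.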
> Add critical path analyser
> --------------------------
>
> Key: TEZ-2690
> URL: https://issues.apache.org/jira/browse/TEZ-2690
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-2690.1.patch, criticalPath.jpg,
> dag_1439860407967_0030_1.svg
>
>
> Use input and scheduling dependencies to create critical path for a DAG.