[ 
https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703949#comment-14703949
 ] 

Bikas Saha commented on TEZ-2690:
---------------------------------

bq. Since the analyzer is not going to download the data, it might be good to 
comment related to "DagId that needs to be downloaded".
Sorry, I did not understand this. Could you elaborate?

bq. Is the main() function needed in base class? Or is it given mainly as an 
example?
main() is needed for the ToolRunner to work. However, it needs to be in each of 
the actual analyzers, not in the base class. Moved to CriticalPathAnalyzer.

bq. Since base already extends Configured, Analyzer.getConfiguration() should 
be removed. But this would be separate JIRA to let all analyzers extend 
TezAnalyzerBase.
Right.

bq. Changes in VertexInfo is unintentional?
Actually, the earlier patch missed some commits. Refreshed. The changes in 
VertexInfo are intentional.

bq. SVGUtils - It might break the earlier drawVertex(DagInfo)
I removed that code because the SVG support from plutext was not sufficient, so 
the pom dependency was not useful. The new code for adding lines, rectangles, 
etc. can be used by the other analyzers.

bq. getLastDataEventTime, getCreationTime etc got added as a part of TEZ-2701. 
So if we try to parse with older logs (e.g 0.8/0.7/0.6 etc), it might return 0 
for currentAttempt.getLastDataEventTime().
Yes. The new analyzer can only be used with an appropriate release. We should 
consider back-porting these changes to earlier releases; they are simple 
enough. Without these events the analyzer cannot run. Unfortunately, since our 
ATS events have no versioning, we cannot detect that we are reading an older 
file. Empty/unknown fields end up showing as "" empty strings, so it's hard to 
tell whether a field was absent or simply unknown.
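To illustrate the ambiguity described above, here is a minimal sketch (not the actual Tez ATS parsing code; the field name and JSON shape are assumptions): a field that is absent from an older log and a field that is legitimately empty both read back as "", so the parser cannot tell the two cases apart without event versioning.

```python
# Sketch: why missing fields are ambiguous without event versioning.
# An event from an older release (field absent) and one from a newer
# release (field present but empty) read back identically with a "" default.
import json

old_event = json.loads('{"vertexName": "Map 1"}')  # hypothetical pre-TEZ-2701 event
new_event = json.loads('{"vertexName": "Map 1", "lastDataEventTime": ""}')

# Reading with a "" default makes the two cases indistinguishable:
print(old_event.get("lastDataEventTime", ""))  # ""
print(new_event.get("lastDataEventTime", ""))  # ""
```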

bq. Otherwise it might lead to a perception that only that attempt is slow in 
that vertex.
A task on the critical path does not mean it's slow. If you look at the 
attached jpg, the tooltip has information about how the task's actual execution 
time compares to the vertex average, which should hint at whether it is an 
outlier. In general, a task can be perfectly fine and still be on the critical 
path because of a data dependency.
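The point above can be sketched as follows. This is a simplified model, not the Tez analyzer: attempt names, fields, and the backward walk are hypothetical, but it shows how a fast task ends up on the critical path purely because of the input that gated its start.

```python
# Minimal sketch of walking a critical path backwards through data
# dependencies. At each step we follow the input attempt that finished
# last, i.e. the one this attempt had to wait for.

class Attempt:
    def __init__(self, name, start, finish, inputs=()):
        self.name = name
        self.start = start          # when the attempt started running
        self.finish = finish        # when it finished
        self.inputs = list(inputs)  # upstream attempts it consumed data from

def critical_path(last_attempt):
    """Walk backwards from the final attempt, following the gating input."""
    path = [last_attempt]
    cur = last_attempt
    while cur.inputs:
        gating = max(cur.inputs, key=lambda a: a.finish)  # last input to finish
        path.append(gating)
        cur = gating
    return list(reversed(path))

# reduce_1 computes for only 5s -- perfectly fine -- yet it is on the
# critical path because it had to wait for map_1's data:
m1 = Attempt("map_1", start=0, finish=100)
m2 = Attempt("map_2", start=0, finish=10)
r1 = Attempt("reduce_1", start=100, finish=105, inputs=[m1, m2])
print([a.name for a in critical_path(r1)])  # ['map_1', 'reduce_1']
```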

bq. Vertex's 000973 task took 515 seconds, but its attempt took just 75 
seconds. So this would be accounted as allocation overhead. In such cases, 
should the reasoning be cluster capacity
Right. It basically means that job runtime can be improved by getting resources 
to run these tasks sooner. This could be because 1) there is not enough 
capacity, in which case the only solution is to increase capacity (we can add 
more container information to ATS to give details about this), or 2) we are not 
scheduling the right task at the right time. E.g. in one svg I saw the consumer 
tasks (Map vertex) allocated before the producer tasks (Reduce vertex), causing 
the producer task to incur allocation overhead. This happens because the Map 
vertex manager starts its tasks immediately, even though it may have a map-join 
input from the Reduce vertex.
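The allocation-overhead accounting above can be sketched like this. This is a hypothetical model, not the Tez analyzer's code: the function and the timestamps are assumptions used to reproduce the 000973 example, where the task's wall-clock time is dominated by waiting for a container rather than by actual computation.

```python
# Sketch: allocation overhead is the gap between the moment an attempt
# *could* have started (its input data was ready) and the moment it
# actually got a container and started running.

def allocation_overhead(inputs_ready_time, attempt_start_time):
    """Time spent waiting for a container after inputs were ready."""
    return max(0, attempt_start_time - inputs_ready_time)

# Task 000973 from the comment: 515s wall-clock, but the attempt computed
# for only ~75s; the remaining ~440s was spent waiting for allocation.
inputs_ready = 0
attempt_start = 440
attempt_finish = attempt_start + 75   # 515s total wall-clock

print(allocation_overhead(inputs_ready, attempt_start))  # 440
```

If the overhead is large across many tasks, that points at cluster capacity (case 1); if it appears only where consumers were scheduled ahead of producers, that points at scheduling order (case 2).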

> Add critical path analyser
> --------------------------
>
>                 Key: TEZ-2690
>                 URL: https://issues.apache.org/jira/browse/TEZ-2690
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-2690.1.patch, criticalPath.jpg, 
> dag_1439860407967_0030_1.svg
>
>
> Use input and scheduling dependencies to create critical path for a DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)