[
https://issues.apache.org/jira/browse/TEZ-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703949#comment-14703949
]
Bikas Saha commented on TEZ-2690:
---------------------------------
bq. Since the analyzer is not going to download the data, it might be good to
comment related to "DagId that needs to be downloaded".
Sorry, I did not understand this?
bq. Is the main() function needed in base class? Or is it given mainly as an
example?
main() is needed for the ToolRunner to work. However, it needs to be in each of
the actual Analyzers and not the base class. Moved to CriticalPathAnalyzer.
bq. Since base already extends Configured, Analyzer.getConfiguration() should
be removed. But this would be separate JIRA to let all analyzers extend
TezAnalyzerBase.
Right.
bq. Changes in VertexInfo is unintentional?
Actually the patch missed some commits. Refreshed. There are changes in it.
bq. SVGUtils - It might break the earlier drawVertex(DagInfo)
I removed the code because the svg code from plutext was not enough and thus
the pom dependency was not useful. The new code to add lines/rectangles etc.
can be used by the other analyzers.
bq. getLastDataEventTime, getCreationTime etc got added as a part of TEZ-2701.
So if we try to parse with older logs (e.g 0.8/0.7/0.6 etc), it might return 0
for currentAttempt.getLastDataEventTime().
Yes. The new analyzer can only be used with the appropriate release. We should
consider back-porting these changes to earlier releases; they are simple
enough. Without these events the analyzer cannot run. Unfortunately, since our
ATS events have no versioning, we cannot detect that we are reading an older
file. Empty/unknown fields end up showing as "" empty strings, so it's hard to
tell whether the field was not present or its value is unknown.
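The ambiguity above can be shown with a trivial check (a sketch only; the helper name is made up and this is not the actual Tez parsing code):

```java
public class AtsFieldSketch {
    // A field absent from an older-release log and a field whose value is
    // genuinely unknown both surface as the empty string, so without event
    // versioning the two cases cannot be told apart.
    static boolean presentInLog(String rawValue) {
        return rawValue != null && !rawValue.isEmpty();
    }

    public static void main(String[] args) {
        String fromOldLog = "";   // field did not exist in this release
        String unknownValue = ""; // field existed but value was unknown
        System.out.println(presentInLog(fromOldLog));   // false
        System.out.println(presentInLog(unknownValue)); // false -- same result
    }
}
```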
bq. Otherwise it might lead to a perception that only that attempt is slow in
that vertex.
A task on the critical path does not mean it's slow. If you look at the
attached jpg, the tooltip has information about how its actual execution time
compares to the vertex average. That should hint at whether it is an outlier or
not. In general, a task can be perfectly fine but still be on the critical path
because of a data dependency.
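The data-dependency point can be illustrated with a generic longest-path computation over a DAG (a simplified sketch, not the actual analyzer; the vertex names and durations below are made up):

```java
import java.util.*;

public class CriticalPathSketch {
    // A task's start is bounded by the latest finish among its upstream
    // (data dependency) tasks; the critical path is the longest such chain.
    static long criticalPathLength(Map<String, Long> duration,
                                   Map<String, List<String>> upstream) {
        Map<String, Long> memo = new HashMap<>();
        long max = 0;
        for (String t : duration.keySet()) {
            max = Math.max(max, finishTime(t, duration, upstream, memo));
        }
        return max;
    }

    static long finishTime(String t, Map<String, Long> duration,
                           Map<String, List<String>> upstream,
                           Map<String, Long> memo) {
        Long cached = memo.get(t);
        if (cached != null) return cached;
        long start = 0;
        for (String u : upstream.getOrDefault(t, Collections.emptyList())) {
            start = Math.max(start, finishTime(u, duration, upstream, memo));
        }
        long finish = start + duration.get(t);
        memo.put(t, finish);
        return finish;
    }

    public static void main(String[] args) {
        // Reduce1 itself is fast (5s) but is still on the critical path
        // because it must wait for the slower Map2 (40s) to finish.
        Map<String, Long> duration = new HashMap<>();
        duration.put("Map1", 10L);
        duration.put("Map2", 40L);
        duration.put("Reduce1", 5L);
        Map<String, List<String>> upstream = new HashMap<>();
        upstream.put("Reduce1", Arrays.asList("Map1", "Map2"));
        System.out.println(criticalPathLength(duration, upstream)); // 45
    }
}
```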
bq. Vertex's 000973 task took 515 seconds, but it's attempt took just 75
seconds. So this would be accounted as allocation overhead. In such cases,
should the reasoning be cluster capacity
Right. It basically means that job runtime can be improved by getting resources
to run these tasks sooner. This could be due to 1) not enough capacity, in
which case increasing capacity is the only solution; we can add more
information about the container in ATS to give details about this. Or 2) we are
not scheduling the right task at the right time. E.g., in one svg I saw the
consumer tasks (Map vertex) allocated before the producer task (Reduce vertex),
causing the producer task to have allocation overhead. This is because the Map
vertex manager starts its tasks immediately even though it may have a map-join
input from the Reduce vertex.
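The 515s-vs-75s case above can be sketched as simple timestamp arithmetic (an illustration only; the method and field names are hypothetical, not the Tez API):

```java
public class AllocationOverheadSketch {
    // Time (ms) spent waiting for a container after the task became runnable
    // but before its attempt actually started executing.
    static long allocationOverhead(long taskScheduledTime, long attemptStartTime) {
        return Math.max(0, attemptStartTime - taskScheduledTime);
    }

    public static void main(String[] args) {
        long scheduled = 0;          // task became runnable at t=0
        long attemptStart = 440_000; // container arrived 440s later
        long attemptRun = 75_000;    // actual execution took only 75s
        // Overhead dominates the task's 515s wall-clock time.
        System.out.println(allocationOverhead(scheduled, attemptStart)); // 440000
        System.out.println((attemptStart - scheduled) + attemptRun);     // 515000
    }
}
```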
> Add critical path analyser
> --------------------------
>
> Key: TEZ-2690
> URL: https://issues.apache.org/jira/browse/TEZ-2690
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-2690.1.patch, criticalPath.jpg,
> dag_1439860407967_0030_1.svg
>
>
> Use input and scheduling dependencies to create critical path for a DAG.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)