[
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336999#comment-14336999
]
Gopal V commented on TEZ-2076:
------------------------------
bq. Essentially, the patch is enabling a library to download events from ATS
(SimpleHistoryFile from HDFS also?) and using that data to create a
post-execution Java model of the DAG for further custom post-processing that is
user defined?
Yes, this is an abstraction tool to replace a sequence of parsers written over
the last year, staying as close to 1:1 as possible with the last model in this list.
One of these is checked into tez-tools today and is written in Python:
https://github.com/apache/tez/blob/branch-0.5/tez-tools/swimlanes/amlogparser.py#L177
Another one works off an insecure ATS (again, in Python):
https://gist.github.com/t3rmin4t0r/d852fd8c14a2891fbe10
Another, independent version was written by [~mmokhtar]:
https://github.com/ttmahdy/SummarizeTezLogsForHive/blob/master/src/ApplicationLogsParser.java
And yet another one was written by [~pramachandran] in JS:
https://github.com/apache/tez/blob/master/tez-ui/src/main/webapp/app/scripts/models/
All of these were written for nearly the same purpose: looking at Tez DAG
execution data. We'll try to collect all the raw data consumed by the tez-ui
system and maintain a user API for running ad-hoc correlation analysis tools
over it.
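To make the idea concrete, here is a rough sketch (in Python, like the first two parsers above) of what "an in-memory model plus user-defined analysis" could look like. This is not the API from the patch. The record layout ("type", "id", "vertex", "start", "end"), the class names and the analyzer function are all hypothetical, and the events are assumed to be already downloaded rather than fetched from ATS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskAttempt:
    attempt_id: str
    vertex_id: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Vertex:
    vertex_id: str
    name: str
    attempts: List[TaskAttempt] = field(default_factory=list)


@dataclass
class Dag:
    dag_id: str
    vertices: Dict[str, Vertex] = field(default_factory=dict)


def build_dag(records: List[dict]) -> Dag:
    """Build the in-memory model from already-downloaded event records.

    The record layout is a made-up simplification, NOT the real ATS schema.
    """
    dag = None
    for r in records:
        if r["type"] == "dag":
            dag = Dag(dag_id=r["id"])
    if dag is None:
        raise ValueError("no dag record found")
    # Register vertices first so attempts can be attached in any input order.
    for r in records:
        if r["type"] == "vertex":
            dag.vertices[r["id"]] = Vertex(vertex_id=r["id"], name=r["name"])
    for r in records:
        if r["type"] == "attempt":
            dag.vertices[r["vertex"]].attempts.append(
                TaskAttempt(r["id"], r["vertex"], r["start"], r["end"]))
    return dag


def slowest_attempt_per_vertex(dag: Dag) -> Dict[str, str]:
    """Example analyzer rule: vertex name -> id of its longest-running attempt."""
    result = {}
    for v in dag.vertices.values():
        if v.attempts:
            slowest = max(v.attempts, key=lambda a: a.duration_ms)
            result[v.name] = slowest.attempt_id
    return result


# Hypothetical sample data standing in for a downloaded DAG.
SAMPLE = [
    {"type": "dag", "id": "dag_1"},
    {"type": "vertex", "id": "v1", "name": "Map 1"},
    {"type": "vertex", "id": "v2", "name": "Reducer 2"},
    {"type": "attempt", "id": "a1", "vertex": "v1", "start": 0, "end": 500},
    {"type": "attempt", "id": "a2", "vertex": "v1", "start": 100, "end": 1200},
    {"type": "attempt", "id": "a3", "vertex": "v2", "start": 1200, "end": 1500},
]

if __name__ == "__main__":
    print(slowest_attempt_per_vertex(build_dag(SAMPLE)))
```

The point of the split is that the model is built once, and each analysis is just a function over it, so new rules can be added without touching the parsing code.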
> Tez framework to extract/analyze data stored in ATS for specific dag
> --------------------------------------------------------------------
>
> Key: TEZ-2076
> URL: https://issues.apache.org/jira/browse/TEZ-2076
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2076.1.patch, TEZ-2076.2.patch, TEZ-2076.3.patch,
> TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI
> (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This can be plugged into an analyzer which parses the data, adds semantics
> and provides an in-memory representation for further analysis.
> - This will enable writing different analyzer rules, which can be run on top
> of this in-memory representation to come up with analysis on the DAG.
> - Results of these analyzer rules can be rendered in a UI (standalone webapp)
> at a later point in time.