[jira] [Commented] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag

Gopal V (JIRA) Wed, 25 Feb 2015 11:09:16 -0800

    [ https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336999#comment-14336999 ]


Gopal V commented on TEZ-2076:
------------------------------

bq. Essentially, the patch is enabling a library to download events from ATS 
(SimpleHistoryFile from HDFS also?) and using that data to create a 
post-execution Java model of the DAG for further custom post-processing that is 
user defined?

Yes, this is an abstraction tool to replace a sequence of parsers written over 
the last year, and it stays as close to 1:1 as it can with the last model in 
this list.

One of them is checked into tez-tools today and is written in Python:

https://github.com/apache/tez/blob/branch-0.5/tez-tools/swimlanes/amlogparser.py#L177

Another one works off an insecure ATS (again, in Python):

https://gist.github.com/t3rmin4t0r/d852fd8c14a2891fbe10

Another independent version was written by [~mmokhtar]:

https://github.com/ttmahdy/SummarizeTezLogsForHive/blob/master/src/ApplicationLogsParser.java

Then there is another one, written by [~pramachandran] in JS:

https://github.com/apache/tez/blob/master/tez-ui/src/main/webapp/app/scripts/models/

All of these were written for nearly the same purpose: being able to look at 
Tez DAG execution data. We'll try to collect all the raw data consumed by the 
tez-ui system and maintain a user API for running ad-hoc correlation analysis 
tools over it.
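
As a rough illustration of what "a post-execution model plus user-defined 
analysis" could look like, here is a minimal Python sketch. It is not taken 
from the patch: the class names, the record fields (vertex, attempt_id, 
start_ms, end_ms) and the sample data are all hypothetical and do not reflect 
the actual ATS schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskAttempt:
    attempt_id: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Vertex:
    name: str
    attempts: List[TaskAttempt] = field(default_factory=list)


@dataclass
class Dag:
    dag_id: str
    vertices: Dict[str, Vertex] = field(default_factory=dict)


def build_dag(dag_id: str, records: List[dict]) -> Dag:
    """Fold flat task-attempt records into a DAG -> vertex -> attempt tree.

    The record layout is an assumption made for this sketch.
    """
    dag = Dag(dag_id)
    for rec in records:
        vertex = dag.vertices.setdefault(rec["vertex"], Vertex(rec["vertex"]))
        vertex.attempts.append(
            TaskAttempt(rec["attempt_id"], rec["start_ms"], rec["end_ms"])
        )
    return dag


def slowest_attempt_per_vertex(dag: Dag) -> Dict[str, str]:
    """Example analyzer rule: a plain function from the model to a result."""
    return {
        name: max(v.attempts, key=lambda a: a.duration_ms).attempt_id
        for name, v in dag.vertices.items()
        if v.attempts
    }


# Made-up sample data, standing in for whatever was downloaded.
records = [
    {"vertex": "Map 1", "attempt_id": "a_0", "start_ms": 0, "end_ms": 500},
    {"vertex": "Map 1", "attempt_id": "a_1", "start_ms": 0, "end_ms": 900},
    {"vertex": "Reducer 2", "attempt_id": "a_2", "start_ms": 900, "end_ms": 1200},
]
dag = build_dag("dag_1", records)
print(slowest_attempt_per_vertex(dag))
```

Keeping each rule as a plain function over the model means new ad-hoc checks 
can be added without touching the code that downloads or parses the data.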

> Tez framework to extract/analyze data stored in ATS for specific dag
> --------------------------------------------------------------------
>
>                 Key: TEZ-2076
>                 URL: https://issues.apache.org/jira/browse/TEZ-2076
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2076.1.patch, TEZ-2076.2.patch, TEZ-2076.3.patch, 
> TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI 
> (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This can be plugged into an analyzer which parses the data, adds semantics 
> and provides an in-memory representation for further analysis.
> - This will make it possible to write different analyzer rules, which can be 
> run on top of this in-memory representation to produce an analysis of the DAG.
> - Results of these analyzer rules can be rendered in a UI (standalone webapp) 
> at a later point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
