[ https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528000#comment-14528000 ]
Rajesh Balamohan commented on TEZ-2076:
---------------------------------------
>> What if there are 100,000 attempts, or more? Does this require a large
>> memory footprint?
I tried parsing a job with 1000 x 100,000 tasks (a no-op scatter-gather example).
After parsing info for 101,000 tasks and 42,000 task attempts, along with counters,
events, etc., the POJO representation of the entire DAG occupied around 350 MB of
memory. The remaining attempts were still being written to ATS, which was taking
a long time.
TezTaskID.fromString() and TezTaskAttemptID.fromString() take up a lot of CPU time
(this can be reduced later if we parallelize the processing).
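As a rough illustration of the parallelization idea, the sketch below parses a batch of task ID strings with a parallel stream instead of a sequential loop. It does not call Tez's TezTaskID.fromString(). The ParsedTaskId class and the parse()/parseAll() helpers are hypothetical, and the ID layout task_<clusterTimestamp>_<appId>_<dagId>_<vertexId>_<taskId> plus the synthetic timestamp used in main() are assumptions made for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelIdParsing {

    // Hypothetical, simplified stand-in for a parsed task ID (NOT Tez's TezTaskID).
    static final class ParsedTaskId {
        final long clusterTimestamp;
        final int appId;
        final int dagId;
        final int vertexId;
        final int taskId;

        ParsedTaskId(long clusterTimestamp, int appId, int dagId, int vertexId, int taskId) {
            this.clusterTimestamp = clusterTimestamp;
            this.appId = appId;
            this.dagId = dagId;
            this.vertexId = vertexId;
            this.taskId = taskId;
        }
    }

    // Assumed layout: task_<clusterTimestamp>_<appId>_<dagId>_<vertexId>_<taskId>
    static ParsedTaskId parse(String id) {
        String[] parts = id.split("_");
        if (parts.length != 6 || !parts[0].equals("task")) {
            throw new IllegalArgumentException("Unexpected task ID: " + id);
        }
        return new ParsedTaskId(
                Long.parseLong(parts[1]),
                Integer.parseInt(parts[2]),
                Integer.parseInt(parts[3]),
                Integer.parseInt(parts[4]),
                Integer.parseInt(parts[5]));
    }

    // Parses all IDs on the common fork-join pool; the result keeps the input order.
    static List<ParsedTaskId> parseAll(List<String> ids) {
        return ids.parallelStream()
                .map(ParallelIdParsing::parse)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 101,000 synthetic IDs: 1,000 tasks in vertex 0 and 100,000 tasks in vertex 1.
        List<String> ids = IntStream.range(0, 101_000)
                .mapToObj(i -> i < 1_000
                        ? String.format("task_1430000000000_0001_1_00_%06d", i)
                        : String.format("task_1430000000000_0001_1_01_%06d", i - 1_000))
                .collect(Collectors.toList());
        List<ParsedTaskId> parsed = parseAll(ids);
        System.out.println("Parsed " + parsed.size() + " task IDs");
    }
}
```

Whether this pays off depends on the core count and on where the time actually goes, so it would need profiling against real ATS data before being adopted.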
> Tez framework to extract/analyze data stored in ATS for specific dag
> --------------------------------------------------------------------
>
> Key: TEZ-2076
> URL: https://issues.apache.org/jira/browse/TEZ-2076
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.2.patch,
> TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch,
> TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, TEZ-2076.WIP.2.patch,
> TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download the ATS data pertaining to a DAG from the Tez-UI
> (for example, as a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This can be plugged into an analyzer which parses the data, adds semantics
> and provides an in-memory representation for further analysis.
> - This will make it possible to write different analyzer rules, which can be run
> on top of this in-memory representation to analyze the DAG.
> - Results of these analyzer rules can be rendered onto a UI (standalone webapp)
> at a later point in time.
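A minimal sketch of how analyzer rules might run on top of such an in-memory representation. All type names here (TaskAttemptInfo, DagInfo, AnalyzerRule, SlowAttemptRule) are hypothetical and are not taken from the attached patches; the example rule simply flags attempts that ran longer than a chosen multiple of the median attempt duration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AnalyzerSketch {

    // Hypothetical in-memory view of one task attempt parsed from ATS data.
    static final class TaskAttemptInfo {
        final String attemptId;
        final String vertexName;
        final long startTime;
        final long finishTime;

        TaskAttemptInfo(String attemptId, String vertexName, long startTime, long finishTime) {
            this.attemptId = attemptId;
            this.vertexName = vertexName;
            this.startTime = startTime;
            this.finishTime = finishTime;
        }

        long durationMs() {
            return finishTime - startTime;
        }
    }

    // Hypothetical in-memory view of a DAG; only its attempts are modeled here.
    static final class DagInfo {
        final String dagId;
        final List<TaskAttemptInfo> attempts;

        DagInfo(String dagId, List<TaskAttemptInfo> attempts) {
            this.dagId = dagId;
            this.attempts = attempts;
        }
    }

    // A rule inspects the DAG and returns human-readable findings.
    interface AnalyzerRule {
        String name();

        List<String> analyze(DagInfo dag);
    }

    // Example rule: flag attempts slower than `factor` times the (upper) median duration.
    static final class SlowAttemptRule implements AnalyzerRule {
        private final double factor;

        SlowAttemptRule(double factor) {
            this.factor = factor;
        }

        public String name() {
            return "SlowAttempts";
        }

        public List<String> analyze(DagInfo dag) {
            List<String> findings = new ArrayList<>();
            if (dag.attempts.isEmpty()) {
                return findings;
            }
            List<Long> durations = new ArrayList<>();
            for (TaskAttemptInfo a : dag.attempts) {
                durations.add(a.durationMs());
            }
            durations.sort(Comparator.naturalOrder());
            long median = durations.get(durations.size() / 2);
            for (TaskAttemptInfo a : dag.attempts) {
                if (a.durationMs() > factor * median) {
                    findings.add(a.attemptId + " in " + a.vertexName + " took " + a.durationMs() + " ms");
                }
            }
            return findings;
        }
    }

    public static void main(String[] args) {
        // Synthetic attempts; IDs and timings are made up for illustration.
        List<TaskAttemptInfo> attempts = Arrays.asList(
                new TaskAttemptInfo("attempt_0", "v1", 0, 100),
                new TaskAttemptInfo("attempt_1", "v1", 0, 110),
                new TaskAttemptInfo("attempt_2", "v1", 0, 120),
                new TaskAttemptInfo("attempt_3", "v1", 0, 1000));
        DagInfo dag = new DagInfo("dag_1", attempts);
        List<AnalyzerRule> rules = Arrays.asList(new SlowAttemptRule(3.0));
        for (AnalyzerRule rule : rules) {
            for (String finding : rule.analyze(dag)) {
                System.out.println(rule.name() + ": " + finding);
            }
        }
    }
}
```

Keeping each rule behind a small interface lets new rules be added without touching the parser, and their findings are plain strings that a standalone UI could render later.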
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)