[
https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494536#comment-14494536
]
Rohini Palaniswamy commented on TEZ-2319:
-----------------------------------------
This is for offline analysis, where all job information is parsed and loaded
into multiple Hive tables and queries are then run on those tables to analyze
cluster usage. We keep one year's worth of data in those Hive tables, so using
ATS for that is out of the question. Extracting data periodically (every 1 hr
or 4 hrs) and dumping it from ATS is also out of the question: ATS hardly
scales as it is, and that load would bring it down. This is a kind of tee on
the side, written to HDFS only when the job completes, similar to MR. If there
are better alternatives for getting the information, we are open to them.
> DAG history in HDFS
> -------------------
>
> Key: TEZ-2319
> URL: https://issues.apache.org/jira/browse/TEZ-2319
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
>
> We have processes that parse jobconf.xml and job history details (map and
> reduce task details, etc.) in Avro files from HDFS and load them into Hive
> tables for analysis of MapReduce jobs. We would like Tez to also write this
> information to a history file in HDFS when the AM or each DAG completes,
> so that we can do analytics on Tez jobs.