[jira] [Commented] (TEZ-2319) DAG history in HDFS

Rohini Palaniswamy (JIRA) Tue, 14 Apr 2015 11:27:42 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494536#comment-14494536
 ]


Rohini Palaniswamy commented on TEZ-2319:
-----------------------------------------

This is for offline analysis: all job information is parsed and loaded into 
multiple Hive tables, and queries are then run on those tables to analyze 
cluster usage. We keep one year's worth of data in those Hive tables. Using ATS 
for that is out of the question. Periodically extracting and dumping the data 
from ATS (every 1 hr or 4 hrs) is also out of the question: ATS hardly scales 
as it is, and the extra load would bring it down. What we want is a kind of tee 
on the side, written to HDFS only when the job completes, similar to MR. If 
there are better alternatives for getting this information, we are open to them.

> DAG history in HDFS
> -------------------
>
>                 Key: TEZ-2319
>                 URL: https://issues.apache.org/jira/browse/TEZ-2319
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
>   We have processes that parse jobconf.xml and job history details (map and 
> reduce task details, etc.) from Avro files in HDFS and load them into Hive 
> tables for analysis of MapReduce jobs. We would like Tez to also write this 
> information to a history file in HDFS when the AM or each DAG completes, 
> so that we can do analytics on Tez jobs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
