[ https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496715#comment-14496715 ]
Rohini Palaniswamy commented on TEZ-2319:
-----------------------------------------

bq. Maybe this should be a primary ask for ATS v2

This is something that we do not want to wait on ATS v2 for. But it would be good if they captured this as part of the design.

bq. make the SimpleHistoryLogger ( to HDFS ) production-ready and tez should allow publishing to multiple loggers.

This history only needs to capture the final state of the DAG, its tasks and counters; it does not need to capture intermediate data. I am not sure SimpleHistoryLogger in its current form is a good fit. The job history in MR is in Avro format and gives the whole state of the job on its completion. If the AM has that state in memory, then the easiest thing is a config to dump it into HDFS in some format (JSON/Avro). Otherwise we would need another Logger to either:
- build up the state over time (not preferable, as it would consume a lot of memory) and dump it on completion, or
- write events as they happen, then parse them, extract only the relevant information, and write another file.

Both options with another Logger are inefficient and I don't like the idea myself. [~jlowe]/[~jeagles], any better suggestions on how this can be done, based on your experience with how it is currently done in MR?

> DAG history in HDFS
> -------------------
>
>                 Key: TEZ-2319
>                 URL: https://issues.apache.org/jira/browse/TEZ-2319
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
> We have processes that parse jobconf.xml and job history details (map and
> reduce task details, etc.) in Avro files from HDFS and load them into Hive
> tables for analysis of MapReduce jobs. We would like Tez to also write this
> information to a history file in HDFS when the AM or each DAG completes,
> so that we can do analytics on Tez jobs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
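The "dump final state on completion" option discussed above could look roughly like the sketch below. Everything here is hypothetical: the class and method names are invented for illustration, and a real Tez implementation would plug into the AM's history logging and write through Hadoop's `FileSystem` API to an HDFS path, whereas this sketch uses plain `java.nio` so it stands alone. The point it shows is the shape of the output: only the DAG's final status and counters, serialized in one shot, with no intermediate events retained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a "dump on completion" history writer.
// It keeps no event stream: the caller hands over the final DAG
// summary once, and a single JSON document is written out.
public class DagCompletionDump {

    // Serialize the final-state summary (DAG name, terminal status,
    // counters) as a JSON object. A real implementation would likely
    // use Avro or a JSON library with proper string escaping; manual
    // string building here keeps the sketch dependency-free.
    public static String toJson(String dagName, String status,
                                Map<String, Long> counters) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"dagName\":\"").append(dagName).append("\",");
        sb.append("\"status\":\"").append(status).append("\",");
        sb.append("\"counters\":{");
        boolean first = true;
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":")
              .append(e.getValue());
            first = false;
        }
        sb.append("}}");
        return sb.toString();
    }

    // Called once, when the DAG reaches a terminal state: write the
    // whole summary in a single operation (a real version would open
    // the target via Hadoop's FileSystem to reach HDFS).
    public static void dumpOnCompletion(Path target, String dagName,
                                        String status,
                                        Map<String, Long> counters)
            throws IOException {
        Files.writeString(target, toJson(dagName, status, counters));
    }
}
```

Because nothing is written until completion, this avoids the parse-and-rewrite pass of an event-stream logger, at the cost of producing no history at all if the AM dies before the DAG finishes.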