[
https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496715#comment-14496715
]
Rohini Palaniswamy commented on TEZ-2319:
-----------------------------------------
bq. Maybe this should be a primary ask for ATS v2
This is not something we want to wait on ATS v2 for. But it would be
good if they captured it as part of the design.
bq. make the SimpleHistoryLogger ( to HDFS ) production-ready and tez should
allow publishing to multiple loggers.
This history only needs to capture the final state of the DAG, its tasks and
counters; it does not need to capture intermediate data. I am not sure
SimpleHistoryLogger in its current form is a good fit. The job history in MR is
in Avro format and gives the whole state of the job on its completion. If the AM
has that state in memory, then we can have a config to dump it into HDFS in some
format (JSON/Avro), which is the easiest option. Otherwise we will need another
Logger to either
- build the state over time (not preferable, as it will consume a lot of
memory) and dump it on completion,
- or write events as they happen, then parse them, extract only the relevant
information, and write another file.
Neither option with another Logger is efficient, and I don't like the idea
myself.
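The easiest option above (dump the AM's in-memory final state on DAG completion) could be sketched roughly as follows. This is only an illustration, not actual Tez code: the `DagSummary` class, its field names, and the hand-rolled JSON are all hypothetical; a real implementation would write via Hadoop's FileSystem API to an HDFS path and likely serialize with Avro or a proper JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: serialize a final-state snapshot of a completed DAG
// (the kind of state the AM already holds in memory) to a JSON string.
// All names here are illustrative, not real Tez APIs.
public class DagHistoryDump {

    /** Minimal final-state summary of a completed DAG (hypothetical shape). */
    static class DagSummary {
        final String dagName;
        final String finalStatus;         // e.g. SUCCEEDED / FAILED
        final Map<String, Long> counters; // aggregated DAG counters

        DagSummary(String dagName, String finalStatus, Map<String, Long> counters) {
            this.dagName = dagName;
            this.finalStatus = finalStatus;
            this.counters = counters;
        }
    }

    /** Hand-rolled JSON to keep the sketch dependency-free. */
    static String toJson(DagSummary s) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"dagName\":\"").append(s.dagName).append("\",");
        sb.append("\"finalStatus\":\"").append(s.finalStatus).append("\",");
        sb.append("\"counters\":{");
        boolean first = true;
        for (Map.Entry<String, Long> e : s.counters.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        sb.append("}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("NUM_SUCCEEDED_TASKS", 42L);
        counters.put("FILE_BYTES_WRITTEN", 1048576L);
        DagSummary summary = new DagSummary("example-dag", "SUCCEEDED", counters);
        // In a real logger this string would be written to HDFS on completion.
        System.out.println(toJson(summary));
    }
}
```

A post-hook that runs once per DAG completion keeps memory flat, since only the final summary is serialized rather than the full event stream.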
[~jlowe]/[~jeagles] , Any better suggestions on how this can be done based on
your experience with how it is currently done in MR?
> DAG history in HDFS
> -------------------
>
> Key: TEZ-2319
> URL: https://issues.apache.org/jira/browse/TEZ-2319
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
>
> We have processes that parse jobconf.xml and job history details (map and
> reduce task details, etc.) in Avro files from HDFS and load them into Hive
> tables for analysis of MapReduce jobs. Would like to have Tez also write this
> information to a history file in HDFS when the AM or each DAG completes so
> that we can do analytics on Tez jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)