[
https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496715#comment-14496715
]
Rohini Palaniswamy commented on TEZ-2319:
-----------------------------------------
bq. Maybe this should be a primary ask for ATS v2
This is not something we want to wait on ATS v2 for. But it would be
good if they captured it as part of the design.
bq. make the SimpleHistoryLogger ( to HDFS ) production-ready and tez should
allow publishing to multiple loggers.
This history only needs to capture the final state of the DAG, its tasks and
counters; it does not need to capture intermediate data. I am not sure
SimpleHistoryLogger in its current form is a good fit. The job history in MR is
in Avro format and gives the whole state of the job on its completion. If the AM
has that state in memory, then we can have a config to dump it into HDFS in some
format (JSON/Avro), which is the easiest option. Otherwise we will need another
Logger to either
- build the state over time (not preferable, as it will consume a lot of
memory) and dump it on completion,
- or write events as they happen, then parse them, extract only the relevant
information, and write another file.
Neither option with another Logger is efficient, and I don't like the idea
myself.
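The easiest option above (dump the AM's in-memory final state on DAG completion) could be sketched roughly as follows. This is only an illustration, not actual Tez code: the `DagSummary` class, its field names, and the hand-rolled JSON are all hypothetical; a real implementation would write via Hadoop's FileSystem API to an HDFS path and likely serialize with Avro or a proper JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: serialize a final-state snapshot of a completed DAG
// (the kind of state the AM already holds in memory) to a JSON string.
// All names here are illustrative, not real Tez APIs.
public class DagHistoryDump {

    /** Minimal final-state summary of a completed DAG (hypothetical shape). */
    static class DagSummary {
        final String dagName;
        final String finalStatus;         // e.g. SUCCEEDED / FAILED
        final Map<String, Long> counters; // aggregated DAG counters

        DagSummary(String dagName, String finalStatus, Map<String, Long> counters) {
            this.dagName = dagName;
            this.finalStatus = finalStatus;
            this.counters = counters;
        }
    }

    /** Hand-rolled JSON to keep the sketch dependency-free. */
    static String toJson(DagSummary s) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"dagName\":\"").append(s.dagName).append("\",");
        sb.append("\"finalStatus\":\"").append(s.finalStatus).append("\",");
        sb.append("\"counters\":{");
        boolean first = true;
        for (Map.Entry<String, Long> e : s.counters.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        sb.append("}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("NUM_SUCCEEDED_TASKS", 42L);
        counters.put("FILE_BYTES_WRITTEN", 1048576L);
        DagSummary summary = new DagSummary("example-dag", "SUCCEEDED", counters);
        // In a real logger this string would be written to HDFS on completion.
        System.out.println(toJson(summary));
    }
}
```

A post-hook that runs once per DAG completion keeps memory flat, since only the final summary is serialized rather than the full event stream.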
[~jlowe]/[~jeagles] , Any better suggestions on how this can be done based on
your experience with how it is currently done in MR?
> DAG history in HDFS
> -------------------
>
> Key: TEZ-2319
> URL: https://issues.apache.org/jira/browse/TEZ-2319
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
>
> We have processes that parse jobconf.xml and job history details (map and
> reduce task details, etc.) in Avro files from HDFS and load them into Hive
> tables for analysis of MapReduce jobs. Would like to have Tez also write this
> information to a history file in HDFS when the AM or each DAG completes so
> that we can do analytics on Tez jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)