[
https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494576#comment-14494576
]
Hitesh Shah commented on TEZ-2319:
----------------------------------
[~rohini] I think there are 2 options:
1) Support an export/dump tool for Timeline that covers a specified time range
and selects entities matching certain criteria. I believe other tools that also
make use of MR history, such as Inviso, would benefit from this. I think this is
the better long-term bet, given that most applications will be publishing to
Timeline and an export tool for historical analysis of data will likely be needed.
2) Make the SimpleHistoryLogger (to HDFS) production-ready, and have Tez allow
publishing to multiple loggers. Currently, it is only partially useful as an
experimental feature that produces human-readable output. It is not fully in sync
with the data dumped to ATS. It mirrors the ATS entities, and it does not need to.
The data could be stored in a better format and support compression, in
addition to other improvements.
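
As a rough illustration of option 1, here is a minimal sketch of what the core of
such an export tool might look like. It is not an existing Tez or YARN utility.
It assumes the Timeline Server v1 REST endpoint (/ws/v1/timeline/{entityType})
with its windowStart, windowEnd, primaryFilter and limit query parameters; the
host name, the function names and the gzip JSON-lines output format are all
hypothetical choices made for the example.

```python
import gzip
import json
from urllib.parse import urlencode


def timeline_query_url(base_url, entity_type, start_ms, end_ms,
                       primary_filter=None, limit=100):
    """Build a Timeline v1 query URL for entities in a time window.

    base_url is a hypothetical Timeline Server address; entity_type would be
    something like TEZ_DAG_ID. primary_filter is an optional (name, value) pair.
    """
    params = {"windowStart": start_ms, "windowEnd": end_ms, "limit": limit}
    if primary_filter is not None:
        name, value = primary_filter
        params["primaryFilter"] = f"{name}:{value}"
    return f"{base_url.rstrip('/')}/ws/v1/timeline/{entity_type}?{urlencode(params)}"


def dump_entities(entities, path):
    """Write already-fetched entities as gzip-compressed JSON lines.

    Returns the number of entities written. The output format is an
    assumption for this sketch, not a format defined by Tez or ATS.
    """
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for entity in entities:
            out.write(json.dumps(entity, sort_keys=True) + "\n")
            count += 1
    return count
```

A real tool would also need to fetch each URL, page through results, and handle
authentication; those parts are left out here because they depend on the cluster
setup.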
> DAG history in HDFS
> -------------------
>
> Key: TEZ-2319
> URL: https://issues.apache.org/jira/browse/TEZ-2319
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
>
> We have processes that parse jobconf.xml and job history details (map and
> reduce task details, etc.) in Avro files from HDFS and load them into Hive
> tables for analysis of MapReduce jobs. We would like Tez to also write this
> information to a history file in HDFS when the AM or each DAG completes,
> so that we can do analytics on Tez jobs.