[
https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706543#comment-14706543
]
Rajesh Balamohan commented on TEZ-2628:
---------------------------------------
Tested this on a small-scale cluster. It works fine, but there are some issues
when downloading data from ATS. There are two ways to download Tez data from
ATS: (1) via the UI "download data" button, and (2) via "ATSImportTool", which
is a command-line utility.
In both cases, the data downloaded via ATS (with the 1.5 patch, YARN-3942) is
incomplete. If "fromId" is specified in the URL, the data comes back in a
seemingly random pattern. (e.g.,
http://atsmachine:8188/ws/v1/timeline/TEZ_TASK_ID?limit=3&primaryFilter=TEZ_VERTEX_ID:vertex_1439860407967_0054_1_11&fromId=task_1439860407967_0054_1_11_000420
will return different data than
http://atsmachine:8188/ws/v1/timeline/TEZ_TASK_ID?limit=3&primaryFilter=TEZ_VERTEX_ID:vertex_1439860407967_0054_1_11).
So when pagination is involved (e.g., downloading 100 tasks at a time), the
download runs into issues and cannot retrieve the complete data.
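To make the pagination concern concrete, here is a minimal sketch of how a
client might page through tasks with "limit" and "fromId". It is not the actual
ATSImportTool or Tez UI code. It assumes that "fromId" is inclusive (the entity
named by "fromId" is the first one returned) and that each entity carries its
ID in an "entity" field; both are assumptions about the timeline REST API. The
names build_url, fetch_all, and fetch_page are hypothetical.

```python
from urllib.parse import urlencode


def build_url(base, entity_type, limit, primary_filter=None, from_id=None):
    """Build a timeline query URL shaped like the ones in this comment."""
    params = [("limit", limit)]
    if primary_filter:
        params.append(("primaryFilter", primary_filter))
    if from_id:
        params.append(("fromId", from_id))
    # Keep ':' unescaped so the primaryFilter reads as NAME:value.
    return f"{base}/ws/v1/timeline/{entity_type}?{urlencode(params, safe=':')}"


def fetch_all(fetch_page, page_size):
    """Collect every entity, page_size at a time.

    fetch_page(limit, from_id) is a caller-supplied (hypothetical) function
    that returns a list of entity dicts for one request. Assumption: fromId
    is inclusive, so one extra entity is requested per page and its ID is
    used as the cursor for the next page.
    """
    collected = {}
    used_cursors = set()
    from_id = None
    while True:
        page = fetch_page(page_size + 1, from_id)
        for entity in page[:page_size]:
            collected.setdefault(entity["entity"], entity)
        if len(page) <= page_size:
            break  # short page: nothing left to fetch
        from_id = page[page_size]["entity"]
        if from_id in used_cursors:
            # The store is not honoring fromId consistently; stop rather
            # than loop forever or silently return partial data.
            raise RuntimeError(f"pagination is not advancing at {from_id}")
        used_cursors.add(from_id)
    return list(collected.values())
```

With a store that honors "fromId" consistently, fetch_all returns every task
exactly once. With a store that ignores or reorders around "fromId", the cursor
check raises instead of returning an incomplete download.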
It is possible that with YARN-3942 it ends up using MemoryTimelineStore, whose
getEntities implementation differs from the one in LevelDBStore. This could be
causing the issue, but I am not sure.
An alternate workaround would be to specify limit=100000 so that all tasks are
downloaded in a single fetch, but I am not sure whether leveldb imposes any
default restriction on the limit. TEZ-UI does not have this issue, as it
uses a very high value for "limit" during the first pull.
> History logging plugin to write ATS events to HDFS
> --------------------------------------------------
>
> Key: TEZ-2628
> URL: https://issues.apache.org/jira/browse/TEZ-2628
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: TEZ-2628.001.patch, TEZ-2628.002.patch,
> hive-timeline.json
>
>
> This provides another history logging alternative that is conceptually the
> same as the timeline logging service but logs the entities to a file rather
> than posting the events to the timeline server directly. When coupled with
> the timeline store plugin from YARN-3942, it allows the Tez job to be
> decoupled from the timeline server while the Tez UI still functions properly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)