[ https://issues.apache.org/jira/browse/MAPREDUCE-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sangjin Lee updated MAPREDUCE-6337:
-----------------------------------
Attachment: MAPREDUCE-6337-YARN-2928.002.patch
Patch v.2 posted.
Changes:
- added the replay mode (1: write all entities at once, 2: write one entity at
a time)
- used the timeline collector manager to provide the writer to the timeline
collectors
- refactored the entity writers to provide the base functionality
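For illustration, the two replay modes listed above could be sketched roughly
as follows. This is not code from the patch: ReplayModeSketch, ReplayMode,
EntityWriter, RecordingWriter, and replay() are hypothetical names, and
entities are plain strings standing in for timeline entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReplayModeSketch {
    /** Hypothetical enum for the two modes: 1 = all at once, 2 = one at a time. */
    enum ReplayMode { WRITE_ALL_AT_ONCE, WRITE_ONE_AT_A_TIME }

    /** Hypothetical stand-in for a timeline writer (not the real writer API). */
    interface EntityWriter {
        void write(List<String> entities);
    }

    /** Test double that records the size of each batch it receives. */
    static class RecordingWriter implements EntityWriter {
        final List<Integer> batchSizes = new ArrayList<>();

        @Override
        public void write(List<String> entities) {
            batchSizes.add(entities.size());
        }
    }

    /** Sends the entities to the writer in one call or in one call per entity. */
    static void replay(List<String> entities, ReplayMode mode, EntityWriter writer) {
        switch (mode) {
            case WRITE_ALL_AT_ONCE:
                writer.write(entities);
                break;
            case WRITE_ONE_AT_A_TIME:
                for (String entity : entities) {
                    writer.write(Collections.singletonList(entity));
                }
                break;
        }
    }

    public static void main(String[] args) {
        List<String> entities = Arrays.asList("job", "task", "taskAttempt");

        RecordingWriter allAtOnce = new RecordingWriter();
        replay(entities, ReplayMode.WRITE_ALL_AT_ONCE, allAtOnce);
        System.out.println(allAtOnce.batchSizes);   // prints [3]

        RecordingWriter oneAtATime = new RecordingWriter();
        replay(entities, ReplayMode.WRITE_ONE_AT_A_TIME, oneAtATime);
        System.out.println(oneAtATime.batchSizes);  // prints [1, 1, 1]
    }
}
```

The second mode issues one write call per entity, which is where the extra
write traffic comes from.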
With regard to writing one entity per event: since this is based on the data
structures generated by the job history file parser, there is no easy access to
the job history events. Reverse-engineering the events from the info classes
would take considerable effort. Hopefully, writing one entity at a time
generates enough chattiness to exercise the write performance. Let me know what
you think.
Also, on the point of creating timeline entities out of job history: yes, I
agree that there may be benefits in a shared tool for creating timeline
entities. But as you point out, the job history code builds them from events,
whereas the job history parser provides *info classes that are rather
different. Since this is test code, I think it is OK to be "close enough". If
I'm missing any important data, I can add it to the entities.
> add a mode to replay MR job history files to the timeline service
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-6337
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6337
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: MAPREDUCE-6337-YARN-2928.001.patch,
> MAPREDUCE-6337-YARN-2928.002.patch
>
>
> The subtask covers the work on top of YARN-3437 to add a mode to replay MR
> job history files to the timeline service storage.