[
https://issues.apache.org/jira/browse/MAPREDUCE-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314972#comment-14314972
]
Craig Welch commented on MAPREDUCE-3973:
----------------------------------------
[~revans2], for what it's worth, I strongly second the notion that the job
history server should use a database of some sort for its storage layer:
something that offers long-term storage for a significant amount of data (the
full job history we want to retain) and that handles the trade-off between
fast in-memory access and slower persistent storage, so the service does not
need its own custom version of this logic. Whether it's relational, as
suggested, or some other datastore (some combination of embedded LevelDB +
HBase, perhaps), this is in general a difficult problem, and I think it is
better addressed by leveraging existing solutions (e.g. databases, as you
suggest) than by continuing to support the custom model we have. Moving this
state into a datastore would also let us run multiple JobHistory servers for
scaling with little to no effort, as you suggest, which would be a major win.
TL;DR +1
> [Umbrella JIRA] JobHistoryServer performance improvements in YARN+MR
> --------------------------------------------------------------------
>
> Key: MAPREDUCE-3973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3973
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
>
> A few parallel efforts are happening w.r.t. improving/fixing issues with
> JobHistoryServer in MR over YARN. This is the umbrella ticket, so we have the
> complete picture.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)