[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314972#comment-14314972
 ] 

Craig Welch commented on MAPREDUCE-3973:
----------------------------------------

[~revans2], for what it's worth, I strongly second the notion that the job 
history server should use a database of some sort for its storage layer: 
something that offers long-term storage for a significant amount of data (the 
full job history we want to retain) and that handles the trade-off between 
fast in-memory access and slower persistent storage, so that the service does 
not need its own custom version of this logic.  Whether it's relational, as 
suggested, or some other datastore (some combination of embedded leveldb + 
hbase, perhaps), this is a difficult problem that I think is better addressed 
by leveraging existing solutions (e.g. databases, as you suggest) than by 
continuing to support the custom model we have.  By moving this state into a 
datastore we could also support multiple jobhistory servers for scaling with 
little to no effort, as you suggest, which would be a major win.  

TL;DR +1

> [Umbrella JIRA] JobHistoryServer performance improvements in YARN+MR
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3973
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3973
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>
> A few parallel efforts are happening w.r.t. improving/fixing issues with 
> JobHistoryServer in MR over YARN. This is the umbrella ticket so we have the 
> complete picture.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
