[
https://issues.apache.org/jira/browse/MAPREDUCE-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430935#comment-13430935
]
Tsuyoshi OZAWA commented on MAPREDUCE-4326:
-------------------------------------------
> So there may not be the need to store any state as long as the RM can recover
> the current state of the cluster from the NM's in a reasonable amount of
> time.
It's a good idea to avoid storing state that can be recovered. However, it's
uncertain whether that state can be recovered in a reasonable amount of time,
so prototyping is needed.
> The only state that needs to be save, as far as I can see, is the information
> about all jobs that are not yet completed.
I agree with you. I'll check whether the state of in-progress jobs is defined
correctly.
> Also, the implementation seems to be doing blocking calls to ZK etc and will
> likely end up being a bottleneck on RM threads/perf if a lot of state
> information needs to be synced to stable store.
I think that, to avoid becoming a bottleneck, the RM should have a dedicated
thread for saving its state. The main thread can hand save requests to the
dedicated thread through a queue (or something similar) without blocking.
Using async APIs to save the state would also be effective, but the code can
get complicated.
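A minimal sketch of the dedicated-thread idea, not taken from the patch. The
names StateStore and AsyncStateSaver are hypothetical, and StateStore only
stands in for whatever blocking store (ZK or otherwise) the RM ends up using.
The caller enqueues a request and returns immediately; a single writer thread
drains the queue and makes the blocking calls.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for a blocking store (e.g. one backed by ZK). */
interface StateStore {
    void save(String key, String value) throws Exception;
}

/** Hands save requests to one dedicated writer thread through a queue. */
class AsyncStateSaver implements AutoCloseable {
    // Sentinel request that tells the writer thread to stop.
    private static final String[] STOP = new String[0];

    private final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
    private final Thread writer;

    AsyncStateSaver(StateStore store) {
        writer = new Thread(() -> {
            try {
                while (true) {
                    String[] req = queue.take();   // blocks only this thread
                    if (req == STOP) {
                        return;
                    }
                    try {
                        store.save(req[0], req[1]); // the slow, blocking call
                    } catch (Exception e) {
                        // Assumption: log and continue; real code needs a retry policy.
                        System.err.println("save failed for " + req[0] + ": " + e);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "rm-state-saver");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called from the main thread; the queue is unbounded, so this never blocks. */
    void saveAsync(String key, String value) {
        queue.add(new String[] {key, value});
    }

    /** Processes everything already queued, then stops the writer thread. */
    @Override
    public void close() {
        queue.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Requests are saved in the order they were enqueued because there is only one
writer. The unbounded queue keeps the caller from ever blocking, but it can
grow without limit if the store is slow, so a real implementation would
probably need a bound or some form of backpressure.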
> Resurrect RM Restart
> ---------------------
>
> Key: MAPREDUCE-4326
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4326
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, resourcemanager
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Bikas Saha
> Attachments: MR-4343.1.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM
> refactor.