[ https://issues.apache.org/jira/browse/MAPREDUCE-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428995#comment-13428995 ]
Bikas Saha commented on MAPREDUCE-4326:
---------------------------------------
I think the current implementation (the actual code, the commented-out code, the
TODOs, etc.) looks like a prototype that may not be in sync with the current
state of the functional code, so I am not sure about using it as is.
Also, the implementation seems to make blocking calls to ZK, which will likely
turn it into a bottleneck for RM threads and performance if a lot of state
information needs to be synced to stable storage.
On that note, my gut feeling is that the RM state is, in practice, the sum
total of the current state of the cluster as reflected in the NMs. So there may
be no need to store any of that state, as long as the RM can recover the
current state of the cluster from the NMs in a reasonable amount of time. The
NMs have to re-sync with the RM anyway after it comes back up, so this adds no
extra overhead.
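A minimal Java sketch of the idea above, under stated assumptions: `NodeReport`, `ClusterView` and their fields are hypothetical names, not the actual YARN classes, and a real NM report would carry far more than available memory and a list of container IDs. The point it illustrates is only that the RM's view of the cluster can be rebuilt entirely from what the NMs report when they re-sync, with nothing persisted on the RM side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical report an NM sends when it re-registers with a restarted RM.
record NodeReport(String nodeId, int availableMemoryMb, List<String> runningContainers) {}

// Hypothetical RM-side cluster view, rebuilt purely from NM reports.
// Nothing in this class is written to stable storage.
class ClusterView {
    private final Map<String, NodeReport> nodes = new HashMap<>();

    // Called once per NM as it re-syncs; a newer report for the same node
    // replaces the older one, so the view converges on the NMs' current state.
    void onNodeResync(NodeReport report) {
        nodes.put(report.nodeId(), report);
    }

    int nodeCount() {
        return nodes.size();
    }

    int totalAvailableMemoryMb() {
        int total = 0;
        for (NodeReport r : nodes.values()) {
            total += r.availableMemoryMb();
        }
        return total;
    }

    // All containers currently running anywhere in the cluster, sorted for stable output.
    List<String> allRunningContainers() {
        List<String> all = new ArrayList<>();
        for (NodeReport r : nodes.values()) {
            all.addAll(r.runningContainers());
        }
        Collections.sort(all);
        return all;
    }
}

class ClusterViewDemo {
    public static void main(String[] args) {
        ClusterView view = new ClusterView();
        view.onNodeResync(new NodeReport("nm-1", 4096, List.of("c_0002", "c_0001")));
        view.onNodeResync(new NodeReport("nm-2", 8192, List.of("c_0003")));
        System.out.println(view.nodeCount());              // 2
        System.out.println(view.totalAvailableMemoryMb()); // 12288
        System.out.println(view.allRunningContainers());   // [c_0001, c_0002, c_0003]
    }
}
```

The cost of this approach is the time for all NMs to re-register, which is the "reasonable amount of time" condition above; the benefit is that there is no RM-side state to keep consistent.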
Saving a lot of state would mean having to solve the same set of issues that
the NameNode has to solve in order to maintain consistent, reliable and
available saved state. IMO, for the RM we are better off avoiding those issues.
The only state that needs to be saved, as far as I can see, is the information
about all jobs that have not yet completed. This information is present only in
the RM, so it must be preserved across an RM restart. Fortunately, it is small
and infrequently updated, so saving it synchronously in ZK may not be much of an
issue.
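A minimal sketch of the proposed split, assuming a hypothetical `AppStateStore` interface and a hypothetical `AppRecoveryManager`; neither is an existing Hadoop API. A real implementation might back the store with ZooKeeper znodes, but an in-memory map stands in here so the example runs without a ZK ensemble. Only incomplete jobs are written, synchronously at submission, and removed on completion; everything else is expected to come back from NM re-sync.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical store for the only RM state that must survive a restart:
// submission info for jobs that have not yet completed.
interface AppStateStore {
    void storeApp(String appId, byte[] submissionInfo); // contract: durable before returning
    void removeApp(String appId);
    Map<String, byte[]> loadApps();
}

// In-memory stand-in, NOT durable; it exists only so the sketch runs without ZooKeeper.
class InMemoryAppStateStore implements AppStateStore {
    private final Map<String, byte[]> apps = new TreeMap<>();

    public synchronized void storeApp(String appId, byte[] info) { apps.put(appId, info.clone()); }
    public synchronized void removeApp(String appId) { apps.remove(appId); }
    public synchronized Map<String, byte[]> loadApps() { return new TreeMap<>(apps); }
}

class AppRecoveryManager {
    private final AppStateStore store;

    AppRecoveryManager(AppStateStore store) { this.store = store; }

    // Written synchronously, before the submission would be acknowledged to the client.
    void onAppSubmitted(String appId, String submissionContext) {
        store.storeApp(appId, submissionContext.getBytes(StandardCharsets.UTF_8));
    }

    // A completed job no longer needs to survive a restart.
    void onAppCompleted(String appId) { store.removeApp(appId); }

    // After a restart: the jobs the RM must pick up again, keyed by app ID.
    Map<String, String> recover() {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : store.loadApps().entrySet()) {
            result.put(e.getKey(), new String(e.getValue(), StandardCharsets.UTF_8));
        }
        return result;
    }
}

class AppRecoveryDemo {
    public static void main(String[] args) {
        AppStateStore store = new InMemoryAppStateStore();
        AppRecoveryManager rm = new AppRecoveryManager(store);
        rm.onAppSubmitted("app_1", "queue=default");
        rm.onAppSubmitted("app_2", "queue=etl");
        rm.onAppCompleted("app_1");
        // Simulate an RM restart: a fresh manager over the same store.
        AppRecoveryManager restarted = new AppRecoveryManager(store);
        System.out.println(restarted.recover()); // {app_2=queue=etl}
    }
}
```

In this sketch the store is touched once when a job is submitted and once when it completes, so the cost of the synchronous writes tracks job submissions and completions rather than cluster activity, which is why blocking on them may be acceptable.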
> Resurrect RM Restart
> ---------------------
>
> Key: MAPREDUCE-4326
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4326
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, resourcemanager
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Bikas Saha
> Attachments: MR-4343.1.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM
> refactor.