[
https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132376#comment-13132376
]
Devaraj K commented on MAPREDUCE-3178:
--------------------------------------
Hi Kamesh/Arun,
{code}
+ int time = conf.getInt(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
+ YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
+ String msg = "Duplicate registration from the node!";
+ LOG.info(msg + " Waiting " + time + " ms, for registration.");
+ try {
+ Thread.sleep(time);
+ } catch (InterruptedException e) {
+ }
{code}
I don't think it is a good idea to make the registration process sleep for
YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS when the node manager goes down
and comes back up before the expiry interval has elapsed. By default this value
is 10 minutes. During this time the node manager will not be able to serve any
requests.
I also saw the same issue with any scheduler and commented on it in
MAPREDUCE-3070; I am trying to solve it as part of that issue.
https://issues.apache.org/jira/browse/MAPREDUCE-3070?focusedCommentId=13125711&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13125711
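
A minimal sketch of the concern about the sleep in the patch quoted above. The
class and method names here are hypothetical and are not part of the patch or of
YARN; the interval is scaled down from the 10 minute default to 200 ms so the
example finishes quickly. It only shows that the registering caller stays
blocked for the whole interval.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a registration handler that sleeps on a
// duplicate registration, as the quoted patch does. Not YARN code.
class BlockingRegistrationSketch {

    // Sleeps for the full expiry interval and returns how long the
    // caller was actually blocked, in milliseconds.
    static long registerWithSleep(long expiryIntervalMs) {
        long start = System.nanoTime();
        try {
            Thread.sleep(expiryIntervalMs);
        } catch (InterruptedException e) {
            // Unlike the quoted patch, restore the interrupt flag
            // instead of silently swallowing the exception.
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Assumed, scaled-down interval; the real default is 10 minutes.
        long scaledIntervalMs = 200;
        long blockedMs = registerWithSleep(scaledIntervalMs);
        System.out.println("Registration call blocked for about "
                + blockedMs + " ms");
    }
}
```

With the real default, the same call would hold the node manager's registration
for roughly 10 minutes.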
> Capacity Schedular shows incorrect cluster information in the RM logs
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-3178
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3178
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.0
> Reporter: Bhallamudi Venkata Siva Kamesh
> Assignee: Bhallamudi Venkata Siva Kamesh
> Priority: Blocker
> Attachments: MAPREDUCE-3178.patch
>
>
> When the NM is stopped and then started again in quick succession, CS shows
> incorrect information about clusterResource in the logs.
> I encountered this issue in pseudo cluster mode. The steps to reproduce
> are:
> 1) start the YARN cluster
> 2) stop a NM and start the NM again (in quick succession)
> There should be a NM running in the cluster; however, as I observed, the RM
> detects the NM as dead only after the default time has passed since it
> actually became unavailable (in this case, since the NM was stopped).
>
> If you start the NM before this default time has elapsed, ResourceTracker
> throws an IOException; however, CS still adds the NM's capacity to the
> clusterResource.
> Once the default time has elapsed and the RM detects the NM as dead, the RM
> removes the NM, and the cluster capacity is reduced by the NM's capacity.
> Eventually there is no NM running in the cluster, but the cluster capacity
> is still the NM's capacity (by default).
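
The accounting described in the issue above can be sketched as a toy model.
This is not ResourceManager or CapacityScheduler code: the class, its methods,
the node id "nm1" and the 8192 MB capacity are all assumptions made for
illustration. The model reproduces the reported behaviour, where a rejected
duplicate registration still adds capacity.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the reported bookkeeping. All names are hypothetical.
class ClusterCapacitySketch {
    private final Set<String> liveNodes = new HashSet<>();
    private int clusterMemoryMb = 0;

    // Returns false for a duplicate registration (the case in which the
    // issue reports an IOException), but adds the capacity regardless,
    // which is the behaviour the issue describes.
    boolean register(String nodeId, int memoryMb) {
        boolean accepted = liveNodes.add(nodeId);
        clusterMemoryMb += memoryMb;
        return accepted;
    }

    // Expiry removes the node and subtracts its capacity once.
    void expire(String nodeId, int memoryMb) {
        if (liveNodes.remove(nodeId)) {
            clusterMemoryMb -= memoryMb;
        }
    }

    int liveNodeCount() { return liveNodes.size(); }

    int clusterMemoryMb() { return clusterMemoryMb; }

    public static void main(String[] args) {
        ClusterCapacitySketch rm = new ClusterCapacitySketch();
        int nmMemoryMb = 8192;                 // assumed NM capacity
        rm.register("nm1", nmMemoryMb);        // initial registration
        rm.register("nm1", nmMemoryMb);        // quick restart: rejected, capacity still added
        rm.expire("nm1", nmMemoryMb);          // expiry of the original registration
        System.out.println("live nodes = " + rm.liveNodeCount()
                + ", cluster memory = " + rm.clusterMemoryMb() + " MB");
    }
}
```

In this model the cluster ends with no live node while still reporting one
NM's worth of capacity, matching the end state in the description.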