[
https://issues.apache.org/jira/browse/MAPREDUCE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418288#comment-13418288
]
Jason Lowe commented on MAPREDUCE-4460:
---------------------------------------
The issue occurs because new queue instances are created and registered with
the metrics system during the first refreshQueues before the exception is
thrown. The new queue instances are discarded due to the bad configuration,
but they are left registered with the metrics system. When the refreshQueues
occurs the second time it creates new queue instances again, since the old ones
were discarded. When the new instances try to register with the metrics
system, the metrics system sees a duplicate registration, throws an exception,
and spoils the refreshQueues.
We need to deregister any newly created queue instances from the metrics system
when recovering from a bad configuration.
> Refresh queue throws IO exception after configuring wrong queue capacity
> ------------------------------------------------------------------------
>
> Key: MAPREDUCE-4460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4460
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.1.0-alpha
> Reporter: Nishan Shetty
>
> Scenario:
> 1.My setup has a,b queues(each with capacity say 50%) under root queue
> 2.Start the process
> 3.Add one more queue 'c' under root
> 4.Configure some capacity for 'c' such that total capacity of a,b,c is not
> equal to 100
> 5.Now do refresh queues, it will throw exception as wrong capacity(This is
> expected as capacity was not equal to 100).
> 6.Now reconfigure queue capacities of a,b,c such that total capacity is 100
> 5.Now do refresh queues again
> Observed that it throws IO exception
> {noformat}
> java.io.IOException: Failed to re-init queues
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:216)
> at
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:174)
> at
> org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62)
> at
> org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source
> QueueMetrics,q0=root,q1=c already exists!
> at
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
> at
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
> at
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:216)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:129)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:119)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:136)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:313)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:328)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:246)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:213)
> ... 11 more
> at LocalTrace:
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
> Failed to re-init queues
> at
> org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:50)
> at
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:40)
> at
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:184)
> at
> org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62)
> at
> org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> Caused by:
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Metrics
> source QueueMetrics,q0=root,q1=c already exists!
> at
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:94)
> at
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:32)
> at java.lang.Throwable.printStackTrace(Throwable.java:514)
> at
> org.apache.hadoop.yarn.exceptions.YarnRemoteException.printStackTrace(YarnRemoteException.java:48)
> at
> org.apache.hadoop.util.StringUtils.stringifyException(StringUtils.java:69)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1715)
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira