Daryn Sharp created MAPREDUCE-5262:
--------------------------------------

             Summary: AM generates NPEs when RM connection fails
                 Key: MAPREDUCE-5262
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5262
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
    Affects Versions: 2.0.4-alpha, 3.0.0
            Reporter: Daryn Sharp

If the AM fails to connect to the RM, a cascade of NPEs follows as the AM attempts to shut down and exit.

{noformat}
2013-05-21 00:31:56,153 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoopqa (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1367605529307_0034_000001
2013-05-21 00:31:56,154 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1367605529307_0034_000001
2013-05-21 00:31:56,154 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoopqa (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1367605529307_0034_000001
2013-05-21 00:31:56,156 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:153)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.start(RMCommunicator.java:112)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.start(RMContainerAllocator.java:211)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.start(MRAppMaster.java:797)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1374)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1370)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1367605529307_0034_000001
    at org.apache.hadoop.ipc.Client.call(Client.java:1266)
    at org.apache.hadoop.ipc.Client.call(Client.java:1218)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:100)
    ... 12 more
2013-05-21 00:31:56,158 ERROR [main] org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.mapreduce.v2.app.MRAppMaster
org.apache.hadoop.yarn.YarnException: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:166)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.start(RMCommunicator.java:112)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.start(RMContainerAllocator.java:211)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.start(MRAppMaster.java:797)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1374)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1370)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318)
Caused by: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:153)
    ... 11 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1367605529307_0034_000001
    at org.apache.hadoop.ipc.Client.call(Client.java:1266)
    at org.apache.hadoop.ipc.Client.call(Client.java:1218)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:100)
    ... 12 more
2013-05-21 00:31:56,158 INFO [main] org.apache.hadoop.yarn.service.CompositeService: Error stopping org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.stop(RMCommunicator.java:219)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.stop(RMContainerAllocator.java:251)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.stop(MRAppMaster.java:803)
    at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:77)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1374)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1370)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318)
2013-05-21 00:31:56,158 INFO [main] org.apache.hadoop.ipc.Server: Stopping server on 39121
2013-05-21 00:31:56,160 INFO [main] org.apache.hadoop.yarn.service.AbstractService: Service:TaskHeartbeatHandler is stopped.
2013-05-21 00:31:56,160 INFO [IPC Server listener on 39121] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 39121
2013-05-21 00:31:56,160 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2013-05-21 00:31:56,160 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2013-05-21 00:31:56,160 INFO [main] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapred.TaskAttemptListenerImpl is stopped.
2013-05-21 00:31:56,161 INFO [main] org.apache.hadoop.yarn.service.AbstractService: Service:CommitterEventHandler is stopped.
2013-05-21 00:31:56,161 INFO [main] org.apache.hadoop.ipc.Server: Stopping server on 50500
2013-05-21 00:31:56,161 INFO [IPC Server listener on 50500] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50500
2013-05-21 00:31:56,161 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2013-05-21 00:31:56,164 INFO [main] org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:0
2013-05-21 00:31:56,264 INFO [main] org.apache.hadoop.yarn.service.AbstractService: Service:MRClientService is stopped.
2013-05-21 00:31:56,264 INFO [main] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2013-05-21 00:31:56,264 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1374)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1370)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318)
Caused by: org.apache.hadoop.yarn.YarnException: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:166)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.start(RMCommunicator.java:112)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.start(RMContainerAllocator.java:211)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.start(MRAppMaster.java:797)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    ... 7 more
Caused by: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:153)
    ... 11 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1367605529307_0034_000001
    at org.apache.hadoop.ipc.Client.call(Client.java:1266)
    at org.apache.hadoop.ipc.Client.call(Client.java:1218)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:100)
    ... 12 more
2013-05-21 00:31:56,266 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
2013-05-21 00:31:56,266 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator notified that iSignalled is: true
2013-05-21 00:31:56,266 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: false
2013-05-21 00:31:56,266 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator notified that shouldUnregistered is: false
2013-05-21 00:31:56,267 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: false
2013-05-21 00:31:56,267 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is false
2013-05-21 00:31:56,267 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 3
2013-05-21 00:31:56,267 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event AM_STARTED
2013-05-21 00:31:56,347 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1367605529307_0035, File: hdfs://hdfs-server:8020/user/hadoopqa/.staging/job_1367605529307_0035/job_1367605529307_0035_2.jhist
2013-05-21 00:31:56,356 WARN [Thread-1] org.apache.hadoop.conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
2013-05-21 00:31:56,570 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event AM_STARTED
2013-05-21 00:31:56,571 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event JOB_SUBMITTED
2013-05-21 00:31:56,588 INFO [Thread-1] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2013-05-21 00:31:56,588 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Skipping cleaning up the staging dir. assuming AM will be retried.
2013-05-21 00:31:56,588 INFO [Thread-1] org.apache.hadoop.yarn.service.CompositeService: Error stopping org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter.stop(MRAppMaster.java:865)
    at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
    at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1343)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
2013-05-21 00:31:56,588 INFO [Thread-1] org.apache.hadoop.ipc.Server: Stopping server on 39121
2013-05-21 00:31:56,588 INFO [Thread-1] org.apache.hadoop.ipc.Server: Stopping server on 50500
{noformat}
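
The two NullPointerException traces come from stop() methods that run after start() has already failed, which suggests that stop() touches state that start() never got to initialize. The following is only a minimal sketch of that failure mode and of one possible guard, not the actual Hadoop code or a proposed patch. The Service interface, the Communicator class, the heartbeatThread field, and the startThenStop helper are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedStopSketch {

    /** Hypothetical lifecycle interface; NOT Hadoop's Service API. */
    interface Service {
        void start();
        void stop();
    }

    /** Hypothetical communicator whose start() can fail before its state is set up. */
    static class Communicator implements Service {
        private final boolean failRegistration;
        private Thread heartbeatThread; // remains null if start() throws early

        Communicator(boolean failRegistration) {
            this.failRegistration = failRegistration;
        }

        @Override
        public void start() {
            if (failRegistration) {
                // Stands in for a failed registration call (assumption for this sketch).
                throw new IllegalStateException("registration failed");
            }
            heartbeatThread = new Thread(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    // Interrupted by stop(): exit quietly.
                }
            });
            heartbeatThread.setDaemon(true);
            heartbeatThread.start();
        }

        @Override
        public void stop() {
            // Guard: start() may have thrown before heartbeatThread was assigned.
            if (heartbeatThread != null) {
                heartbeatThread.interrupt();
            }
        }
    }

    /** Starts the service, then always stops it, recording what happened. */
    static List<String> startThenStop(Service service) {
        List<String> events = new ArrayList<>();
        try {
            service.start();
            events.add("started");
        } catch (RuntimeException e) {
            events.add("start failed: " + e.getMessage());
        }
        try {
            service.stop();
            events.add("stopped cleanly");
        } catch (RuntimeException e) {
            events.add("stop failed: " + e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(startThenStop(new Communicator(true)));
        System.out.println(startThenStop(new Communicator(false)));
    }
}
```

Without the null check in stop(), the failed-start case would throw a NullPointerException from stop(), which is the same shape as the traces above. Whether the real fix belongs in each stop() method or in the composite service's handling of partially started children is not something this sketch can answer.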