[ https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kashish Jain updated SPARK-5613:
--------------------------------
Description:
Steps to Reproduce
1) Run any Spark job.
2) Stop YARN while the Spark job is running (an application ID has already
been generated by this point).
3) Restart YARN.
4) The AsyncMonitorApplication thread fails with an
ApplicationNotFoundException, which terminates the thread.
Here is the stack trace:
15/02/05 05:22:37 INFO Client: Retrying connect to server:
nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/02/05 05:22:38 INFO Client: Retrying connect to server:
nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/02/05 05:22:39 INFO Client: Retrying connect to server:
nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/02/05 05:22:40 INFO Client: Retrying connect to server:
nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
5/02/05 05:22:40 INFO Client: Retrying connect to server:
nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Exception in thread "Yarn application state monitor"
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application
with id 'application_1423113179043_0003' doesn't exist in RM.
at
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
at
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
at
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
at
org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116)
at
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120)
Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException):
Application with id 'application_1423113179043_0003' doesn't exist in RM.
at
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
at
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
at
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy11.getApplicationReport(Unknown Source)
at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:163)
... 9 more
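
The trace above shows the exception escaping the run() method of the "Yarn application state monitor" thread, which is why the thread dies. One possible way to handle it is sketched below, written without Hadoop on the classpath so it can be run standalone. This is only an illustration, not the actual Spark code or a proposed patch: MonitorSketch, ReportSource, monitor and the "UNKNOWN" result are hypothetical names, and the nested ApplicationNotFoundException is a local stand-in for org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException.

```java
public class MonitorSketch {

    /** Local stand-in for the Hadoop exception of the same name (assumption). */
    static class ApplicationNotFoundException extends Exception {
        ApplicationNotFoundException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical abstraction over the call that asks the RM for the application state. */
    interface ReportSource {
        String fetchState(String appId) throws ApplicationNotFoundException;
    }

    /**
     * Polls the source up to maxPolls times.
     * Returns the terminal state if one is observed, "UNKNOWN" if the RM no
     * longer knows the application, and "RUNNING" if no terminal state was seen.
     */
    static String monitor(ReportSource source, String appId, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            try {
                String state = source.fetchState(appId);
                if (state.equals("FINISHED") || state.equals("FAILED") || state.equals("KILLED")) {
                    return state;
                }
            } catch (ApplicationNotFoundException e) {
                // The RM was restarted and lost the application: report this
                // to the caller instead of letting the exception kill the thread.
                return "UNKNOWN";
            }
        }
        return "RUNNING";
    }

    public static void main(String[] args) {
        // Simulates an RM that has forgotten the application after a restart.
        ReportSource restartedRm = appId -> {
            throw new ApplicationNotFoundException(
                "Application with id '" + appId + "' doesn't exist in RM.");
        };
        System.out.println(monitor(restartedRm, "application_1423113179043_0003", 3));
    }
}
```

Running main prints UNKNOWN. What the caller should do with that result (stop the SparkContext, retry, or log and exit) is a separate decision that this sketch does not take a position on.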
was:
Steps to Reproduce
1) Run any Spark job.
2) Stop YARN while the Spark job is running (an application ID has already
been generated by this point).
3) Restart YARN.
4) The AsyncMonitorApplication thread fails with an
ApplicationNotFoundException, which terminates the thread.
> YarnClientSchedulerBackend fails to get application report when yarn restarts
> -----------------------------------------------------------------------------
>
> Key: SPARK-5613
> URL: https://issues.apache.org/jira/browse/SPARK-5613
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Kashish Jain
> Priority: Minor
> Fix For: 1.2.0, 1.2.1
>
> Original Estimate: 24h
> Remaining Estimate: 24h