[ 
https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kashish Jain updated SPARK-5613:
--------------------------------
    Description: 
Steps to Reproduce
1) Run any spark job
2) Stop yarn while the spark job is running (an application id has been 
generated by now)
3) Restart yarn now
4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException 
exception. This leads to termination of thread. 


Here is the StackTrace

15/02/05 05:22:37 INFO Client: Retrying connect to server: 
nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/02/05 05:22:38 INFO Client: Retrying connect to server: 
nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/02/05 05:22:39 INFO Client: Retrying connect to server: 
nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/02/05 05:22:40 INFO Client: Retrying connect to server: 
nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
5/02/05 05:22:40 INFO Client: Retrying connect to server: 
nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Exception in thread "Yarn application state monitor" 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1423113179043_0003' doesn't exist in RM.
        at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
        at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
        at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
        at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
        at 
org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116)
        at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException):
 Application with id 'application_1423113179043_0003' doesn't exist in RM.
        at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
        at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
        at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy11.getApplicationReport(Unknown Source)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:163)
        ... 9 more


  was:
Steps to Reproduce
1) Run any spark job
2) Stop yarn while the spark job is running (an application id has been 
generated by now)
3) Restart yarn now
4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException 
exception. This leads to termination of thread. 


> YarnClientSchedulerBackend fails to get application report when yarn restarts
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-5613
>                 URL: https://issues.apache.org/jira/browse/SPARK-5613
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Kashish Jain
>            Priority: Minor
>             Fix For: 1.2.0, 1.2.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Steps to Reproduce
> 1) Run any spark job
> 2) Stop yarn while the spark job is running (an application id has been 
> generated by now)
> 3) Restart yarn now
> 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException 
> exception. This leads to termination of thread. 
> Here is the StackTrace
> 15/02/05 05:22:37 INFO Client: Retrying connect to server: 
> nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 15/02/05 05:22:38 INFO Client: Retrying connect to server: 
> nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 15/02/05 05:22:39 INFO Client: Retrying connect to server: 
> nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 15/02/05 05:22:40 INFO Client: Retrying connect to server: 
> nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 5/02/05 05:22:40 INFO Client: Retrying connect to server: 
> nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> Exception in thread "Yarn application state monitor" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1423113179043_0003' doesn't exist in RM.
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Unknown Source)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>       at java.lang.reflect.Constructor.newInstance(Unknown Source)
>       at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>       at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
>       at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>       at java.lang.reflect.Method.invoke(Unknown Source)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>       at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
>       at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
>       at 
> org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116)
>       at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException):
>  Application with id 'application_1423113179043_0003' doesn't exist in RM.
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Unknown Source)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>       at com.sun.proxy.$Proxy11.getApplicationReport(Unknown Source)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:163)
>       ... 9 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to