[ 
https://issues.apache.org/jira/browse/FLINK-30908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684553#comment-17684553
 ] 

Matthias Pohl edited comment on FLINK-30908 at 2/6/23 9:52 AM:
---------------------------------------------------------------

Not sure yet whether it's related, but there's an 
{{ApplicationAttemptNotFoundException}} that causes application 
{{application_1675564836997_0002}} to be killed:
{code}
02:41:17,442 [IPC Server handler 9 on default port 46716] INFO  org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl [] - Stopping container with container Id: container_1675564836997_0002_01_000002
02:41:17,458 [IPC Server handler 2 on default port 45213] ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService [] - Application attempt appattempt_1675564836997_0002_000001 doesn't exist in ApplicationMasterService cache.
02:41:17,459 [IPC Server handler 2 on default port 45213] INFO  org.apache.hadoop.ipc.Server                                 [] - IPC Server handler 2 on default port 45213, call Call#8 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 192.168.144.2:35386
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1675564836997_0002_000001 doesn't exist in ApplicationMasterService cache.
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:407) ~[hadoop-yarn-server-resourcemanager-3.2.3.jar:?]
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) ~[hadoop-yarn-common-3.2.3.jar:?]
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) ~[hadoop-yarn-api-3.2.3.jar:?]
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549) ~[hadoop-common-3.2.3.jar:?]
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518) ~[hadoop-common-3.2.3.jar:?]
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) ~[hadoop-common-3.2.3.jar:?]
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) [hadoop-common-3.2.3.jar:?]
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) [hadoop-common-3.2.3.jar:?]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_292]
        at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_292]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) [hadoop-common-3.2.3.jar:?]
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) [hadoop-common-3.2.3.jar:?]
{code}
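
For anyone cross-checking other log files: the container ID and the attempt ID in the excerpt above both embed the cluster timestamp and application sequence number, which is how they map back to {{application_1675564836997_0002}}. A minimal sketch of that mapping, assuming the ID layouts visible in this log ({{application_ids}} and {{ID_PATTERN}} are hypothetical helper names, not part of Flink or Hadoop):

```python
import re

# ID layouts as they appear in the log excerpt (assumed, not taken from Hadoop code):
#   appattempt_<clusterTimestamp>_<appSeq>_<attemptSeq>
#   container_<clusterTimestamp>_<appSeq>_<attemptSeq>_<containerSeq>
ID_PATTERN = re.compile(r"\b(?:appattempt|container)_(\d+)_(\d+)_\d+(?:_\d+)?\b")


def application_ids(log_text):
    """Return the sorted application IDs referenced by attempt or container IDs."""
    found = {f"application_{ts}_{seq}" for ts, seq in ID_PATTERN.findall(log_text)}
    return sorted(found)


sample = (
    "Stopping container with container Id: container_1675564836997_0002_01_000002\n"
    "Application attempt appattempt_1675564836997_0002_000001 doesn't exist in "
    "ApplicationMasterService cache.\n"
)
print(application_ids(sample))
```

Running this over the full build log should make it easier to see whether other applications hit the same error, though the pattern only covers the ID shapes shown here.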



> Fatal error in ResourceManager caused YARNSessionFIFOSecuredITCase.testDetachedMode to fail
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-30908
>                 URL: https://issues.apache.org/jira/browse/FLINK-30908
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Coordination
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Priority: Critical
>              Labels: test-stability
>
> There's a build failure in {{YARNSessionFIFOSecuredITCase.testDetachedMode}} 
> which is caused by a fatal error in the ResourceManager:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45720&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=d04c9862-880c-52f5-574b-a7a79fef8e0f&l=29869
> {code}
> Feb 05 02:41:58 java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
> Feb 05 02:41:58 java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
> Feb 05 02:41:58       at org.apache.hadoop.ipc.Client.call(Client.java:1480) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.ipc.Client.call(Client.java:1422) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at com.sun.proxy.$Proxy31.allocate(Unknown Source) ~[?:?]
> Feb 05 02:41:58       at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) ~[hadoop-yarn-common-3.2.3.jar:?]
> Feb 05 02:41:58       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
> Feb 05 02:41:58       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
> Feb 05 02:41:58       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
> Feb 05 02:41:58       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
> Feb 05 02:41:58       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at com.sun.proxy.$Proxy32.allocate(Unknown Source) ~[?:?]
> Feb 05 02:41:58       at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:325) ~[hadoop-yarn-client-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:311) [hadoop-yarn-client-3.2.3.jar:?]
> Feb 05 02:41:58 Caused by: java.lang.InterruptedException
> Feb 05 02:41:58       at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_292]
> Feb 05 02:41:58       at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:1.8.0_292]
> Feb 05 02:41:58       at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1180) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       at org.apache.hadoop.ipc.Client.call(Client.java:1475) ~[hadoop-common-3.2.3.jar:?]
> Feb 05 02:41:58       ... 17 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
