[ 
https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957670#comment-16957670
 ] 

Gyula Fora commented on FLINK-14048:
------------------------------------

[~tison] I don't think your patch fixes the problem. It looks like a bug in 
AbstractYarnClusterDescriptor when it tries to kill the already failed app 
(see the log below).

 

19/10/23 01:50:50 INFO yarn.AbstractYarnClusterDescriptor: Cancelling deployment from Deployment Failure Hook
19/10/23 01:50:50 INFO yarn.AbstractYarnClusterDescriptor: Killing YARN application
19/10/23 01:50:50 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Trying to failover immediately.
19/10/23 01:50:50 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 1 failover attempts. Trying to failover after sleeping for 40495ms.

> Flink client hangs after trying to kill Yarn Job during deployment
> ------------------------------------------------------------------
>
>                 Key: FLINK-14048
>                 URL: https://issues.apache.org/jira/browse/FLINK-14048
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission, Deployment / YARN
>            Reporter: Gyula Fora
>            Priority: Major
>         Attachments: patch.diff
>
>
> If we kill the Flink client run command from the terminal while deploying to 
> YARN (say, because we realize we used the wrong parameters), the YARN 
> application is killed immediately but the client does not shut down.
> We get the following message over and over:
> 19/09/10 23:35:55 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 14 failover attempts. Trying to failover after sleeping for 16296ms.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
