[ https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957670#comment-16957670 ]
Gyula Fora commented on FLINK-14048:
------------------------------------
[~tison] I think your patch doesn't fix the problem. It looks like a bug in AbstractYarnClusterDescriptor: it tries to kill the already-failed app through a client that has already been stopped:
19/10/23 01:50:50 INFO yarn.AbstractYarnClusterDescriptor: Cancelling deployment from Deployment Failure Hook
19/10/23 01:50:50 INFO yarn.AbstractYarnClusterDescriptor: Killing YARN application
19/10/23 01:50:50 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Trying to failover immediately.
19/10/23 01:50:50 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 1 failover attempts. Trying to failover after sleeping for 40495ms.
> Flink client hangs after trying to kill Yarn Job during deployment
> ------------------------------------------------------------------
>
> Key: FLINK-14048
> URL: https://issues.apache.org/jira/browse/FLINK-14048
> Project: Flink
> Issue Type: Improvement
> Components: Client / Job Submission, Deployment / YARN
> Reporter: Gyula Fora
> Priority: Major
> Attachments: patch.diff
>
>
> If we kill the flink run client command from the terminal while it is deploying to
> YARN (say, because we realize we used the wrong parameters), the YARN
> application is killed immediately, but the client does not shut down.
> Instead, we get the following message over and over:
> 19/09/10 23:35:55 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 14 failover attempts. Trying to failover after sleeping for 16296ms.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)