[
https://issues.apache.org/jira/browse/FLINK-24038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407130#comment-17407130
]
Yang Wang commented on FLINK-24038:
-----------------------------------
[~trohrmann] Using a K8s Job could help a bit when the deregistration fails.
However, we would still have residual TaskManager pods and Flink ConfigMaps.
Maybe we could let the JobManager be relaunched and recover the finished or
failed jobs, so that the dispatcher deregisters the application again. That
seems more reasonable.
For Yarn, I am afraid we are in the same situation. Even if the
JobManager (application master) exits with code zero, it will be launched again
when the deregistration has failed.
Regarding the new option #1 that [~xtsong] listed, do you mean letting the
leading dispatcher do the deregistration? If so, what will happen when there is
no leader?
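To make the relaunch-and-retry idea concrete, here is a rough toy model in
Python. It is only a sketch: every name in it (Cluster, deregister,
run_job_manager) is hypothetical and none of it is Flink, K8s or Yarn API. It
assumes that finished job results survive in HA storage and that a non-zero
exit code causes the process to be relaunched.

```python
# Toy model only: all names are hypothetical, nothing here is Flink API.
from dataclasses import dataclass, field
from typing import List, Optional


class NoLeaderError(Exception):
    """Raised when deregistration is attempted without a leading ResourceManager."""


@dataclass
class Cluster:
    leading_rm: Optional[str] = None  # id of the process whose ResourceManager leads
    registered: bool = True           # application still registered with K8s/Yarn
    finished_jobs: List[str] = field(default_factory=list)  # results kept in HA storage


def deregister(cluster: Cluster, process_id: str) -> None:
    # Deregistration only succeeds on the process whose ResourceManager is the leader.
    if cluster.leading_rm != process_id:
        raise NoLeaderError(f"{process_id} has no leading ResourceManager")
    cluster.registered = False


def run_job_manager(cluster: Cluster, process_id: str) -> int:
    """Recover finished jobs, then try to deregister. Returns the exit code."""
    recovered = list(cluster.finished_jobs)  # a relaunched process sees them again
    assert recovered, "this sketch only models shutdown after jobs have finished"
    try:
        deregister(cluster, process_id)
    except NoLeaderError:
        return 1  # non-zero exit, so the deployment relaunches the process
    return 0


cluster = Cluster(leading_rm="jm-2", finished_jobs=["job-a"])
first_exit = run_job_manager(cluster, "jm-1")   # leader runs elsewhere
cluster.leading_rm = "jm-1"                     # leadership moves before the relaunch
second_exit = run_job_manager(cluster, "jm-1")  # relaunched process retries
```

In this model the first attempt leaves the application registered and exits
with a non-zero code. The retry only succeeds once the relaunched process
holds the ResourceManager leadership, which is exactly the part that is
unclear when there is no leader at all.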
> DispatcherResourceManagerComponent fails to deregister application if no
> leading ResourceManager
> ------------------------------------------------------------------------------------------------
>
> Key: FLINK-24038
> URL: https://issues.apache.org/jira/browse/FLINK-24038
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.14.0
> Reporter: Till Rohrmann
> Priority: Critical
> Fix For: 1.14.0
>
>
> With FLINK-21667 we introduced a change that can cause the
> {{DispatcherResourceManagerComponent}} to fail when trying to stop the
> application. The problem is that the {{DispatcherResourceManagerComponent}}
> needs a leading {{ResourceManager}} to successfully execute the
> stop/deregister application call. If this is not the case, then it will fail
> fatally. In the case of multiple standby JobManager processes it can happen
> that the leading {{ResourceManager}} runs somewhere else.
> I do see two possible solutions:
> 1. Run the leader election process for the whole JobManager process
> 2. Move the registration/deregistration of the application out of the
> {{ResourceManager}} so that it can be executed w/o a leader