[ https://issues.apache.org/jira/browse/FLINK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525812#comment-17525812 ]
Matthias Pohl commented on FLINK-26772:
---------------------------------------
Summary of the investigation:
As stated by [~wangyang0918], we deregister the Flink cluster when shutting it
down. In Application Mode, this is triggered as soon as the job reaches a
terminal state. The deregistration causes k8s to delete the Flink deployment,
which might happen before the cleanup has finished.
The solution is to shut down the cluster only after the cleanup is done.
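A minimal sketch of the proposed ordering, assuming a CompletableFuture-based shutdown path. Every name here (`ShutdownOrderingSketch`, `cleanupWithRetry`, `deregisterCluster`, `onJobReachedTerminalState`) is hypothetical and not Flink's API. The deregistration is chained after the cleanup future, so a failing HA cleanup can still be retried before k8s removes the deployment. Shutting down anyway once retries are exhausted is an assumption of this sketch, not a decision from the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch, not Flink's API: deregister the cluster only after the
// job's resource cleanup (including retries) has finished.
public class ShutdownOrderingSketch {

    // Records the order in which steps ran, so the ordering can be checked.
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());
    private final int failuresBeforeSuccess; // simulated number of failing HA cleanup attempts
    private final int maxAttempts;           // hypothetical retry limit

    public ShutdownOrderingSketch(int failuresBeforeSuccess, int maxAttempts) {
        this.failuresBeforeSuccess = failuresBeforeSuccess;
        this.maxAttempts = maxAttempts;
    }

    public List<String> events() {
        synchronized (events) {
            return new ArrayList<>(events);
        }
    }

    // Runs the cleanup asynchronously and retries it until it succeeds or maxAttempts is reached.
    private CompletableFuture<Void> cleanupWithRetry() {
        CompletableFuture<Void> result = new CompletableFuture<>();
        attemptCleanup(1, result);
        return result;
    }

    private void attemptCleanup(int attempt, CompletableFuture<Void> result) {
        CompletableFuture.runAsync(() -> {
            events.add("cleanup attempt " + attempt);
            if (attempt <= failuresBeforeSuccess) {
                throw new IllegalStateException("HA cleanup failed");
            }
        }).whenComplete((ignored, error) -> {
            if (error == null) {
                result.complete(null);
            } else if (attempt < maxAttempts) {
                attemptCleanup(attempt + 1, result); // retry while the cluster is still up
            } else {
                result.completeExceptionally(error);
            }
        });
    }

    // Stand-in for deregistering the application, which lets k8s delete the deployment.
    private CompletableFuture<Void> deregisterCluster() {
        return CompletableFuture.runAsync(() -> events.add("deregister cluster"));
    }

    // Called when the job reaches a terminal state: cleanup first, then shutdown.
    public CompletableFuture<Void> onJobReachedTerminalState() {
        return cleanupWithRetry()
                .exceptionally(error -> {
                    events.add("cleanup gave up"); // assumption: shut down even if retries run out
                    return null;
                })
                .thenCompose(ignored -> deregisterCluster());
    }

    public static void main(String[] args) {
        ShutdownOrderingSketch sketch = new ShutdownOrderingSketch(2, 5);
        sketch.onJobReachedTerminalState().join();
        System.out.println(sketch.events());
    }
}
```

Running `main` prints `[cleanup attempt 1, cleanup attempt 2, cleanup attempt 3, deregister cluster]`: the two failed cleanup attempts are retried, and the deregistration only happens after the third attempt succeeds.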
> Application Mode does not wait for job cleanup during shutdown
> --------------------------------------------------------------
>
> Key: FLINK-26772
> URL: https://issues.apache.org/jira/browse/FLINK-26772
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Mika Naylor
> Assignee: Matthias Pohl
> Priority: Critical
> Labels: pull-request-available
> Attachments: FLINK-26772.standalone-job.log,
> testcluster-599f4d476b-bghw5_log.txt
>
>
> We discovered that in Application Mode, when the application has completed,
> the cluster is shut down even if there are ongoing resource cleanup events
> happening in the background. For example, if HA cleanup fails, further
> retries are not attempted as the cluster is shut down before this can happen.
>
> We should also add a flag for the shutdown that will prevent further jobs
> from being submitted.
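The submission flag proposed in the issue could be sketched as follows, assuming a single gate object that every submission path consults. `SubmissionGateSketch`, `beginShutdown` and `submitJob` are hypothetical names, not Flink classes.

```java
// Hypothetical sketch, not Flink's API: a flag that makes the cluster reject
// new job submissions once shutdown has started.
public class SubmissionGateSketch {

    private boolean shuttingDown = false; // guarded by "this"

    // Flips the flag; returns true only for the first caller.
    public synchronized boolean beginShutdown() {
        if (shuttingDown) {
            return false;
        }
        shuttingDown = true;
        return true;
    }

    // Stand-in for a submission entry point. Both methods are synchronized, so a
    // submission is either accepted before the flag flips or rejected afterwards.
    public synchronized String submitJob(String jobName) {
        if (shuttingDown) {
            throw new IllegalStateException("Cluster is shutting down; rejecting job " + jobName);
        }
        return "accepted " + jobName;
    }

    public static void main(String[] args) {
        SubmissionGateSketch gate = new SubmissionGateSketch();
        System.out.println(gate.submitJob("job-1"));
        gate.beginShutdown();
        try {
            gate.submitJob("job-2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Flipping the flag at the start of shutdown, before the cleanup is awaited, keeps new jobs from arriving while the cluster waits for the cleanup of the finished job.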
--
This message was sent by Atlassian Jira
(v8.20.7#820007)