[ https://issues.apache.org/jira/browse/FLINK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524296#comment-17524296 ]
Matthias Pohl commented on FLINK-26772:
---------------------------------------
I tried to reproduce this on a standalone cluster but could not: the
{{ClusterEntrypoint}} process kept running until I resolved the cleanup issue,
which is the expected behavior. The logs of the k8s run revealed a {{SIGTERM}},
which might be k8s-specific:
{code:java}
2022-03-21 09:34:41,129 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Closing the slot manager.
2022-03-21 09:34:41,129 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Suspending the slot manager.
2022-03-21 09:34:41,133 DEBUG org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor [] - The RpcEndpoint resourcemanager_0 terminated successfully.
2022-03-21 09:34:41,136 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-03-21 09:34:41,136 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor [] - AkkaRpcActor akka://flink/user/rpc/resourcemanager_0 has terminated.
2022-03-21 09:34:41,151 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
{code}
[~yangwang166] do you have any idea which external process might have sent the
SIGTERM while the Application Mode cluster was shutting down?
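
To make the expected behavior concrete, here is a minimal sketch of a shutdown
that only completes once cleanup has finished and that rejects new submissions
in the meantime. This is not Flink code: {{CleanupAwareShutdown}} and all of
its methods are hypothetical names that I made up for illustration, and the
retry logic itself is assumed to live elsewhere.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names exist in Flink.
public class CleanupAwareShutdown {
    // Set as soon as shutdown starts, so that no further job is accepted.
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    // Completed by the (possibly retrying) cleanup logic once it succeeds.
    private final CompletableFuture<Void> cleanupDone = new CompletableFuture<>();

    /** Returns false once shutdown has started. */
    public boolean trySubmitJob(String jobName) {
        return !shuttingDown.get();
    }

    /** Signals that the job cleanup has finally succeeded. */
    public void markCleanupDone() {
        cleanupDone.complete(null);
    }

    /** Sets the flag at once; the returned future completes only after cleanup. */
    public CompletableFuture<Void> shutDown() {
        shuttingDown.set(true);
        return cleanupDone.thenRun(() -> {
            // Remaining cluster resources would be released here.
        });
    }
}
{code}
In this sketch, an external SIGTERM would bypass {{shutDown()}} entirely, which
would match what the k8s logs above show.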
> Application Mode does not wait for job cleanup during shutdown
> --------------------------------------------------------------
>
> Key: FLINK-26772
> URL: https://issues.apache.org/jira/browse/FLINK-26772
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Mika Naylor
> Assignee: Mika Naylor
> Priority: Critical
> Labels: pull-request-available
> Attachments: testcluster-599f4d476b-bghw5_log.txt
>
>
> We discovered that in Application Mode, when the application has completed,
> the cluster is shut down even if there are ongoing resource cleanup events
> happening in the background. For example, if HA cleanup fails, further
> retries are not attempted as the cluster is shut down before this can happen.
>
> We should also add a flag for the shutdown that will prevent further jobs
> from being submitted.