[jira] [Commented] (FLINK-26772) Application Mode does not wait for job cleanup during shutdown

Matthias Pohl (Jira) Tue, 19 Apr 2022 06:13:05 -0700


    [ https://issues.apache.org/jira/browse/FLINK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524296#comment-17524296 ]


Matthias Pohl commented on FLINK-26772:
---------------------------------------

I tried to reproduce this with a standalone cluster but could not: the 
{{ClusterEntrypoint}} process kept running until I resolved the cleanup issue, 
which is the expected behavior. The logs of the k8s run show a {{SIGTERM}}, 
which might be k8s-specific:
{code:java}
2022-03-21 09:34:41,129 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Closing the slot manager.
2022-03-21 09:34:41,129 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Suspending the slot manager.
2022-03-21 09:34:41,133 DEBUG org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor         [] - The RpcEndpoint resourcemanager_0 terminated successfully.
2022-03-21 09:34:41,136 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-03-21 09:34:41,136 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor            [] - AkkaRpcActor akka://flink/user/rpc/resourcemanager_0 has terminated.
2022-03-21 09:34:41,151 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124
{code}
[~yangwang166] do you have any idea which external process might have sent the 
SIGTERM while the Application Mode cluster was shutting down?

> Application Mode does not wait for job cleanup during shutdown
> --------------------------------------------------------------
>
>                 Key: FLINK-26772
>                 URL: https://issues.apache.org/jira/browse/FLINK-26772
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Mika Naylor
>            Assignee: Mika Naylor
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: testcluster-599f4d476b-bghw5_log.txt
>
>
> We discovered that in Application Mode, when the application has completed, 
> the cluster is shut down even if there are ongoing resource cleanup events 
> happening in the background. For example, if HA cleanup fails, further 
> retries are not attempted because the cluster is shut down before they can 
> happen.
>  
> We should also add a flag for the shutdown that will prevent further jobs 
> from being submitted.
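
The behavior the description asks for (shutdown that waits for cleanup retries 
to finish) can be sketched with plain {{CompletableFuture}} chaining. This is a 
minimal illustration only, not Flink's implementation: the class 
{{ShutdownAfterCleanupSketch}} and the methods {{cleanupWithRetries}} and 
{{shutDownAfter}} are hypothetical names made up for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names exist in Flink.
public class ShutdownAfterCleanupSketch {

    /**
     * Runs the cleanup action asynchronously and retries it until it reports
     * success or maxAttempts is reached. Completes with the number of attempts
     * that were needed, or exceptionally if every attempt failed.
     */
    static CompletableFuture<Integer> cleanupWithRetries(
            Supplier<Boolean> cleanup, int maxAttempts) {
        return CompletableFuture.supplyAsync(() -> {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                if (cleanup.get()) {
                    return attempt;
                }
            }
            throw new IllegalStateException(
                    "cleanup failed after " + maxAttempts + " attempts");
        });
    }

    /**
     * Chains the shutdown step onto the cleanup future, so the shutdown
     * cannot run before the cleanup (including its retries) has completed.
     */
    static CompletableFuture<String> shutDownAfter(
            CompletableFuture<Integer> cleanupFuture) {
        return cleanupFuture.thenApply(
                attempts -> "shut down after " + attempts + " cleanup attempt(s)");
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Assumed cleanup action that fails twice before succeeding.
        Supplier<Boolean> flakyCleanup = () -> calls.incrementAndGet() >= 3;

        String result = shutDownAfter(cleanupWithRetries(flakyCleanup, 5)).join();
        System.out.println(result); // prints "shut down after 3 cleanup attempt(s)"
    }
}
```

The point of the sketch is the ordering: the shutdown step is a continuation of 
the cleanup future rather than an independent action, so a failed first cleanup 
attempt still gets its retries. An external {{SIGTERM}}, as seen in the log 
above, would bypass any such ordering inside the process.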



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
