[ https://issues.apache.org/jira/browse/FLINK-38215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Fan reassigned FLINK-38215:
-------------------------------

    Assignee: junzhong qin

> Remove HA data according to config option `job.savepoint-on-deletion` when cleanup terminal JM after ttl passed
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-38215
>                 URL: https://issues.apache.org/jira/browse/FLINK-38215
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: junzhong qin
>            Assignee: junzhong qin
>            Priority: Not a Priority
>
> When a bounded job reaches FINISHED, its HA data is still kept after the JM
> deployment is deleted. As a result, the job stays in the reconcile loop. The
> following operator log snippet is from a FINISHED bounded stream job:
>
> {code:java}
> 2025-08-07 20:51:47,786 INFO org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler [] [bounded-job-test] - Removing JobManager deployment for terminal application.
> 2025-08-07 20:51:47,786 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Deleting cluster with Foreground propagation
> 2025-08-07 20:51:47,787 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Scaling JobManager Deployment to zero with 60 seconds timeout...
> 2025-08-07 20:51:49,124 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Completed Scaling JobManager Deployment to zero
> 2025-08-07 20:51:49,125 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Deleting JobManager Deployment with 298 seconds timeout...
> 2025-08-07 20:51:51,148 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Completed Deleting JobManager Deployment
> 2025-08-07 20:51:51,151 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Keeping HA metadata for last-state restore
> 2025-08-07 20:51:51,290 INFO org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver [] [bounded-job-test] - Observing JobManager deployment. Previous status: MISSING
> 2025-08-07 20:51:51,296 INFO org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler [] [bounded-job-test] - Resource fully reconciled, nothing to do...
> 2025-08-07 20:52:06,315 INFO org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver [] [bounded-job-test] - Observing JobManager deployment. Previous status: MISSING
> 2025-08-07 20:52:06,321 INFO org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler [] [bounded-job-test] - Resource fully reconciled, nothing to do...
> {code}
>
> The log shows that the HA metadata is kept for a last-state restore. That
> restore is not actually needed for a FINISHED job, yet because the metadata
> remains, the resource keeps being picked up by the reconcile loop (the two
> observations above are 15 seconds apart).
>
> IIUC, the HA metadata should be cleaned up according to the config option
> `job.savepoint-on-deletion` or the SuspendMode.
> I'd like to contribute a PR to fix this.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
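A minimal Java sketch of one possible reading of the fix proposed in the issue description. When the terminal JobManager is cleaned up after the TTL, HA metadata is dropped for a FINISHED job. Otherwise it is kept only for a last-state suspend without `job.savepoint-on-deletion`. `HaCleanupSketch`, `JobState`, `SuspendMode` and `shouldDeleteHaMetadata` are hypothetical names, not Flink Kubernetes Operator APIs. How `job.savepoint-on-deletion` should factor in is an assumption, not documented operator behavior.

```java
// Hypothetical sketch of the HA-cleanup decision discussed in FLINK-38215.
// JobState, SuspendMode and shouldDeleteHaMetadata are illustrative names,
// NOT Flink Kubernetes Operator APIs.
public class HaCleanupSketch {

    // Hypothetical subset of job states the operator may observe.
    public enum JobState { RUNNING, FINISHED, FAILED, CANCELED, SUSPENDED }

    // Hypothetical stand-in for the operator's SuspendMode.
    public enum SuspendMode { STATELESS, SAVEPOINT, LAST_STATE }

    /**
     * Decides whether HA metadata can be deleted when the terminal JobManager
     * deployment is cleaned up after the TTL has passed.
     */
    public static boolean shouldDeleteHaMetadata(
            JobState state, boolean savepointOnDeletion, SuspendMode mode) {
        // A FINISHED bounded job has nothing left to restore, so keeping
        // HA metadata "for last-state restore" serves no purpose.
        if (state == JobState.FINISHED) {
            return true;
        }
        // Assumption: if a savepoint is taken on deletion, or the suspend
        // mode is not last-state, no later restore reads the HA metadata.
        return savepointOnDeletion || mode != SuspendMode.LAST_STATE;
    }

    public static void main(String[] args) {
        // FINISHED bounded job, the case from the log above.
        System.out.println(shouldDeleteHaMetadata(JobState.FINISHED, false, SuspendMode.LAST_STATE));
        // Last-state suspend without savepoint-on-deletion: keep HA metadata.
        System.out.println(shouldDeleteHaMetadata(JobState.SUSPENDED, false, SuspendMode.LAST_STATE));
        // Same suspend, but a savepoint is taken on deletion.
        System.out.println(shouldDeleteHaMetadata(JobState.SUSPENDED, true, SuspendMode.LAST_STATE));
    }
}
```

Running `main` prints `true`, `false`, `true`. In this sketch, only a last-state suspend without a savepoint keeps the HA metadata. The FINISHED case is where the behavior in the log ("Keeping HA metadata for last-state restore") would change.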