junzhong qin created FLINK-38215:
------------------------------------
Summary: Remove HA data according to config option
`job.savepoint-on-deletion` when cleanup terminal JM after ttl passed
Key: FLINK-38215
URL: https://issues.apache.org/jira/browse/FLINK-38215
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Reporter: junzhong qin
When a bounded job reaches the FINISHED state, its HA data is still kept after
the JM deployment is deleted. As a result, the job stays in the reconcile loop.
The following is a log snippet from the operator for a FINISHED bounded stream job:
{code:java}
2025-08-07 20:51:47,786 INFO org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler [] [bounded-job-test] - Removing JobManager deployment for terminal application.
2025-08-07 20:51:47,786 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Deleting cluster with Foreground propagation
2025-08-07 20:51:47,787 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Scaling JobManager Deployment to zero with 60 seconds timeout...
2025-08-07 20:51:49,124 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Completed Scaling JobManager Deployment to zero
2025-08-07 20:51:49,125 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Deleting JobManager Deployment with 298 seconds timeout...
2025-08-07 20:51:51,148 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Completed Deleting JobManager Deployment
2025-08-07 20:51:51,151 INFO org.apache.flink.kubernetes.operator.service.AbstractFlinkService [] [bounded-job-test] - Keeping HA metadata for last-state restore
2025-08-07 20:51:51,290 INFO org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver [] [bounded-job-test] - Observing JobManager deployment. Previous status: MISSING
2025-08-07 20:51:51,296 INFO org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler [] [bounded-job-test] - Resource fully reconciled, nothing to do...
2025-08-07 20:52:06,315 INFO org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver [] [bounded-job-test] - Observing JobManager deployment. Previous status: MISSING
2025-08-07 20:52:06,321 INFO org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler [] [bounded-job-test] - Resource fully reconciled, nothing to do...
{code}
From the log we can see that the HA metadata is kept for last-state restore,
which is not actually needed for a FINISHED job, and the resource keeps being
processed in the reconcile loop.
IIUC, the HA metadata should be cleaned up according to the config
`job.savepoint-on-deletion` or the SuspendMode.
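
To make the proposal concrete, here is a minimal sketch of the kind of decision I have in mind. It is not the operator's actual code: the class `HaCleanupSketch`, the method `shouldDeleteHaMetadata`, and the local `SuspendMode` enum are hypothetical stand-ins, and the rule itself (keep HA metadata only while a last-state restore is still possible) is my assumption about the intended behavior.

```java
public class HaCleanupSketch {

    /** Hypothetical stand-in for the operator's suspend modes; not the real enum. */
    enum SuspendMode { STATELESS, SAVEPOINT, LAST_STATE }

    /**
     * Assumed rule: HA metadata is only worth keeping when a last-state
     * restore is still possible, i.e. the job is not terminal, no savepoint
     * was requested on deletion, and the suspend mode is LAST_STATE.
     */
    static boolean shouldDeleteHaMetadata(
            boolean jobTerminal, boolean savepointOnDeletion, SuspendMode mode) {
        if (jobTerminal) {
            // A FINISHED job has nothing left to restore.
            return true;
        }
        if (savepointOnDeletion) {
            // Assumption: the savepoint, not the HA metadata, is the restore point.
            return true;
        }
        return mode != SuspendMode.LAST_STATE;
    }

    public static void main(String[] args) {
        // FINISHED bounded job, as in the log above: HA metadata can go.
        System.out.println(shouldDeleteHaMetadata(true, false, SuspendMode.LAST_STATE));
        // Running job suspended for last-state restore: HA metadata must stay.
        System.out.println(shouldDeleteHaMetadata(false, false, SuspendMode.LAST_STATE));
    }
}
```

Under these assumptions, the FINISHED case from the log would delete the HA metadata instead of logging "Keeping HA metadata for last-state restore".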
I’d like to contribute a PR to fix this.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)