[
https://issues.apache.org/jira/browse/FLINK-38215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041801#comment-18041801
]
junzhong qin commented on FLINK-38215:
--------------------------------------
Hi [~_anton], in the issue "Zookeeper HA support"
(https://issues.apache.org/jira/browse/FLINK-27273), it seems the HA data is
never deleted when `cleanupTerminalJmAfterTtl` runs after the TTL has passed:
{code:java}
// ApplicationReconciler#cleanupTerminalJmAfterTtl()
if (ttlPassed) {
    LOG.info("Removing JobManager deployment for terminal application.");
    // The last argument is hard-coded to false, so the HA metadata is kept.
    flinkService.deleteClusterDeployment(
            deployment.getMetadata(), status, observeConfig, false);
    return true;
}
{code}
I want to delete the HA data when a bounded stream job reaches FINISHED. Would
it be reasonable from your side if I use SuspendMode to decide whether to delete
the HA data, like this?
{code:java}
// ApplicationReconciler#cleanupTerminalJmAfterTtl()
if (ttlPassed) {
    LOG.info("Removing JobManager deployment for terminal application.");
    // Derive the suspend mode from `job.savepoint-on-deletion`.
    var suspendMode =
            observeConfig.getBoolean(KubernetesOperatorConfigOptions.SAVEPOINT_ON_DELETION)
                    ? SuspendMode.SAVEPOINT
                    : SuspendMode.STATELESS;
    // Let the suspend mode decide whether the HA metadata is removed.
    flinkService.deleteClusterDeployment(
            deployment.getMetadata(), status, observeConfig,
            suspendMode.deleteHaMeta());
    return true;
}
{code}
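To make the decision in the proposal easier to reason about in isolation, here
is a minimal, self-contained sketch. It does not use the operator's classes:
`HaCleanupSketch`, its nested `SuspendMode` enum, and the boolean flag attached
to each constant are hypothetical stand-ins, and the flag values in particular
are an assumption that should be checked against the real `SuspendMode`.

```java
public class HaCleanupSketch {

    /**
     * Hypothetical stand-in for the operator's SuspendMode.
     * The deleteHaMeta flag per constant is an ASSUMPTION, not taken from the source.
     */
    enum SuspendMode {
        SAVEPOINT(true),
        STATELESS(true),
        LAST_STATE(false);

        private final boolean deleteHaMeta;

        SuspendMode(boolean deleteHaMeta) {
            this.deleteHaMeta = deleteHaMeta;
        }

        boolean deleteHaMeta() {
            return deleteHaMeta;
        }
    }

    /** Mirrors the ternary in the proposed change. */
    static SuspendMode chooseSuspendMode(boolean savepointOnDeletion) {
        return savepointOnDeletion ? SuspendMode.SAVEPOINT : SuspendMode.STATELESS;
    }

    /**
     * Returns the value that would be passed as the last argument of
     * deleteClusterDeployment(...) once the TTL has passed.
     */
    static boolean deleteHaMetaOnTtlCleanup(boolean savepointOnDeletion) {
        return chooseSuspendMode(savepointOnDeletion).deleteHaMeta();
    }

    public static void main(String[] args) {
        for (boolean savepointOnDeletion : new boolean[] {true, false}) {
            System.out.println(
                    "savepointOnDeletion=" + savepointOnDeletion
                            + " -> mode=" + chooseSuspendMode(savepointOnDeletion)
                            + ", deleteHaMeta=" + deleteHaMetaOnTtlCleanup(savepointOnDeletion));
        }
    }
}
```

Under the assumed flag values, both branches of the ternary lead to HA metadata
deletion, so the config option would only change which mode is chosen, not
whether the HA data is removed. If the real enum uses different values, the
outcome differs accordingly.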
> Remove HA data according to config option `job.savepoint-on-deletion` when
> cleanup terminal JM after ttl passed
> ----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-38215
> URL: https://issues.apache.org/jira/browse/FLINK-38215
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Reporter: junzhong qin
> Assignee: junzhong qin
> Priority: Not a Priority
>
> When a bounded job reaches FINISHED, the HA data is still kept after the JM
> deployment is deleted. As a result, the job stays in the reconcile loop. The
> following is a log snippet from the operator for a FINISHED bounded stream
> job:
> {code:java}
> 2025-08-07 20:51:47,786 INFO
> org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler
> [] [bounded-job-test] - Removing JobManager deployment for terminal
> application.
> 2025-08-07 20:51:47,786 INFO
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService []
> [bounded-job-test] - Deleting cluster with Foreground propagation
> 2025-08-07 20:51:47,787 INFO
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService []
> [bounded-job-test] - Scaling JobManager Deployment to zero with 60 seconds
> timeout...
> 2025-08-07 20:51:49,124 INFO
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService []
> [bounded-job-test] - Completed Scaling JobManager Deployment to zero
> 2025-08-07 20:51:49,125 INFO
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService []
> [bounded-job-test] - Deleting JobManager Deployment with 298 seconds
> timeout...
> 2025-08-07 20:51:51,148 INFO
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService []
> [bounded-job-test] - Completed Deleting JobManager Deployment
> 2025-08-07 20:51:51,151 INFO
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService []
> [bounded-job-test] - Keeping HA metadata for last-state restore
> 2025-08-07 20:51:51,290 INFO
> org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver
> [] [bounded-job-test] - Observing JobManager deployment. Previous status:
> MISSING
> 2025-08-07 20:51:51,296 INFO
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler
> [] [bounded-job-test] - Resource fully reconciled, nothing to do...
> 2025-08-07 20:52:06,315 INFO
> org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver
> [] [bounded-job-test] - Observing JobManager deployment. Previous status:
> MISSING
> 2025-08-07 20:52:06,321 INFO
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler
> [] [bounded-job-test] - Resource fully reconciled, nothing to do...{code}
> From the log we can see that the HA metadata is kept for last-state restore,
> which is not actually needed, and the resource keeps being picked up in the
> reconcile loop.
>
> IIUC, the HA metadata should be cleaned up according to the config
> `job.savepoint-on-deletion` or SuspendMode.
> I'd like to contribute a PR to fix this.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)