1996fanrui commented on code in PR #1003:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1003#discussion_r2257077927
##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -560,6 +560,12 @@ public Optional<Savepoint> getLastCheckpoint(JobID jobId, Configuration conf) {
                     && e.getMessage().contains("Checkpointing has not been enabled")) {
                 LOG.warn("Checkpointing not enabled for job {}", jobId, e);
                 return Optional.empty();
+            } else if (e instanceof ExecutionException
+                    && e.getMessage() != null
+                    && e.getMessage()
+                            .contains(String.format("Job %s not found", jobId.toString()))) {
+                LOG.warn("Job {} not found", jobId, e);

Review Comment:
   > Background
   > - When a job is observed by the observer, the observer() method in AbstractFlinkResourceObserver is triggered
   > - Once the JM deployment is ready, AbstractFlinkDeploymentObserver#observeFlinkCluster() observes the Flink cluster
   > - SnapshotObserver#observeSavepointStatus() monitors the savepoint status
   > - SnapshotObserver#observeLatestCheckpoint() tracks the last checkpoint of jobs with a globally terminal state:
   >   - A "Job not found" exception occurs when calling getLastCheckpoint() for a FINISHED job.
   >   - In reality, the job can be retrieved via the GET /jobs/:jobid request, but the GET /jobs/:jobid/checkpoints request will throw a "Job not found" exception.

   So once the job is finished, the "Job not found" exception stops the control loop, and as a result the cleanup is never called, right?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
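
To make the question in the review comment concrete, here is a minimal, self-contained sketch of the failure mode and of the handling added in the diff. It is not the operator's code: `JobNotFoundSketch`, `CheckpointClient`, and `observeAndCleanUp` are hypothetical names, a plain `String` stands in for `JobID`, and the control loop is reduced to a single method that reports whether the cleanup step was reached.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;

public class JobNotFoundSketch {

    /** Hypothetical stand-in for the call behind GET /jobs/:jobid/checkpoints. */
    interface CheckpointClient {
        String latestCheckpointPath(String jobId) throws Exception;
    }

    /**
     * Mirrors the branch added in the diff: a "Job <id> not found" failure is
     * treated as "no checkpoint" instead of being rethrown.
     */
    static Optional<String> getLastCheckpoint(String jobId, CheckpointClient client)
            throws Exception {
        try {
            return Optional.ofNullable(client.latestCheckpointPath(jobId));
        } catch (Exception e) {
            if (e instanceof ExecutionException
                    && e.getMessage() != null
                    && e.getMessage().contains(String.format("Job %s not found", jobId))) {
                return Optional.empty();
            }
            throw e; // any other failure still propagates
        }
    }

    /**
     * Simplified model of one control-loop pass (an assumption, not the real
     * observer): returns true only if the cleanup step after the checkpoint
     * observation is reached.
     */
    static boolean observeAndCleanUp(String jobId, CheckpointClient client) {
        try {
            getLastCheckpoint(jobId, client);
        } catch (Exception e) {
            return false; // the pass is aborted here, so cleanup is skipped
        }
        return true; // cleanup would run at this point
    }

    public static void main(String[] args) {
        // FINISHED job whose checkpoint endpoint reports "not found".
        CheckpointClient notFound = id -> {
            throw new ExecutionException(
                    "Job " + id + " not found", new RuntimeException("404"));
        };
        // Unrelated failure that should still abort the pass.
        CheckpointClient broken = id -> {
            throw new IllegalStateException("connection refused");
        };
        System.out.println(observeAndCleanUp("abc", notFound));
        System.out.println(observeAndCleanUp("abc", broken));
    }
}
```

In this simplified model, the "not found" case reaches the cleanup step only because `getLastCheckpoint` returns `Optional.empty()`; without that branch, the exception would abort the pass in the same way the unrelated failure does.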