[ https://issues.apache.org/jira/browse/FLINK-26908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513913#comment-17513913 ]
WangMinChao commented on FLINK-26908:
-------------------------------------

After digging deeper, I found that org.apache.flink.runtime.dispatcher.Dispatcher#jobReachedTerminalState returns CleanupJobState.GLOBAL, which causes the ZooKeeper HA data to be cleaned up.

> HA job cannot to restarting
> ---------------------------
>
> Key: FLINK-26908
> URL: https://issues.apache.org/jira/browse/FLINK-26908
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.13.3
> Reporter: WangMinChao
> Priority: Major
> Attachments: jm.log
>
>
> We are running a Flink CDC job that writes to StarRocks.
> On the first failure, the job was able to restart, and the archive file was created successfully.
> {code:java}
> 2022-03-20 18:41:15,812 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from state RESTARTING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped. ...
> ...
> 2022-03-20 18:41:16,139 INFO org.apache.flink.runtime.history.FsJobArchivist [] - Job 971eb686ebd6af2f45f77ba97575443c has been archived at cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c.
> ...
> 2022-03-20 18:41:15,843 INFO org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Start JobDispatcherLeaderProcess.
> {code}
>
> On a subsequent failure, the job could not restart, and the archive file was not created.
> {code:java}
> 2022-03-22 16:18:44,991 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from state RUNNING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped.
> ...
> 2022-03-22 16:19:00,080 ERROR org.apache.flink.runtime.history.FsJobArchivist [] - Failed to archive job.
> org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c
> ...
> 2022-03-22 16:19:00,919 INFO org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Stopping JobDispatcherLeaderProcess.
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)