[jira] [Commented] (FLINK-26908) HA job cannot to restarting

WangMinChao (Jira) Tue, 29 Mar 2022 01:21:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-26908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513913#comment-17513913
 ]


WangMinChao commented on FLINK-26908:
-------------------------------------

After digging deeper, I found that the method 
org.apache.flink.runtime.dispatcher.Dispatcher#jobReachedTerminalState 
returns CleanupJobState.GLOBAL, which causes the ZooKeeper HA data to be 
cleaned up.
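
To make the LOCAL/GLOBAL distinction concrete, here is a minimal sketch of how a 
terminal job status might map to a cleanup scope. It is a simplified model, not 
the Flink source: the class CleanupScopeSketch and the method cleanupFor are 
hypothetical, and the enums only mirror the names JobStatus and CleanupJobState. 
The assumption is that SUSPENDED is terminal only for the current JobManager, so 
the HA data is kept for recovery, while a globally terminal status such as FAILED 
leads to GLOBAL cleanup.

```java
// Simplified model for illustration; these enums are stand-ins, not Flink's classes.
public class CleanupScopeSketch {

    /** Stand-in for the job status, with flags for terminal and globally terminal. */
    public enum JobStatus {
        RUNNING(false, false),
        RESTARTING(false, false),
        SUSPENDED(true, false), // terminal for this JobManager only; job may be recovered
        FAILED(true, true),
        CANCELED(true, true),
        FINISHED(true, true);

        final boolean terminal;
        final boolean globallyTerminal;

        JobStatus(boolean terminal, boolean globallyTerminal) {
            this.terminal = terminal;
            this.globallyTerminal = globallyTerminal;
        }
    }

    /** Stand-in for the cleanup scope: LOCAL keeps HA data, GLOBAL removes it. */
    public enum CleanupJobState { LOCAL, GLOBAL }

    /** Hypothetical helper: choose the cleanup scope for a terminal status. */
    public static CleanupJobState cleanupFor(JobStatus status) {
        if (!status.terminal) {
            throw new IllegalArgumentException(status + " is not a terminal state");
        }
        return status.globallyTerminal ? CleanupJobState.GLOBAL : CleanupJobState.LOCAL;
    }

    public static void main(String[] args) {
        // Print the scope chosen for each terminal status in the model.
        for (JobStatus status : JobStatus.values()) {
            if (status.terminal) {
                System.out.println(status + " -> " + cleanupFor(status));
            }
        }
    }
}
```

Under this model a GLOBAL result would be expected only for FAILED, CANCELED, or 
FINISHED. Which status the job actually reached in the failing case would need 
to be confirmed from jm.log.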

> HA job cannot to restarting
> ---------------------------
>
>                 Key: FLINK-26908
>                 URL: https://issues.apache.org/jira/browse/FLINK-26908
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.3
>            Reporter: WangMinChao
>            Priority: Major
>         Attachments: jm.log
>
>
> We are running a Flink CDC job that writes to StarRocks.
> After the first failure, the job restarted and the archive file was created 
> successfully.
> {code:java}
> 2022-03-20 18:41:15,812 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from 
> state RESTARTING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped. ...
> ...
> 2022-03-20 18:41:16,139 INFO org.apache.flink.runtime.history.FsJobArchivist 
> [] - Job 971eb686ebd6af2f45f77ba97575443c has been archived at 
> cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c.
>  
> ...
> 2022-03-20 18:41:15,843 INFO  
> org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - 
> Start JobDispatcherLeaderProcess.
> {code}
>  
> On a subsequent failure, the job could not restart and the archive file was 
> not created.
> {code:java}
> 2022-03-22 16:18:44,991 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> mysql_2_sr_sr_cluster_1_qqm (971eb686ebd6af2f45f77ba97575443c) switched from 
> state RUNNING to SUSPENDED.
> org.apache.flink.util.FlinkException: Scheduler is being stopped.
> ...
> 2022-03-22 16:19:00,080 ERROR org.apache.flink.runtime.history.FsJobArchivist 
> [] - Failed to archive job.
> org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: 
> cosn://bg-rt-flink-prod-1254213275/flink/completed-jobs/971eb686ebd6af2f45f77ba97575443c
>  
> ...
> 2022-03-22 16:19:00,919 INFO  
> org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - 
> Stopping JobDispatcherLeaderProcess.
> {code}
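
The FileAlreadyExistsException in the second log is consistent with the archive 
being created without overwrite: the same job ID 
(971eb686ebd6af2f45f77ba97575443c) had already been archived on 2022-03-20, so a 
second attempt at the same path fails. The sketch below reproduces that behaviour 
with java.nio on a local directory. It is an illustration, not the FsJobArchivist 
code: the class NoOverwriteArchiveSketch and the method archive are hypothetical, 
and it assumes the archive path is derived only from the job ID, as the paths in 
the logs suggest.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: local-filesystem analogue of a no-overwrite archive write.
public class NoOverwriteArchiveSketch {

    /**
     * Writes the archive to dir/jobId and returns true on success.
     * CREATE_NEW makes the write fail if the file already exists,
     * in which case this method returns false.
     */
    public static boolean archive(Path dir, String jobId, String json) throws IOException {
        Path target = dir.resolve(jobId);
        try {
            Files.write(
                    target,
                    json.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW,
                    StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Same job ID archived before: the second write is rejected.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("completed-jobs");
        String jobId = "971eb686ebd6af2f45f77ba97575443c";
        System.out.println("first attempt succeeded: " + archive(dir, jobId, "{}"));
        System.out.println("second attempt succeeded: " + archive(dir, jobId, "{}"));
    }
}
```

If this is what happens, a job that is recovered under the same job ID after its 
first archive was written would hit the same error on every later archive attempt.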



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
