[jira] [Commented] (FLINK-27274) Job cannot be recovered, after restarting cluster

Martijn Visser (Jira) Tue, 19 Apr 2022 00:16:09 -0700


    [ https://issues.apache.org/jira/browse/FLINK-27274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524132#comment-17524132 ]


Martijn Visser commented on FLINK-27274:
----------------------------------------

I don't think this is a bug, and it's definitely not a blocker. If you run 
stop-cluster.sh, you're effectively stopping the cluster. If you want a 
stateful recovery, you'll need to follow the steps for a stateful recovery; 
stopping and starting the cluster doesn't do that. You would have to do 
something like:

1. Start your job
2. Create a savepoint
3. Stop your job (and your cluster, if needed)
4. (Start your cluster, if needed) and start your job from the savepoint
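
The four steps above could be scripted roughly as follows. This is a minimal sketch, not a verified procedure for the reporter's setup: `FLINK_HOME`, the job ID, the savepoint directory, and the jar path are hypothetical placeholders, and `DRY_RUN` is a helper added here so the commands are printed instead of executed by default. The `flink savepoint`, `flink cancel`, and `flink run -s` subcommands are from the standard Flink CLI; `flink stop --savepointPath <dir> <jobId>` can replace steps 2 and 3 in one call.

```shell
#!/bin/sh
# Sketch only: FLINK_HOME, job IDs and paths are hypothetical placeholders.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = actually run it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 2: trigger a savepoint for a running job into a target directory.
take_savepoint() {   # usage: take_savepoint <jobId> <savepointDir>
  run "$FLINK_HOME/bin/flink" savepoint "$1" "$2"
}

# Step 3: stop the job, then (if needed) the cluster.
cancel_job() {       # usage: cancel_job <jobId>
  run "$FLINK_HOME/bin/flink" cancel "$1"
}

# Steps 3/4: restart the standalone cluster.
restart_cluster() {
  run "$FLINK_HOME/bin/stop-cluster.sh"
  run "$FLINK_HOME/bin/start-cluster.sh"
}

# Step 4: resubmit a jar job, restoring state from the savepoint path.
# For a job submitted through sql-client.sh, the assumption here is that you
# would instead run
#   SET 'execution.savepoint.path' = '<savepointPath>';
# in the SQL client before re-executing the statements.
resume_from_savepoint() {   # usage: resume_from_savepoint <savepointPath> <jar>
  run "$FLINK_HOME/bin/flink" run -s "$1" "$2"
}

# Example sequence (placeholders, uncomment to use):
# take_savepoint "$JOB_ID" /tmp/flink-savepoints
# cancel_job "$JOB_ID"
# restart_cluster
# resume_from_savepoint /tmp/flink-savepoints/savepoint-xxxx my-job.jar
```

With `DRY_RUN=1` the functions only echo the command lines, which makes it easy to check the sequence before pointing it at a real cluster.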

Keep in mind that SQL jobs currently have no supported stateful upgrade path. 
This is being worked on (see FLIP-190). I'm leaning towards closing this 
ticket.

> Job cannot be recovered, after restarting cluster
> -------------------------------------------------
>
>                 Key: FLINK-27274
>                 URL: https://issues.apache.org/jira/browse/FLINK-27274
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API
>    Affects Versions: 1.15.0
>         Environment: Flink 1.15.0-rc3
> [https://github.com/apache/flink/archive/refs/tags/release-1.15.0-rc3.tar.gz] 
>            Reporter: macdoor615
>            Priority: Blocker
>             Fix For: 1.15.1
>
>         Attachments: flink-conf.yaml, 
> flink-gum-standalonesession-0-hb3-dev-flink-000.log.3.zip, 
> flink-gum-standalonesession-0-hb3-dev-flink-000.log.zip, 
> flink-gum-taskexecutor-2-hb3-dev-flink-000.log, log.recover.debug.zip, 
> new_cf_alarm_no_recover.yaml.sql
>
>
> 1. Execute new_cf_alarm_no_recover.yaml.sql with sql-client.sh
> (config file: flink-conf.yaml).
> The job runs properly.
> 2. Restart the cluster with the commands
> stop-cluster.sh
> start-cluster.sh
> 3. The job cannot be recovered.
> Log files:
> flink-gum-standalonesession-0-hb3-dev-flink-000.log
> flink-gum-taskexecutor-2-hb3-dev-flink-000.log
> 4. Not every job fails to recover: at the same time, some are recovered and some are not.
> 5. All jobs can be recovered on Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
