[jira] [Commented] (FLINK-27274) Job cannot be recovered, after restarting cluster

macdoor615 (Jira) Mon, 18 Apr 2022 00:37:05 -0700


    [ https://issues.apache.org/jira/browse/FLINK-27274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523580#comment-17523580 ]


macdoor615 commented on FLINK-27274:
------------------------------------

[~zhuzh]

My procedure is below. For testing purposes, there is only one job on this cluster.
 # start cluster at 2022-04-17 14:46:00,913
log file: flink-gum-standalonesession-0-hb3-dev-flink-000.log.3

{code:java}
start-cluster.sh {code}

 # load the job, 130f884ce0fa8a34e95317afa0a1d05c

{code:java}
2022-04-17 14:46:38,814 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 专线告警处理: 格式化原始告警 (130f884ce0fa8a34e95317afa0a1d05c) switched from state CREATED to RUNNING.{code}


{code:java}
sql-client.sh -f new_cf_alarm_no_recover.yaml.sql{code}

 # stop cluster

{code:java}
2022-04-17 14:48:32,729 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Closing TaskExecutor connection hb3-dev-flink-000:45719-f8e886 because: The TaskExecutor is shutting down.{code}
{code:java}
stop-cluster.sh{code}

 # restart cluster at 2022-04-17 14:48:53,390
log file: flink-gum-standalonesession-0-hb3-dev-flink-000.log
{code:java}
start-cluster.sh{code}
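
One way to check whether the job came back after the restart is to search the new JobManager log for the same state transition that step 2 logged. The helper below is only a sketch: the function name `count_running_transitions` is hypothetical, and it assumes a recovered job would log the same "switched from state CREATED to RUNNING" line, on a single line, as in the log excerpt above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of Flink): count how many times a job
# switched from CREATED to RUNNING according to a JobManager log file.
# Usage: count_running_transitions JOB_ID LOG_FILE
count_running_transitions() {
  local job_id="$1" log_file="$2"
  # -F matches the text literally; -c prints the number of matching lines.
  # Assumption: the transition is logged on one line, in the format shown in step 2.
  grep -cF "(${job_id}) switched from state CREATED to RUNNING" "${log_file}"
}
```

For example, `count_running_transitions 130f884ce0fa8a34e95317afa0a1d05c flink-gum-standalonesession-0-hb3-dev-flink-000.log` would print 0 if the restarted session never brought the job back to RUNNING.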

> Job cannot be recovered, after restarting cluster
> -------------------------------------------------
>
>                 Key: FLINK-27274
>                 URL: https://issues.apache.org/jira/browse/FLINK-27274
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API
>    Affects Versions: 1.15.0
>         Environment: Flink 1.15.0-rc3
> [https://github.com/apache/flink/archive/refs/tags/release-1.15.0-rc3.tar.gz] 
>            Reporter: macdoor615
>            Priority: Blocker
>             Fix For: 1.15.1
>
>         Attachments: flink-conf.yaml, 
> flink-gum-standalonesession-0-hb3-dev-flink-000.log.3.zip, 
> flink-gum-standalonesession-0-hb3-dev-flink-000.log.zip, 
> flink-gum-taskexecutor-2-hb3-dev-flink-000.log, 
> new_cf_alarm_no_recover.yaml.sql
>
>
> 1. Execute new_cf_alarm_no_recover.yaml.sql with sql-client.sh
> config file: flink-conf.yaml
> The job runs properly.
> 2. Restart the cluster with the commands
> stop-cluster.sh
> start-cluster.sh
> 3. The job cannot be recovered.
> log files
> flink-gum-standalonesession-0-hb3-dev-flink-000.log
> flink-gum-taskexecutor-2-hb3-dev-flink-000.log
> 4. Not all jobs fail to recover: at the same time, some can be recovered and some cannot.
> 5. All jobs can be recovered on Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
