[ 
https://issues.apache.org/jira/browse/FLINK-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-10286.
---------------------------------
    Resolution: Duplicate

> Flink Persist Invalid Job Graph in Zookeeper
> --------------------------------------------
>
>                 Key: FLINK-10286
>                 URL: https://issues.apache.org/jira/browse/FLINK-10286
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.6.0
>            Reporter: Sayat Satybaldiyev
>            Priority: Major
>
> In HA mode, Flink 1.6 persists the job graph in ZooKeeper even if the job was 
> not accepted by the JobManager. This is particularly bad because, if the JM 
> later dies and restarts, it tries to recover the job, fails, and dies completely.
>  
> How to reproduce:
> 1. Have an HA Flink 1.6 cluster
> 2. Submit an invalid job; in my case I set an invalid file system scheme for 
> the RocksDB state backend
>  
>  
> {code:java}
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
> env.enableCheckpointing(5000);
> RocksDBStateBackend backend = new RocksDBStateBackend("hddd:///tmp/flink/rocksdb");
> backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
> env.setStateBackend(backend);
> {code}
>  
> Client returns:
>  
>  
> {code:java}
> The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not submit 
> job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)
> {code}
>  
>  
> The JM does not accept the job. This is the truncated error log from the JM:
>  
>  
> {code:java}
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
> submit job.
> ... 24 more
> Caused by: java.util.concurrent.CompletionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>  
> Caused by: java.lang.RuntimeException: Failed to start checkpoint ID counter: 
> Could not find a file system implementation for scheme 'hddd'. The scheme is 
> not directly supported by Flink and no Hadoop file system to support this 
> scheme could be loaded.
> {code}
>  
>  
>  
> 4. Go to ZK and observe that the JM has saved the job to ZK:
> ls /flink/flink_ns/jobgraphs/9680f02ae2f3806c3b4da25bfacd0749
>  [7f392fd9-cedc-4978-9186-1f54b98eeeb7]
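
The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not Flink's Dispatcher code: an in-memory map stands in for the ZooKeeper job graph store, and JobGraphStoreSketch, JobSetup, submit and jobsToRecover are hypothetical names introduced only for this example. It shows one way a job graph that was persisted before setup could be removed again when setup fails, so that a restarted JM would not try to recover it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: the map stands in for the ZooKeeper job graph store.
// None of these names are Flink classes.
class JobGraphStoreSketch {
    private final Map<String, String> persisted = new HashMap<>();

    /** Hypothetical hook that models setting up the JobManager for a job. */
    interface JobSetup {
        void start(String jobId) throws Exception;
    }

    /** Persists the graph first, then runs setup; removes the entry if setup fails. */
    boolean submit(String jobId, String jobGraph, JobSetup setup) {
        persisted.put(jobId, jobGraph);
        try {
            setup.start(jobId);
            return true;
        } catch (Exception e) {
            // Without this rollback, the rejected job would stay in the store
            // and be picked up again on recovery.
            persisted.remove(jobId);
            return false;
        }
    }

    /** Job IDs that a restarted JobManager would try to recover in this model. */
    Set<String> jobsToRecover() {
        return new HashSet<>(persisted.keySet());
    }
}
```

In this model, a submission whose setup throws (for example because of the unknown 'hddd' scheme) leaves nothing behind in jobsToRecover().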


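As a client-side guard against the reproduction case, the checkpoint URI could be checked before the job is submitted. The sketch below uses only java.net.URI; CheckpointUriCheck, hasKnownScheme and the allow-list are assumptions made for illustration, since the schemes that actually resolve depend on the file systems available on the cluster.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Flink.
class CheckpointUriCheck {
    // Assumed allow-list for illustration; the schemes that really work depend on
    // which file system implementations the cluster can load.
    private static final Set<String> ASSUMED_SCHEMES =
            new HashSet<>(Arrays.asList("file", "hdfs", "s3"));

    /**
     * Returns true if the URI has a scheme from the assumed allow-list.
     * URI.create throws IllegalArgumentException for a malformed URI.
     */
    static boolean hasKnownScheme(String checkpointUri) {
        String scheme = URI.create(checkpointUri).getScheme();
        return scheme != null
                && ASSUMED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

With this allow-list, hasKnownScheme("hddd:///tmp/flink/rocksdb") returns false, so the typo would be caught before submission.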

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
