[
https://issues.apache.org/jira/browse/FLINK-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-10286.
---------------------------------
Resolution: Duplicate
> Flink Persist Invalid Job Graph in Zookeeper
> --------------------------------------------
>
> Key: FLINK-10286
> URL: https://issues.apache.org/jira/browse/FLINK-10286
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.6.0
> Reporter: Sayat Satybaldiyev
> Priority: Major
>
> In HA mode on Flink 1.6, Flink persists the job graph in ZooKeeper even if the
> job was not accepted by the JobManager. This is particularly bad because, if
> the JM later dies and restarts, it tries to recover the job, fails, and dies
> completely.
>
> How to reproduce:
> 1. Have an HA Flink 1.6 cluster
> 2. Submit an invalid job; in my case, I set an invalid file system scheme for
> the RocksDB state backend
>
>
> {code:java}
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
> env.enableCheckpointing(5000);
> // "hddd" is the invalid file system scheme that triggers the failure
> RocksDBStateBackend backend = new RocksDBStateBackend("hddd:///tmp/flink/rocksdb");
> backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
> env.setStateBackend(backend);
> {code}
>
> Client returns:
>
>
> {code:java}
> The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not submit
> job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)
> {code}
>
>
> The JM does not accept the job. This is the truncated error log from the JM:
>
>
> {code:java}
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to
> submit job.
> ... 24 more
> Caused by: java.util.concurrent.CompletionException:
> java.lang.RuntimeException:
> org.apache.flink.runtime.client.JobExecutionException: Could not set up
> JobManager
>
> Caused by: java.lang.RuntimeException: Failed to start checkpoint ID counter:
> Could not find a file system implementation for scheme 'hddd'. The scheme is
> not directly supported by Flink and no Hadoop file system to support this
> scheme could be loaded.
> {code}
>
>
>
> 4. Go to ZK and observe that the JM has saved the job to ZK:
> ls /flink/flink_ns/jobgraphs/9680f02ae2f3806c3b4da25bfacd0749
> [7f392fd9-cedc-4978-9186-1f54b98eeeb7]
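
The behavior the report expects (no job graph left behind when the JobManager rejects the job) can be sketched roughly as follows. This is a minimal illustration, not Flink's actual code: `JobGraphCleanupSketch`, `InMemoryJobGraphStore`, `setUpJobManager`, `submit`, and the list of supported schemes are all hypothetical stand-ins, and an in-memory map plays the role of the ZooKeeper job graph store.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: none of these names are Flink classes.
class JobGraphCleanupSketch {

    // Hypothetical stand-in for the ZooKeeper-backed job graph store.
    static final class InMemoryJobGraphStore {
        private final Map<String, String> graphs = new ConcurrentHashMap<>();

        void put(String jobId, String graph) { graphs.put(jobId, graph); }
        void remove(String jobId) { graphs.remove(jobId); }
        boolean contains(String jobId) { return graphs.containsKey(jobId); }
    }

    // Assumption: schemes for which a file system implementation is available.
    static final Set<String> SUPPORTED_SCHEMES =
            new HashSet<>(Arrays.asList("file", "hdfs", "s3"));

    // Hypothetical stand-in for JobManager setup; fails on an unknown scheme.
    static void setUpJobManager(String checkpointUri) {
        String scheme = URI.create(checkpointUri).getScheme();
        if (scheme == null || !SUPPORTED_SCHEMES.contains(scheme)) {
            throw new IllegalStateException(
                    "Could not find a file system implementation for scheme '" + scheme + "'");
        }
    }

    // Persist first, then set up; on failure, remove the persisted graph again
    // so a restarted JM has nothing invalid to recover.
    static boolean submit(InMemoryJobGraphStore store, String jobId, String checkpointUri) {
        store.put(jobId, "serialized-job-graph");
        try {
            setUpJobManager(checkpointUri);
            return true;
        } catch (RuntimeException e) {
            store.remove(jobId);
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryJobGraphStore store = new InMemoryJobGraphStore();
        boolean accepted = submit(store, "9680f02ae2f3806c3b4da25bfacd0749",
                "hddd:///tmp/flink/rocksdb");
        System.out.println("accepted=" + accepted
                + ", stillPersisted=" + store.contains("9680f02ae2f3806c3b4da25bfacd0749"));
    }
}
```

With a cleanup step like this, the `ls` on the job graph path in the last reproduction step would be expected to find no entry for the rejected job.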