[
https://issues.apache.org/jira/browse/FLINK-20192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234272#comment-17234272
]
Congxian Qiu commented on FLINK-20192:
--------------------------------------
[~Antti-Kaikkonen] You can create a savepoint and restore from it. The
savepoint does not need to reference any checkpoint files (the checkpoint
files can be deleted if you don't need to restore from them), and since 1.11
a savepoint can also be relocated (FLINK-5763).
> Externalized checkpoint references a checkpoint from a different job
> --------------------------------------------------------------------
>
> Key: FLINK-20192
> URL: https://issues.apache.org/jira/browse/FLINK-20192
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream, Runtime / Checkpointing
> Affects Versions: 1.11.2
> Reporter: Antti Kaikkonen
> Priority: Major
> Attachments: _metadata
>
>
> When I try to restore from an externalized checkpoint located at:
> +/home/anttkaik/flink/checkpoints/0fc94de8d94e123585b5baed6972dbe8/chk-12+ I
> get the following error:
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:204)
>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:247)
>     at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for FunctionGroupOperator_6b87a4870d0e21cecbbe271bd893cfcc_(2/4) from any of the 1 provided restore options.
>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>     ... 9 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
>     at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:329)
>     at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:535)
>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
>     ... 11 more
> Caused by: java.io.FileNotFoundException: /home/anttkaik/flink/checkpoints/01dbaf21d7c5e8f8eabd3602e086bb89/shared/0a3c0c1d-c924-4e6d-b6ad-463a75c9fce8 (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
>     at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:143)
>     at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
>     at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:69)
>     at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForStateHandle(RocksDBStateDownloader.java:126)
>     at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.lambda$createDownloadRunnables$0(RocksDBStateDownloader.java:109)
>     at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:50)
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
>     at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211)
>     at java.util.concurrent.CompletableFuture.asyncRunStage(CompletableFuture.java:1654)
>     at java.util.concurrent.CompletableFuture.runAsync(CompletableFuture.java:1871)
>     at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForAllStateHandles(RocksDBStateDownloader.java:83)
>     at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.transferAllStateDataToDirectory(RocksDBStateDownloader.java:66)
>     at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.transferRemoteStateToLocalDirectory(RocksDBIncrementalRestoreOperation.java:230)
>     at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:195)
>     at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.initDBWithRescaling(RocksDBIncrementalRestoreOperation.java:342)
>     at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithRescaling(RocksDBIncrementalRestoreOperation.java:276)
>     at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:153)
>     at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
>     ... 15 more
> {code}
> The job +0fc94de8d94e123585b5baed6972dbe8+ was restored from an externalized
> checkpoint generated by +01dbaf21d7c5e8f8eabd3602e086bb89+. After the
> restoration succeeded and +0fc94de8d94e123585b5baed6972dbe8+ had generated
> new externalized checkpoints, I thought it was safe to delete the
> checkpoints from +01dbaf21d7c5e8f8eabd3602e086bb89+, but apparently I was
> wrong.
> I have attached the _metadata file from
> +/home/anttkaik/flink/checkpoints/0fc94de8d94e123585b5baed6972dbe8/chk-12+.
> It contains a reference to
> +/home/anttkaik/flink/checkpoints/01dbaf21d7c5e8f8eabd3602e086bb89/shared/0a3c0c1d-c924-4e6d-b6ad-463a75c9fce8+,
> which I have deleted.
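
Before deleting an old job's checkpoint directory, it may help to check
whether a newer checkpoint's _metadata file still points into it, as happened
above. The following is a rough sketch only, not a Flink tool: it assumes that
file paths appear in _metadata as plain ASCII byte runs and that checkpoint
directories follow the layout seen in this report (a 32-character hex job ID
followed by shared, taskowned, or chk-N). The function names
foreign_job_ids and check_checkpoint are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: referenced files live under <32-hex job id>/{shared,taskowned,chk-N}/,
# matching the directory layout shown in the report above.
JOB_PATH = re.compile(rb"/([0-9a-f]{32})/(?:shared|taskowned|chk-\d+)/")


def foreign_job_ids(metadata: bytes, own_job_id: str) -> set:
    """Return job IDs found in path-like byte runs that differ from own_job_id."""
    found = {m.group(1).decode("ascii") for m in JOB_PATH.finditer(metadata)}
    return found - {own_job_id}


def check_checkpoint(checkpoint_dir: str) -> set:
    """Scan <checkpoint_dir>/_metadata; the parent directory name is taken as the job ID."""
    chk = Path(checkpoint_dir)
    own_job_id = chk.parent.name
    return foreign_job_ids((chk / "_metadata").read_bytes(), own_job_id)
```

Calling check_checkpoint on a chk-N directory returns the set of other job IDs
whose files the checkpoint appears to reference. An empty result is not a
guarantee that deletion is safe, since the scan relies on the assumptions
above; taking a savepoint, as suggested in the comment, avoids the dependency
entirely.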