[
https://issues.apache.org/jira/browse/FLINK-34495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820659#comment-17820659
]
Zakelly Lan commented on FLINK-34495:
-------------------------------------
[~mapohl] This is hard to avoid, because the checkpoint notification is
best-effort and the TM sends no ack back to the JM, so the JM does not know
when it is 'safe' to delete the private state directory. But yes, it should be
addressed, and either the notification or the state file ownership should be
redesigned. I suggest a dedicated test to reproduce this, since I guess it
happens only rarely in this test.
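As a starting point for such a test, here is a minimal sketch of the suspected
ordering. It is not Flink code: the class and method names are hypothetical and
it uses only java.io and java.nio. One side deletes the chk-N directory while
the other side still tries to create a state file inside it. The steps run
sequentially rather than on two threads so that the outcome is deterministic.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; none of these names exist in Flink.
class CheckpointDirRaceSketch {

    /**
     * Returns true if creating a state file fails with FileNotFoundException
     * because its parent checkpoint directory was deleted beforehand.
     */
    static boolean uploadFailsAfterDirectoryDeletion() {
        try {
            Path root = Files.createTempDirectory("chckpt-dir");
            Path chkDir = Files.createDirectory(root.resolve("chk-12"));
            Path stateFile = chkDir.resolve("state-file");
            try {
                // Assumed "JM" step: the checkpoint directory is removed
                // without waiting for any acknowledgement from the writer.
                Files.delete(chkDir);

                // Assumed "TM" step: the upload still tries to open an
                // output stream inside the directory that is now gone.
                try (FileOutputStream out = new FileOutputStream(stateFile.toFile())) {
                    out.write(1);
                    return false; // not expected: the parent directory is missing
                } catch (FileNotFoundException e) {
                    return true; // same exception type as in the reported trace
                }
            } finally {
                // Clean up whatever is left of the temporary directories.
                Files.deleteIfExists(stateFile);
                Files.deleteIfExists(chkDir);
                Files.deleteIfExists(root);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("upload failed: " + uploadFailsAfterDirectoryDeletion());
    }
}
```

A real reproduction would have to drive the actual checkpoint cleanup and the
RocksDB upload path. The sketch only shows that this ordering is enough to
produce the FileNotFoundException on a local file system.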
> Resuming Savepoint (rocks, scale up, heap timers) end-to-end test failure due
> to FileNotFoundException
> ------------------------------------------------------------------------------------------------------
>
> Key: FLINK-34495
> URL: https://issues.apache.org/jira/browse/FLINK-34495
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.20.0
> Reporter: Matthias Pohl
> Assignee: Zakelly Lan
> Priority: Major
> Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57760&view=logs&j=e9d3d34f-3d15-59f4-0e3e-35067d100dfe&t=5d91035e-8022-55f2-2d4f-ab121508bf7e&l=2010
> {code}
> java.util.concurrent.ExecutionException: java.io.IOException: Could not open output stream for state backend
> at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
> at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
> at org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:511) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:54) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
> at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.io.IOException: Could not open output stream for state backend
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:461) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flushToFile(FsCheckpointStreamFactory.java:308) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:284) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:148) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:111) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
> ... 3 more
> Caused by: java.io.FileNotFoundException: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-32462964103/savepoint-e2e-test-chckpt-dir/3c9ffc670ead2cb3c4118410cbef3b72/chk-12/3415a2f2-b0c8-4a07-b4a1-bb6cc58a7c56 (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method) ~[?:?]
> at java.io.FileOutputStream.open(FileOutputStream.java:298) ~[?:?]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:237) ~[?:?]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:187) ~[?:?]
> at org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:50) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:266) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:130) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:76) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:451) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flushToFile(FsCheckpointStreamFactory.java:308) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:284) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:148) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:111) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
> {code}