[ 
https://issues.apache.org/jira/browse/FLINK-34495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820659#comment-17820659
 ] 

Zakelly Lan commented on FLINK-34495:
-------------------------------------

[~mapohl] This is not easy to avoid: the checkpoint notification is 
best-effort and the TM sends no ack back to the JM, so the JM does not know 
when it is 'safe' to delete the private state directory. But yes, it should be 
addressed, and either the notification mechanism or the state file ownership 
should be redesigned.

I suggest adding a dedicated test to reproduce this, since I guess it happens 
only rarely in this test.
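
As a starting point for such a test, here is a minimal sketch of the failure 
mode using plain java.io only. It is not Flink code: the class name, method 
name, and file names are hypothetical, and the real race between the JM and 
the TM is replaced by a fixed ordering (delete the directory first, then open 
the file) so that the failure is deterministic.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeletedCheckpointDirSketch {

    /**
     * Returns true if opening a state file fails with FileNotFoundException
     * after its (hypothetical) checkpoint directory has been deleted.
     */
    static boolean uploadFailsAfterDirectoryDeletion() throws IOException {
        Path root = Files.createTempDirectory("chckpt-dir");
        Path chkDir = Files.createDirectory(root.resolve("chk-12"));

        // Stand-in for the JM side: the checkpoint directory is removed.
        Files.delete(chkDir);

        // Stand-in for the TM side: it has not seen any notification and
        // still tries to create a state file inside the deleted directory.
        Path stateFile = chkDir.resolve("state-file");
        try (FileOutputStream out = new FileOutputStream(stateFile.toFile())) {
            out.write(42);
            return false;
        } catch (FileNotFoundException e) {
            // Same exception type as the innermost cause in the reported trace.
            return true;
        } finally {
            // Clean up whatever exists, innermost path first.
            Files.deleteIfExists(stateFile);
            Files.deleteIfExists(chkDir);
            Files.deleteIfExists(root);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("upload failed: " + uploadFailsAfterDirectoryDeletion());
    }
}
```

A real reproduction would need to trigger the deletion from the JM while an 
asynchronous upload is still running on the TM; the sketch only shows the file 
system behaviour that such a test would assert on.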

> Resuming Savepoint (rocks, scale up, heap timers) end-to-end test failure due to FileNotFoundException
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-34495
>                 URL: https://issues.apache.org/jira/browse/FLINK-34495
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.20.0
>            Reporter: Matthias Pohl
>            Assignee: Zakelly Lan
>            Priority: Major
>              Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57760&view=logs&j=e9d3d34f-3d15-59f4-0e3e-35067d100dfe&t=5d91035e-8022-55f2-2d4f-ab121508bf7e&l=2010
> {code}
> java.util.concurrent.ExecutionException: java.io.IOException: Could not open output stream for state backend
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
>         at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
>         at org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:511) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:54) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>         at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.io.IOException: Could not open output stream for state backend
>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:461) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flushToFile(FsCheckpointStreamFactory.java:308) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:284) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:148) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:111) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
>         ... 3 more
> Caused by: java.io.FileNotFoundException: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-32462964103/savepoint-e2e-test-chckpt-dir/3c9ffc670ead2cb3c4118410cbef3b72/chk-12/3415a2f2-b0c8-4a07-b4a1-bb6cc58a7c56 (No such file or directory)
>         at java.io.FileOutputStream.open0(Native Method) ~[?:?]
>         at java.io.FileOutputStream.open(FileOutputStream.java:298) ~[?:?]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:237) ~[?:?]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:187) ~[?:?]
>         at org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:50) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:266) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:130) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:76) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:451) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flushToFile(FsCheckpointStreamFactory.java:308) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:284) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:148) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:111) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
