[ https://issues.apache.org/jira/browse/FLINK-25395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski updated FLINK-25395:
-----------------------------------
Description:
Extracting from [FLINK-25185 discussion|https://issues.apache.org/jira/browse/FLINK-25185?focusedCommentId=17462554&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17462554]
On checkpoint abortion or any failure in AsyncCheckpointRunnable,
it discards the state, in particular shared (incremental) state.
Since FLINK-24611, this creates a problem because shared state can be re-used
for future checkpoints.
Needs confirmation.
A likely symptom of this failure is the following exception during recovery:
{preformat}
Caused by: java.io.FileNotFoundException: /tmp/junit3146957979516280339/junit1602669867129285236/d6a6dbdd-3fd7-4786-9dc1-9ccc161740da (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292]
    at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292]
    at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292]
    at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:134) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:87) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.changelog.fs.StateChangeFormat.read(StateChangeFormat.java:92) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:85) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
{preformat}
was:
Extracting from [FLINK-25185 discussion|https://issues.apache.org/jira/browse/FLINK-25185?focusedCommentId=17462554&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17462554]
On checkpoint abortion or any failure in AsyncCheckpointRunnable,
it discards the state, in particular shared (incremental) state.
Since FLINK-24611, this creates a problem because shared state can be re-used
for future checkpoints.
Needs confirmation.
> Incremental shared state might be discarded by TM
> -------------------------------------------------
>
> Key: FLINK-25395
> URL: https://issues.apache.org/jira/browse/FLINK-25395
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing, Runtime / State Backends
> Affects Versions: 1.15.0
> Reporter: Roman Khachatryan
> Priority: Critical
> Fix For: 1.15.0
>
>
> Extracting from [FLINK-25185 discussion|https://issues.apache.org/jira/browse/FLINK-25185?focusedCommentId=17462554&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17462554]
> On checkpoint abortion or any failure in AsyncCheckpointRunnable,
> it discards the state, in particular shared (incremental) state.
> Since FLINK-24611, this creates a problem because shared state can be re-used
> for future checkpoints.
> Needs confirmation.
> A likely symptom of this failure is the following exception during recovery:
> {preformat}
> Caused by: java.io.FileNotFoundException: /tmp/junit3146957979516280339/junit1602669867129285236/d6a6dbdd-3fd7-4786-9dc1-9ccc161740da (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_292]
>     at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_292]
>     at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_292]
>     at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>     at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:134) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>     at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:87) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>     at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>     at org.apache.flink.changelog.fs.StateChangeFormat.read(StateChangeFormat.java:92) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>     at org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:85) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
> {preformat}
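
The suspected hazard in the description (shared state discarded on abort while a later checkpoint still references it) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not Flink code, every class and method name below (SharedStore, Checkpoint, discardAll, isRestorable) is hypothetical, and the issue itself still marks the root cause as needing confirmation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the suspected problem; not Flink code, all names are hypothetical. */
public class SharedStateDiscardSketch {

    /** Stands in for the storage location of shared (incremental) state files. */
    static final class SharedStore {
        private final Set<String> files = new HashSet<>();

        void write(String name) { files.add(name); }
        void delete(String name) { files.remove(name); }
        boolean exists(String name) { return files.contains(name); }
    }

    /** A checkpoint that references a set of shared files. */
    static final class Checkpoint {
        private final Set<String> sharedFiles;

        Checkpoint(String... sharedFiles) {
            this.sharedFiles = new HashSet<>(Arrays.asList(sharedFiles));
        }

        /** Naive abort handling: deletes every file this checkpoint references. */
        void discardAll(SharedStore store) {
            for (String file : sharedFiles) {
                store.delete(file);
            }
        }

        /** Restorable only if every referenced file still exists. */
        boolean isRestorable(SharedStore store) {
            for (String file : sharedFiles) {
                if (!store.exists(file)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Runs the scenario and reports whether the later checkpoint is still restorable. */
    static boolean laterCheckpointRestorable() {
        SharedStore store = new SharedStore();

        // The first checkpoint uploads shared-1.
        store.write("shared-1");
        Checkpoint aborted = new Checkpoint("shared-1");

        // A later checkpoint re-uses shared-1 and adds shared-2.
        store.write("shared-2");
        Checkpoint later = new Checkpoint("shared-1", "shared-2");

        // The first checkpoint is aborted and discards everything it references,
        // including the file the later checkpoint depends on.
        aborted.discardAll(store);

        return later.isRestorable(store);
    }

    public static void main(String[] args) {
        System.out.println("later checkpoint restorable: " + laterCheckpointRestorable());
    }
}
```

In this model the later checkpoint ends up referencing a deleted file, which is analogous to the FileNotFoundException seen during recovery. Whether the real code path behaves this way is exactly what the issue asks to confirm.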