[
https://issues.apache.org/jira/browse/FLINK-28843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lihe Ma updated FLINK-28843:
----------------------------
Description:
# When native checkpoints and incremental checkpointing are both enabled in the
RocksDB state backend, any state larger than
state.storage.fs.memory-threshold is written to a separate data file
(FileStateHandle, RelativeFileStateHandle, etc.) instead of being stored inline
in the checkpoint metadata as a ByteStreamStateHandle, e.g. base-path1/chk-1/file1.
# Restore the job from base-path1/chk-1 in claim mode, using the changelog
state backend with the checkpoint path set to base-path2. The new checkpoint
is saved in base-path2/chk-2, but it still depends on the previous checkpoint
file (base-path1/chk-1/file1).
# Restore the job again from base-path2/chk-2 with the changelog state backend.
Flink tries to read base-path2/chk-2/file1 instead of the actual file location
base-path1/chk-1/file1, which causes a FileNotFoundException and the job fails.
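The symptom in the last step can be illustrated with a minimal sketch. It assumes, for illustration only, that the handle records just a file name and is resolved against whichever checkpoint directory is being restored. The class and method names below are hypothetical and are not Flink's actual code; the sketch only models the path mismatch, not the root cause.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativeHandleSketch {

    // Hypothetical stand-in for a relative file handle: it keeps only a file
    // name and is resolved against the checkpoint directory used for restore.
    static Path resolveRelative(Path checkpointDir, String relativeName) {
        return checkpointDir.resolve(relativeName);
    }

    public static void main(String[] args) {
        Path chk1 = Paths.get("base-path1", "chk-1");
        Path chk2 = Paths.get("base-path2", "chk-2");

        // Where the data file was actually written by the first checkpoint.
        Path written = resolveRelative(chk1, "file1");
        // Where the file is looked up when restoring from the second checkpoint.
        Path lookedUp = resolveRelative(chk2, "file1");

        System.out.println("written:   " + written);
        System.out.println("looked up: " + lookedUp);
        System.out.println("same file: " + written.equals(lookedUp));
    }
}
```

Under that assumption the two paths differ, which matches the FileNotFoundException described in this report.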
How to reproduce?
# Set state.storage.fs.memory-threshold to a small value, such as '20b'.
# Run
{{org.apache.flink.test.checkpointing.ChangelogPeriodicMaterializationSwitchStateBackendITCase#testSwitchFromDisablingToEnablingInClaimMode}}
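Why a small threshold helps reproduce the problem can be sketched as follows. This is a simplified model of the rule stated in the description (state larger than the threshold goes to a data file), not Flink's implementation; the helper name and the sample sizes are assumptions made for illustration.

```java
public class MemoryThresholdSketch {

    // Hypothetical helper: returns true when state of the given size would be
    // written to a separate data file instead of inline in the metadata.
    static boolean storedAsFile(long stateSizeBytes, long thresholdBytes) {
        return stateSizeBytes > thresholdBytes;
    }

    public static void main(String[] args) {
        long threshold = 20; // corresponds to the '20b' setting in step 1

        // Illustrative state sizes in bytes (assumed values, not measured).
        long[] sizes = {8, 20, 64};
        for (long size : sizes) {
            System.out.println(size + " bytes -> file: " + storedAsFile(size, threshold));
        }
    }
}
```

With a threshold this low, even small test state ends up in data files, so the test exercises the file-handle path described above.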
> Failed to restore from changelog checkpoint in claim mode
> ---------------------------------------------------------
>
> Key: FLINK-28843
> URL: https://issues.apache.org/jira/browse/FLINK-28843
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.15.0, 1.15.1
> Reporter: Lihe Ma
> Priority: Critical
--
This message was sent by Atlassian Jira
(v8.20.10#820010)