[jira] [Updated] (FLINK-28843) Failed to restore from changelog checkpoint in claim mode

Lihe Ma (Jira) Fri, 05 Aug 2022 20:11:06 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-28843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Lihe Ma updated FLINK-28843:
----------------------------
    Description: 
# When native checkpoints and incremental checkpointing are enabled in the 
RocksDB state backend, any state larger than 
state.storage.fs.memory-threshold is written to a data file 
(FileStateHandle, RelativeFileStateHandle, etc.), for example 
base-path1/chk-1/file1, rather than being stored inline in the checkpoint 
metadata as a ByteStreamStateHandle.
 # The job is then restored from base-path1/chk-1 in claim mode with the 
changelog state backend, and the checkpoint path is set to base-path2. The 
new checkpoint is saved in base-path2/chk-2, but it still needs the previous 
checkpoint file (base-path1/chk-1/file1).
 # When the job is then restored from base-path2/chk-2 with the changelog 
state backend, Flink tries to read base-path2/chk-2/file1 instead of the 
actual file location base-path1/chk-1/file1, which causes a 
FileNotFoundException, and the job fails.
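
The path mismatch in the last step can be sketched as follows. This is only an illustration under an assumption, not Flink's actual code: it assumes the handle keeps just a relative file name and is resolved against whichever checkpoint directory is being restored from. The class RelativeHandleSketch and the method resolve are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativeHandleSketch {

    // Hypothetical stand-in for a handle that stores only a file name
    // relative to a checkpoint directory (an assumption, not Flink's API).
    static Path resolve(Path checkpointDir, String relativeName) {
        return checkpointDir.resolve(relativeName);
    }

    public static void main(String[] args) {
        // Where the data file was actually written by the first checkpoint.
        Path written = resolve(Paths.get("base-path1", "chk-1"), "file1");
        // Where a restore from chk-2 would look if the same relative name
        // were resolved against the new checkpoint directory.
        Path lookedUp = resolve(Paths.get("base-path2", "chk-2"), "file1");

        System.out.println("written:   " + written);
        System.out.println("looked up: " + lookedUp);
        System.out.println("same file: " + written.equals(lookedUp));
    }
}
```

Under this assumption the two paths differ, which would match the reported FileNotFoundException: nothing was ever written under base-path2/chk-2 with that name.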

 
How to reproduce?
 # Set state.storage.fs.memory-threshold to a small value, like '20b'.
 # Run 
{{org.apache.flink.test.checkpointing.ChangelogPeriodicMaterializationSwitchStateBackendITCase#testSwitchFromDisablingToEnablingInClaimMode}}
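
Why a small threshold such as '20b' helps reproduce the problem can be sketched like this. It is a simplified model of the rule stated in the description (state greater than the threshold goes to a data file), not Flink's implementation. The class MemoryThresholdSketch, the enum Placement, and the method place are hypothetical names, and the sizes are made-up example values.

```java
public class MemoryThresholdSketch {

    // Where a piece of state ends up in this simplified model.
    enum Placement { INLINE_IN_METADATA, SEPARATE_FILE }

    // Mirrors the rule from the description: state greater than the
    // threshold is written to a data file, otherwise it stays inline.
    static Placement place(long stateSizeBytes, long thresholdBytes) {
        return stateSizeBytes > thresholdBytes
                ? Placement.SEPARATE_FILE
                : Placement.INLINE_IN_METADATA;
    }

    public static void main(String[] args) {
        long thresholdBytes = 20; // corresponds to '20b'

        // Example sizes chosen for illustration only.
        System.out.println(place(16, thresholdBytes));
        System.out.println(place(64, thresholdBytes));
    }
}
```

With a threshold this low, almost any state exceeds it, so the test is forced down the data-file path where the wrong location is read.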

  was:
# When native checkpoints and incremental checkpointing are enabled in the 
RocksDB state backend, any state larger than 
state.storage.fs.memory-threshold is written to a data file 
(FileStateHandle, RelativeFileStateHandle, etc.), for example 
base-path1/chk-1/file1, rather than being stored inline in the checkpoint 
metadata as a ByteStreamStateHandle.
 # The job is then restored from base-path1/chk-1 in claim mode with the 
changelog state backend, and the checkpoint path is set to base-path2. The 
new checkpoint is saved in base-path2/chk-2, but it still needs the previous 
checkpoint file (base-path1/chk-1/file1).
 # When the job is then restored from base-path2/chk-2 with the changelog 
state backend, Flink tries to read base-path2/chk-2/file1 instead of the 
actual file location base-path1/chk-1/file1, which causes a 
FileNotFoundException, and the job fails.

 
How to reproduce? # Set state.storage.fs.memory-threshold to a small value, 
like '20b'.
 # {{run 
org.apache.flink.test.checkpointing.ChangelogPeriodicMaterializationSwitchStateBackendITCase#testSwitchFromDisablingToEnablingInClaimMode}}


> Failed to restore from changelog checkpoint in claim mode
> ---------------------------------------------------------
>
>                 Key: FLINK-28843
>                 URL: https://issues.apache.org/jira/browse/FLINK-28843
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.15.0, 1.15.1
>            Reporter: Lihe Ma
>            Priority: Critical
>
> # When native checkpoints and incremental checkpointing are enabled in the 
> RocksDB state backend, any state larger than 
> state.storage.fs.memory-threshold is written to a data file 
> (FileStateHandle, RelativeFileStateHandle, etc.), for example 
> base-path1/chk-1/file1, rather than being stored inline in the checkpoint 
> metadata as a ByteStreamStateHandle.
>  # The job is then restored from base-path1/chk-1 in claim mode with the 
> changelog state backend, and the checkpoint path is set to base-path2. The 
> new checkpoint is saved in base-path2/chk-2, but it still needs the previous 
> checkpoint file (base-path1/chk-1/file1).
>  # When the job is then restored from base-path2/chk-2 with the changelog 
> state backend, Flink tries to read base-path2/chk-2/file1 instead of the 
> actual file location base-path1/chk-1/file1, which causes a 
> FileNotFoundException, and the job fails.
>  
> How to reproduce?
>  # Set state.storage.fs.memory-threshold to a small value, like '20b'.
>  # Run 
> {{org.apache.flink.test.checkpointing.ChangelogPeriodicMaterializationSwitchStateBackendITCase#testSwitchFromDisablingToEnablingInClaimMode}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
