[jira] [Comment Edited] (FLINK-25872) Restoring from non-changelog checkpoint with changelog state-backend enabled in CLAIM mode discards state in use

Yanfei Lei (Jira) Wed, 13 Apr 2022 03:35:07 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-25872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521607#comment-17521607
 ]


Yanfei Lei edited comment on FLINK-25872 at 4/13/22 10:34 AM:
--------------------------------------------------------------

Hi [~roman], I wrote a 
document ([https://docs.google.com/document/d/1KSFWc0gL7HkhC-JNrnsp06TLnsTmZOTHITQDcGMo0cI/edit?usp=sharing])
 about this ticket. Would you please review it and give some advice?
{quote}I think registering *all* {{KeyedStateHandle}}s with the 
{{SharedStateRegistry}} on recovery in {{CLAIM}} mode would also solve the 
problem, wouldn't it?

The advantage is that JM wouldn't have to know anything about the changelog.
I think this is important and that's why I'd prefer such an approach.
{quote}
I think that only registering *all* {{KeyedStateHandle}}s with the 
{{SharedStateRegistry}} may not work either, because the {{discardState()}} 
of a {{KeyedStateHandle}} is *not empty*: even if all {{KeyedStateHandle}}s 
are registered with the {{SharedStateRegistry}}, the state would still be 
discarded when the checkpoint is subsumed (maybe I overlooked something?).
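
To make the concern concrete, here is a minimal toy sketch in Java. It is not Flink code: {{ToyHandle}}, {{ToyRegistry}} and the file name {{chk-1/keyed-state}} are hypothetical stand-ins that I made up for illustration, and the real {{KeyedStateHandle}} / {{SharedStateRegistry}} APIs differ. It only assumes that subsuming a checkpoint calls {{discardState()}} on its handles in addition to releasing the registry reference.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types are Flink classes.
class ClaimModeDiscardSketch {

    /** A state handle pointing at one file in simulated checkpoint storage. */
    static final class ToyHandle {
        final String file;
        final boolean discardDeletesFile; // true models "discardState() is not empty"

        ToyHandle(String file, boolean discardDeletesFile) {
            this.file = file;
            this.discardDeletesFile = discardDeletesFile;
        }

        void discardState(Set<String> storage) {
            if (discardDeletesFile) {
                storage.remove(file); // deletes without consulting any registry
            }
        }
    }

    /** Reference-counting registry; deletes a file only when its count drops to zero. */
    static final class ToyRegistry {
        private final Map<String, Integer> refCounts = new HashMap<>();

        void register(ToyHandle handle) {
            refCounts.merge(handle.file, 1, Integer::sum);
        }

        void unregister(ToyHandle handle, Set<String> storage) {
            int remaining = refCounts.merge(handle.file, -1, Integer::sum);
            if (remaining <= 0) {
                refCounts.remove(handle.file);
                storage.remove(handle.file);
            }
        }
    }

    /** Returns true if the file used by the newer checkpoint survives subsuming the restored one. */
    static boolean fileSurvivesSubsumption(boolean discardDeletesFile) {
        Set<String> storage = new HashSet<>();
        storage.add("chk-1/keyed-state");
        ToyRegistry registry = new ToyRegistry();

        // The restored (claimed) checkpoint 1 and the newer checkpoint 2 reference the same file.
        ToyHandle restored = new ToyHandle("chk-1/keyed-state", discardDeletesFile);
        ToyHandle reused = new ToyHandle("chk-1/keyed-state", discardDeletesFile);
        registry.register(restored);
        registry.register(reused);

        // Checkpoint 1 is subsumed: its handle is discarded and its reference is released.
        restored.discardState(storage);
        registry.unregister(restored, storage);

        return storage.contains("chk-1/keyed-state");
    }

    public static void main(String[] args) {
        System.out.println("non-empty discardState, file survives: " + fileSurvivesSubsumption(true));
        System.out.println("no-op discardState, file survives: " + fileSurvivesSubsumption(false));
    }
}
```

In this toy model the registry still holds one reference after checkpoint 1 is subsumed, but the file is already gone when {{discardState()}} deletes it directly. Registration alone only protects the file if discarding the handle leaves deletion to the registry.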



> Restoring from non-changelog checkpoint with changelog state-backend enabled 
> in CLAIM mode discards state in use
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25872
>                 URL: https://issues.apache.org/jira/browse/FLINK-25872
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / State Backends
>            Reporter: Yun Tang
>            Assignee: Yanfei Lei
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>
> If we restore from a checkpoint with the changelog state-backend enabled in 
> snapshot CLAIM mode, the restored checkpoint would be discarded on subsume. 
> This invalidates newer/active checkpoints because their materialized part is 
> discarded (for incremental wrapped checkpoints, their private state is 
> discarded). This bug is similar to FLINK-25478.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
