[jira] [Commented] (FLINK-29913) Shared state would be discarded by mistake when maxConcurrentCheckpoint>1

Congxian Qiu (Jira) Thu, 25 May 2023 00:38:04 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-29913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726084#comment-17726084
 ]


Congxian Qiu commented on FLINK-29913:
--------------------------------------

Thanks for the discussion above and for the contribution!

Using the UUID/filename as the key solves the problem here, and it also makes 
sense because the key and the remote file are then one-to-one. It also solves 
some other potential problems. For example, suppose a Flink job management 
platform uses the SharedRegistry to maintain the checkpoint lifecycle. If a 
task has two sst files with the same name, the current code deletes a file by 
mistake. This can happen as follows: job A generates checkpoint chk1, then 
stops; job B resumes from chk1, completes chk2, then stops; job C resumes from 
chk1 and completes chk3. If we then register chk2 and chk3 in one 
SharedRegistry, some remote files are deleted by mistake, because chk2 and chk3 
contain sst files with the same name.
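
A minimal sketch of the collision in the chk2/chk3 scenario, for illustration 
only. ToyRegistry is a hypothetical stand-in, not Flink's registry 
implementation. The sst name and the remote paths are made up, and the rule 
"registering a different file under an existing key discards the old file" is 
an assumption chosen to mirror the behavior described in this comment.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Toy stand-in for a shared-state registry; not Flink's implementation."""
    entries: dict = field(default_factory=dict)  # key -> remote file path
    deleted: list = field(default_factory=list)  # remote files discarded

    def register(self, key, remote_path):
        old = self.entries.get(key)
        if old is not None and old != remote_path:
            # Assumed behavior: a different file under an existing key is
            # treated as a replacement, so the old file is discarded.
            self.deleted.append(old)
        self.entries[key] = remote_path


# Hypothetical checkpoints as (local sst name, remote file path) pairs.
chk2 = [("000010.sst", "jobB/shared/uuid-b")]  # job B, restored from chk1
chk3 = [("000010.sst", "jobC/shared/uuid-c")]  # job C, restored from chk1


def register_all(key_of):
    """Register chk2 and then chk3 in one registry, using key_of to build keys."""
    registry = ToyRegistry()
    for checkpoint in (chk2, chk3):
        for sst_name, remote_path in checkpoint:
            registry.register(key_of(sst_name, remote_path), remote_path)
    return registry


# Keyed by sst name: the two different files collide on "000010.sst".
by_sst_name = register_all(lambda sst_name, remote_path: sst_name)
# Keyed by remote file: each key maps to exactly one file, so no collision.
by_remote_file = register_all(lambda sst_name, remote_path: remote_path)

print(by_sst_name.deleted)     # ['jobB/shared/uuid-b']
print(by_remote_file.deleted)  # []
```

With the sst name as the key, the file that chk2 still needs is discarded as 
soon as chk3 is registered. With the remote file as the key, both files 
survive.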

> Shared state would be discarded by mistake when maxConcurrentCheckpoint>1
> -------------------------------------------------------------------------
>
>                 Key: FLINK-29913
>                 URL: https://issues.apache.org/jira/browse/FLINK-29913
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.15.0, 1.16.0, 1.17.0
>            Reporter: Yanfei Lei
>            Assignee: Feifan Wang
>            Priority: Major
>             Fix For: 1.16.3, 1.17.2
>
>
> When maxConcurrentCheckpoint>1, the shared state of the incremental RocksDB 
> state backend can be discarded when a handle with the same name is 
> registered. See 
> [https://github.com/apache/flink/pull/21050#discussion_r1011061072]
> cc [~roman] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
