[ 
https://issues.apache.org/jira/browse/FLINK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joyce.Li closed FLINK-25401.
----------------------------
    Resolution: Not A Bug

> DefaultCompletedCheckpointStore may not return the latest CompletedCheckpoint 
> after JM failover.
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25401
>                 URL: https://issues.apache.org/jira/browse/FLINK-25401
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Joyce.Li
>            Priority: Major
>
> At present, when we recover {{DefaultCompletedCheckpointStore}}, the 
> {{CompletedCheckpoint}} entries are sorted by name in lexicographic (string) order.
> {code:java}
> // Get all there is first.
> final List<Tuple2<RetrievableStateHandle<CompletedCheckpoint>, String>> initialCheckpoints =
>         checkpointStateHandleStore.getAllAndLock();
> // Sort checkpoints by name.
> initialCheckpoints.sort(Comparator.comparing(o -> o.f1));{code}
> Consider this situation: we retain 3 {{CompletedCheckpoint}} entries with IDs 
> 99, 100, and 101. After JM failover, {{DefaultCompletedCheckpointStore}} restores 
> all three, but sorted as strings their order becomes 100, 101, 99. When we 
> restore the state of the job, we will use the {{CompletedCheckpoint}} with ID 
> 99, which is not the latest one, and this will cause an error.
> I think we should use {{CheckpointStoreUtil#nameToCheckpointID}} to convert 
> each name from {{String}} to {{long}} before sorting, so that the order is numeric.
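
The ordering described in the report can be reproduced with a minimal sketch. It assumes, purely for illustration, that a checkpoint name is the bare decimal ID; the class CheckpointNameSort and its helper nameToCheckpointId are hypothetical stand-ins, not Flink's CheckpointStoreUtil.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; not part of Flink.
class CheckpointNameSort {

    // Assumption: the name is just the decimal checkpoint ID.
    // A real CheckpointStoreUtil may use a different name format.
    static long nameToCheckpointId(String name) {
        return Long.parseLong(name);
    }

    // Sorts the names as plain strings (lexicographic order).
    static List<String> sortByName(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Sorts the names by the numeric ID parsed from each name.
    static List<String> sortById(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingLong(CheckpointNameSort::nameToCheckpointId));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("99", "100", "101");
        System.out.println(sortByName(names)); // [100, 101, 99]
        System.out.println(sortById(names));   // [99, 100, 101]
    }
}
```

The two orders differ only when the names have different lengths. If the stored names were zero-padded to a fixed width, string order and numeric order would coincide; the issue text does not say which name format is in use.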



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
