[
https://issues.apache.org/jira/browse/FLINK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joyce.Li closed FLINK-25401.
----------------------------
Resolution: Not A Bug
> DefaultCompletedCheckpointStore may not return the latest CompletedCheckpoint
> after JM failover.
> ------------------------------------------------------------------------------------------------
>
> Key: FLINK-25401
> URL: https://issues.apache.org/jira/browse/FLINK-25401
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Reporter: Joyce.Li
> Priority: Major
>
> At present, when we recover {{DefaultCompletedCheckpointStore}}, we sort the
> {{CompletedCheckpoint}} entries by the character (lexicographic) order of
> their names.
> {code:java}
> // Get all there is first.
> final List<Tuple2<RetrievableStateHandle<CompletedCheckpoint>, String>> initialCheckpoints =
>         checkpointStateHandleStore.getAllAndLock();
> // Sort checkpoints by name.
> initialCheckpoints.sort(Comparator.comparing(o -> o.f1));
> {code}
> But consider this situation: we retain 3 {{CompletedCheckpoint}}s whose IDs
> are 99, 100, and 101. After a JM failover, {{DefaultCompletedCheckpointStore}}
> restores all three, but the string sort orders them as 100, 101, 99. When we
> restore the state of the job, the {{CompletedCheckpoint}} with ID 99 is
> treated as the latest and used for recovery, which causes an error.
> I think we should use {{CheckpointStoreUtil#nameToCheckpointID}} to convert
> the {{String}} to {{long}} before sorting.
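
The ordering described in the report can be reproduced outside Flink with a minimal sketch. It assumes checkpoint names are plain decimal strings; `nameToCheckpointId` here is a hypothetical stand-in for `CheckpointStoreUtil#nameToCheckpointID`, not the Flink implementation. For contrast, it also shows that a string sort does follow the numeric order when names are zero-padded to a fixed width (the width of 19 is an assumption chosen to fit any non-negative `long`), so whether the reported problem occurs depends on how names are encoded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class CheckpointNameOrdering {

    // Hypothetical stand-in for CheckpointStoreUtil#nameToCheckpointID:
    // assumes the name is just the decimal checkpoint ID.
    static long nameToCheckpointId(String name) {
        return Long.parseLong(name);
    }

    // Lexicographic sort of the names, as in the quoted snippet.
    static List<String> sortByName(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Numeric sort: convert each name to a long before comparing.
    static List<String> sortById(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingLong(CheckpointNameOrdering::nameToCheckpointId));
        return sorted;
    }

    // Assumed encoding for illustration: fixed-width, zero-padded names.
    static String paddedName(long checkpointId) {
        return String.format("%019d", checkpointId);
    }

    // Sorts zero-padded names lexicographically and maps them back to IDs.
    static List<Long> sortByPaddedName(List<Long> ids) {
        return ids.stream()
                .map(CheckpointNameOrdering::paddedName)
                .sorted()
                .map(Long::parseLong)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("99", "100", "101");
        System.out.println(sortByName(names)); // [100, 101, 99]
        System.out.println(sortById(names));   // [99, 100, 101]
        System.out.println(sortByPaddedName(List.of(101L, 99L, 100L))); // [99, 100, 101]
    }
}
```

With unpadded names, only the numeric comparator puts ID 101 last; with fixed-width names, the existing sort by name already does.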