curcur commented on PR #19331:
URL: https://github.com/apache/flink/pull/19331#issuecomment-1091302016

   I am mostly fine with the changes. Besides the comments left above, my main 
concern is why introducing "RestoreMode DEFAULT = NO_CLAIM", let me explain my 
concerns in the following:
   
   1. I am fine with the changes along the path 
"CheckpointCoordinator#restoreSavepoint", including all the interface changes, 
e.t.c.
   2. I am not fine with `RestoreMode DEFAULT = NO_CLAIM` and passed in 
`EmbeddedCompletedCheckpointStore`, `StandaloneCompletedCheckpointStore`, 
`CheckpointResourcesCleanupRunner`, `EmbeddedHaServicesWithLeadershipControl`. 
There, it should follow the normal failover paths assuming all checkpoints 
belong to the same job, and non-referenced are subsumed as normal. I do not 
think NO_CLAIM follows this way.
   So I would suggest we introduce a fourth internal value to indicate the 
normal JM failover behavior
   3. @Myasuka mentioned a good point that item 2 itself is not enough: it is 
possible that before the first checkpoint after recovery is able to be made, JM 
crashes again, so item 2 may lead to losing information about whether the 
previous checkpoint is from a previous job or previous run.
   
   We can also discuss offline.
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to