curcur commented on PR #19331:
URL: https://github.com/apache/flink/pull/19331#issuecomment-1091302016
I am mostly fine with the changes. Besides the comments left above, my main
concern is why introducing "RestoreMode DEFAULT = NO_CLAIM", let me explain my
concerns in the following:
1. I am fine with the changes along the path
"CheckpointCoordinator#restoreSavepoint", including all the interface changes,
e.t.c.
2. I am not fine with `RestoreMode DEFAULT = NO_CLAIM` and passed in
`EmbeddedCompletedCheckpointStore`, `StandaloneCompletedCheckpointStore`,
`CheckpointResourcesCleanupRunner`, `EmbeddedHaServicesWithLeadershipControl`.
There, it should follow the normal failover paths assuming all checkpoints
belong to the same job, and non-referenced are subsumed as normal. I do not
think NO_CLAIM follows this way.
So I would suggest we introduce a fourth internal value to indicate the
normal JM failover behavior
3. @Myasuka mentioned a good point that item 2 itself is not enough: it is
possible that before the first checkpoint after recovery is able to be made, JM
crashes again, so item 2 may lead to losing information about whether the
previous checkpoint is from a previous job or previous run.
We can also discuss offline.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]