Github user shixiaogang commented on the issue:
https://github.com/apache/flink/pull/3524
@StephanEwen Thanks very much for your valuable comments. The following
are some of my thoughts.
* Currently the registration of shared states lives in `CheckpointCoordinator`
because it is needed whenever a `PendingCheckpoint` receives a state handle or a
`CompletedCheckpoint` is recovered. But I think it does make sense to put both
the registration and unregistration of shared states in the same place. I will
update the PR so that this logic is placed in `PendingCheckpoint` and
`CompletedCheckpoint`.
* When a `SubtaskState` is not successfully added to the
`PendingCheckpoint`, the state objects in the `SubtaskState` should be
correctly deleted. How a `SubtaskState` is discarded differs from case to
case. When the `PendingCheckpoint` fails, the `SubtaskState`
should delete both its private states and its shared states. But when
the `CompletedCheckpoint` is subsumed, the `SubtaskState` should delete only
those shared states that are no longer referenced (possibly created by others),
rather than its own shared states.
By registering the shared states first, we can unify the implementation
of the two cases. The shared states in a failed `PendingCheckpoint` are
never referenced by other checkpoints, so the registry can correctly discard
them when the `PendingCheckpoint` unregisters its shared states,
just as a subsumed `CompletedCheckpoint` does.
Another option is to refactor the interface of `CompositeStateHandle` so
that it provides three methods, namely `onComplete()`, `onFail()` and
`onSubsume()`. A `CompositeStateHandle` can implement these methods to deal
correctly with its states in each of these cases. What do you think?
* It's a good idea to introduce `SharedStateHandle` for shared states. It
can improve the performance and allow safety checks. I will add it in the
update.
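
To make the "register first, then unregister" idea above concrete, here is a
minimal sketch of a reference-counting registry. It is only an illustration
under my own assumptions: the class `SharedStateRegistrySketch`, its method
names and the use of `String` keys are hypothetical and are not the API in
the PR. The point it shows is that a failed `PendingCheckpoint` and a
subsumed `CompletedCheckpoint` can take the same code path, because the
registry alone decides which shared states have become unreferenced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reference-counting registry; not the actual Flink API. */
class SharedStateRegistrySketch {

    /** Reference counts keyed by a hypothetical shared-state identifier. */
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Records one more reference to the given shared state. */
    public void register(String stateKey) {
        refCounts.merge(stateKey, 1, Integer::sum);
    }

    /**
     * Drops one reference. Returns true if the state is no longer
     * referenced by any checkpoint and may therefore be discarded.
     */
    public boolean unregister(String stateKey) {
        Integer count = refCounts.get(stateKey);
        if (count == null) {
            throw new IllegalStateException("Never registered: " + stateKey);
        }
        if (count == 1) {
            refCounts.remove(stateKey);
            return true;
        }
        refCounts.put(stateKey, count - 1);
        return false;
    }

    /**
     * Unregisters every shared state of one checkpoint and returns the keys
     * that became unreferenced. The caller is the same whether the
     * checkpoint failed or was subsumed.
     */
    public List<String> unregisterAll(List<String> stateKeys) {
        List<String> toDiscard = new ArrayList<>();
        for (String key : stateKeys) {
            if (unregister(key)) {
                toDiscard.add(key);
            }
        }
        return toDiscard;
    }
}
```

With this shape, a pending checkpoint that registered its shared states and
then failed simply calls `unregisterAll(...)`: the states only it created
drop to zero references and are returned for deletion, while states still
held by an earlier completed checkpoint survive.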