StephanEwen commented on pull request #16655:
URL: https://github.com/apache/flink/pull/16655#issuecomment-899603149
## Suggested Follow-up Changes
### Names
- I was struggling a bit with the name `isFinishedOnRestore`, for example
in `TaskStateSnapshot.isFinishedOnRestore()`.
What does that mean? Will the task be deployed as "finished" on the next
restore? Was it already finished during the last restore?
Can we make this more self-explaining and call it
`taskWasDeployedAsFinished`? /cc @pnowojski
- `isOperatorsFinished()` as in `TaskStateSnapshot.isOperatorsFinished()`
sounds wrong to me, grammar wise. Given that the class is the "Task Snapshot",
should we call the method `isTaskFinised()`. Alternatively, we could call it
`areAllOperatorsFinished()`. I would suggest the first version, though.
- `updateNonFinishedOnRestoreOperatorState` sounds a bit clumsy. Can we
call this simply `updateOperatorState(...)` for being teh default case?
### Testing
All the tests about checking the illegal JobGraph modifications are written
as tests against the `CheckpointCoordinator`, when they could be written just
against the `VertexFinishedStateChecker`. That would make the tests more
targeted, like only against the actual component that has the logic. That way,
we need less test maintenance when the checkpoint coordinator changes later.
It is probably a good idea to have two test against the `Scheduler` that
validate that the modification tests happen at the right points. Something like
`testJobGraphModificationsAreCheckedForInitialSavepoint()` and
`testJobGraphModificationsAreCheckedForInitialCheckpoint()`.
Then we need no dedicated tests against the `CheckpointCoordinator`
regarding illegal job upgrades. That makes sense, because handling this is also
the responsibilities of the `Scheduler` and the `VertexFinishedStateChecker`.
The `CheckpointCoordinator` is only the component that connects the two, and
forwards the calls.
### Further separating the concerns of `PendingCheckpoint` and
`CheckpointPlan`
The PR makes some good changes into separating these two components better.
But I think we can go a bit further.
The guiding thought is: When dealing with a `PendingCheckpoint`, should one
be forced to also deal with a `CheckpointPlan` at all? I think not, the
`PendingCheckpoint` is purely a tracker of gathered states and how many tasks
have acknowledged and how many we are still waiting for. That makes testing,
reusability, etc. simple.
That means we should strive to have the `PendingCheckpoint` independent of
`CheckpointPlan`. I think we can achieve this with the following steps:
- Much of the reason to have the `CheckpointPlan` in `PendingCheckpoint`
is because the `CheckpointCoordinator` often requires access to the checkpoint
plan for a pending checkpoint. And it just uses the `PendingCheckpoint` as a
convenient way to store the `CheckpointPlan`.
A cleaner solution would be changing in `CheckpointCoordinator` the
`Map<Long, PendingCheckpoint> pendingCheckpoints` to a `Map<Long,
Tuple2<CheckpointPlan, PendingCheckpoint>> pendingCheckpoints`. The
`CheckpointCoordinator` would keep the two things it frequently needs together
in a tuple, rather than storing one in the other and thus expanding the
scope/concerns of `PendingCheckpoint`.
- The above change should allow us to reduce the interface of
`PendingCheckpoint.checkpointPlan` to
`PendingCheckpointFinishedTaskStateProvider`.
- The interface `PendingCheckpointFinishedTaskStateProvider` has some
methods that are about tracking state, not just about giving access to finished
state: `void reportTaskFinishedOnRestore(ExecutionVertex task)` and `void
reportTaskHasFinishedOperators(ExecutionVertex task);`.
I am wondering if those two would not be more cleanly handled outside the
`PendingCheckpoint` as well. Meaning in the method
`CheckpointCoordinator.receiveAcknowledgeMessage()` we handle the reporting of
finished operators to the `CheckpointPlan`. The method
`PendingCheckpoint.acknowledgeTask()` only deals with the states.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]