XComp opened a new pull request #18919:
URL: https://github.com/apache/flink/pull/18919
## What is the purpose of the change
This inconsistency was introduced by `c3a6b51` as part of changes done for
FLINK-19543. It should be, anyway, fixed for consistency reasons.
This issue never appeared because of the following reasons:
* getAllAndLock is only called by the CompletedCheckpoint
recovery which happens during the recovery of a job (i.e.
after failover)
* the removal happens through releaseAndTryRemove which can
be called in the following places:
* through DefaultCompletedCheckpointStore.addCheckpointAndSubsumeOldestOne
That happens after a job is already recovered and running
* through DefaultCompletedCheckpointStore.shutdown
...where it's only called if the job reached a
globally-terminal state (i.e. it's not subject to
job recovery)
* through DefaultJobGraphStore.globalCleanupAsync
...which is also only called on jobs that reached a
globally-terminal state (i.e. it's not subject to
job recovery)
* ZooKeeperStateHandleStore.releaseAndTryRemoveAll
...which seems to be legacy code which is not used
anywhere in production, anymore. I'm leaving it here
because it might make sense to remove ZooKeeper
completely, anyway.
## Brief change log
* Refactored `getAllAndLock` to make it possible to test concurrent node
changes
* Fixes the catch statement replacing `KeeperException.NoNodeException` by
`NotExistException` which was introduced in FLINK-19543
## Verifying this change
* A test was added to verify the change
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]