gyfora opened a new pull request, #871:
URL: https://github.com/apache/flink-kubernetes-operator/pull/871
## What is the purpose of the change
Rework the last-state upgrade mode to not be solely reliant on HA metadata
but to be flexible and use the job cancel mechanism in other cases. This change
also allows the session jobs to use last-state upgrade mode where HA metadata
is not accessible the same way as for Application clusters.
### Last state upgrades using cancel
Currently, last-state upgrade mode relies purely on the HA metadata that is
available for application deployments to simulate a failover during upgrade and
make the JM pick up the correct last state automatically. This has a couple of
limitations; first and foremost, it is not applicable to session jobs.
With this PR we introduce a new mechanism for last-state upgrades of
non-terminal jobs (the terminal case is already covered by existing mechanisms):
1. Cancel the job through the REST API (async operation)
2. Wait until the job cancellation completes and the job becomes CANCELLED
(terminal state)
3. Observe last state information through REST API and use that for upgrade
(upgrade flow already there for terminal jobs)
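The three steps above can be sketched as a single reconcile pass that is called repeatedly. This is only an illustration under stated assumptions: `SketchRestClient`, `InMemorySketchClient`, `LastStateViaCancel` and all of their methods are hypothetical names, not the operator's actual classes, and the status names simply mirror the ones used in this description.

```java
import java.util.Optional;

// Hypothetical status names mirroring the states named in the description.
enum SketchJobStatus { RUNNING, CANCELLING, CANCELLED }

// Hypothetical stand-in for a REST client; not the operator's real interface.
interface SketchRestClient {
    void cancelJob(String jobId); // async: returns before the job has stopped
    SketchJobStatus getStatus(String jobId);
    Optional<String> getLastCheckpoint(String jobId);
}

// In-memory fake so the flow can be exercised without a cluster.
final class InMemorySketchClient implements SketchRestClient {
    private SketchJobStatus status = SketchJobStatus.RUNNING;
    private final String lastCheckpoint;

    InMemorySketchClient(String lastCheckpoint) {
        this.lastCheckpoint = lastCheckpoint;
    }

    @Override
    public void cancelJob(String jobId) {
        status = SketchJobStatus.CANCELLING;
    }

    @Override
    public SketchJobStatus getStatus(String jobId) {
        return status;
    }

    @Override
    public Optional<String> getLastCheckpoint(String jobId) {
        return Optional.ofNullable(lastCheckpoint);
    }

    // Simulates the cluster finishing the cancellation.
    void completeCancellation() {
        status = SketchJobStatus.CANCELLED;
    }
}

final class LastStateViaCancel {
    enum Step { CANCEL_REQUESTED, WAITING_FOR_CANCEL, READY_TO_RESTORE }

    static final class Result {
        final Step step;
        final Optional<String> restoreFrom;

        Result(Step step, Optional<String> restoreFrom) {
            this.step = step;
            this.restoreFrom = restoreFrom;
        }
    }

    // One reconcile pass: never blocks, the caller re-schedules until READY_TO_RESTORE.
    static Result reconcileOnce(SketchRestClient client, String jobId) {
        switch (client.getStatus(jobId)) {
            case RUNNING:
                client.cancelJob(jobId); // step 1: trigger the async cancel and exit
                return new Result(Step.CANCEL_REQUESTED, Optional.empty());
            case CANCELLING:
                return new Result(Step.WAITING_FOR_CANCEL, Optional.empty()); // step 2
            default:
                // step 3: the job is terminal, so the last state can be read and reused
                return new Result(Step.READY_TO_RESTORE, client.getLastCheckpoint(jobId));
        }
    }
}
```

In this sketch the caller invokes `reconcileOnce` on every reconcile loop and only starts the restore once it sees `READY_TO_RESTORE`, which keeps the cancel itself non-blocking.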
This new mechanism is similar to what a human operator would do for these
jobs. It does not rely on HA metadata, works for both application and
session jobs, and also covers cases where HA metadata is otherwise not usable,
such as during version upgrades or when HA is disabled.
### Changes to the reconciliation flow for correct cancellation during upgrades
Currently the async nature of cancellation is not handled correctly in the
reconciler, even though session jobs use it to cancel jobs. In extreme cases
this can lead to 2 parallel jobs running on the same cluster.
To handle this, the reconciler now explicitly checks for cancelling state
and does not perform other upgrade actions until that completes. Also after
initiating an async cancel action through the REST API we immediately exit and
re-schedule the observation to wait until the cancellation completes and we can
observe the last state of the cluster.
The observer now also recognises the CANCELLING state as a special
user-initiated action, and when the job becomes CANCELLED (or is not found, in
the case of session jobs) it explicitly marks it SUSPENDED. This means that the
reconciler will always resume it subsequently, eliminating the risk of ending up
with a cancelled job if the spec change was rolled back in the meantime.
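A minimal sketch of the two rules described in this section, assuming a simplified state model: the reconciler guard that blocks other upgrade actions while a cancel is in flight, and the observer rule that turns a completed (or vanished) cancelling job into SUSPENDED. `CancelAwareFlowSketch` and its enums are hypothetical and only illustrate the decision logic, not the operator's real types.

```java
import java.util.Optional;

final class CancelAwareFlowSketch {
    // Status reported by the cluster (hypothetical names).
    enum ClusterJobStatus { RUNNING, CANCELLING, CANCELLED }

    // State the operator tracks for the resource (hypothetical names).
    enum TrackedState { RUNNING, CANCELLING, SUSPENDED }

    // Reconciler guard: no other upgrade action while a cancel is in flight.
    static boolean mayRunUpgradeActions(TrackedState tracked) {
        return tracked != TrackedState.CANCELLING;
    }

    // Observer rule: an empty Optional models "job not found" (session job case).
    static TrackedState observe(TrackedState tracked, Optional<ClusterJobStatus> observed) {
        if (tracked != TrackedState.CANCELLING) {
            return tracked; // other transitions are out of scope for this sketch
        }
        boolean gone = !observed.isPresent();
        boolean cancelled = observed.isPresent() && observed.get() == ClusterJobStatus.CANCELLED;
        // Marking the job SUSPENDED lets the reconciler resume it later,
        // even if the spec change was rolled back in the meantime.
        return (gone || cancelled) ? TrackedState.SUSPENDED : TrackedState.CANCELLING;
    }
}
```

The point of the design is that a cancel started by the operator can never end in a plain cancelled job: it either stays in the cancelling state or becomes SUSPENDED, and a suspended job is always eligible to be resumed.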
### Refactored and improved FlinkService cancel methods
To eliminate duplicate logic and reduce overall complexity, the cancel
methods for application and session jobs have been refactored to re-use the
common parts. A significant portion of the logic has also been removed by
separating the suspend and restore (upgrade) mechanisms.
The `JobUpgrade` utility class now encapsulates the necessary suspend and
restore mechanism for the stateful upgrade, depending on the current observed
state. This allows us to better handle async cancellation
(SuspendMode.CANCEL) or, if the job is already cancelled (or in another
terminal state), to do nothing (SuspendMode.NOOP) and simply perform the restore.
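As a rough illustration of how the suspend mode could be derived from the observed state, here is a small sketch. `JobUpgradeSketch` is a hypothetical class written for this description; it models only the two suspend modes named above and is not the actual `JobUpgrade` implementation.

```java
final class JobUpgradeSketch {
    // Only the two modes named in the description are modelled here.
    enum SuspendMode { CANCEL, NOOP }

    final SuspendMode suspendMode;

    private JobUpgradeSketch(SuspendMode suspendMode) {
        this.suspendMode = suspendMode;
    }

    // Hypothetical factory: pick the suspend mode from the observed job state.
    static JobUpgradeSketch forLastState(boolean jobAlreadyTerminal) {
        return new JobUpgradeSketch(
                jobAlreadyTerminal ? SuspendMode.NOOP : SuspendMode.CANCEL);
    }

    // CANCEL is async, so the restore has to wait for a later reconcile pass;
    // NOOP means the restore can be performed right away.
    boolean mustWaitBeforeRestore() {
        return suspendMode == SuspendMode.CANCEL;
    }
}
```

Keeping the suspend decision separate from the restore step is what allows the restore path to be shared between jobs that were just cancelled and jobs that were already terminal.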
### Misc session job changes / fixes
In addition to making last-state upgrade mode generally available for
session jobs, this PR includes several critical fixes to the core upgrade
cleanup logic that came out of this work, such as:
- Improved cleanup method that correctly waits until the job is fully
cancelled instead of deleting the CR too early (risk of leaving the job there)
- Call observe during cancel for session jobs for correct behaviour
- Use correct job config generation for session jobs, similar to
applications, such as retaining checkpoints during cancellation by default,
which is needed for the above cancel mechanism
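To illustrate the last point, the sketch below applies a "retain checkpoints on cancellation" default to a plain key/value job configuration without overriding an explicit user setting. It assumes the relevant Flink option is `execution.checkpointing.externalized-checkpoint-retention` with the value `RETAIN_ON_CANCELLATION`; whether the operator sets exactly this key for session jobs is an assumption, and `SessionJobConfigSketch` is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

final class SessionJobConfigSketch {
    // Assumed Flink option key and value; verify against the Flink version in use.
    static final String RETENTION_KEY =
            "execution.checkpointing.externalized-checkpoint-retention";
    static final String RETAIN_ON_CANCELLATION = "RETAIN_ON_CANCELLATION";

    // Returns a copy of the user config with the default applied only if absent.
    static Map<String, String> withCancelSafeDefaults(Map<String, String> userConf) {
        Map<String, String> effective = new HashMap<>(userConf);
        effective.putIfAbsent(RETENTION_KEY, RETAIN_ON_CANCELLATION);
        return effective;
    }
}
```

Without a retained checkpoint there would be nothing to restore from after the cancel completes, which is why the default matters for the cancel-based flow.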
### Other changes / improvements as an outcome
- Remove last-state upgrade limitations for apps and use cancel in these
cases (Flink version upgrade for non-running jobs, jobs without HA enabled)
## Verifying this change
- Existing unit and E2Es guard the current behaviour
- New unit tests have been added to cover the session job last-state
upgrades and the improved observe, reconcile, cleanup flow
- Extensive manual testing on local kubernetes
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no
- Core observer or reconciler logic that is regularly executed: yes
## Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? [TODO]