[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request, #489: [FLINK-30406] Detect when jobmanager never started

GitBox Mon, 19 Dec 2022 13:43:16 -0800


gyfora opened a new pull request, #489:
URL: https://github.com/apache/flink-kubernetes-operator/pull/489


   ## What is the purpose of the change
   
   The purpose of this PR is to fix the long-standing, annoying case where the 
job is stuck in a non-upgradable state after starting/upgrading from a 
savepoint because the JobManager never starts.
   
   Previously, in these cases we only supported last-state (HA-based) upgrades, 
which were impossible to perform if the JM never started and never created the 
HA metadata configmaps.
   
   The PR introduces a check whether the JobManager pods ever started by 
checking the Availability conditions on the JM deployment and comparing 
condition times with the deployment creation timestamp.
   
   If availability is False and the Deployment never transitioned out of this 
state after creation, we can then assume that the JM never started and we can 
perform the upgrade using the last recorded savepoint.
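
   The check described above can be sketched roughly as follows. This is a 
minimal, self-contained illustration, not the code from this PR: the 
`JmStartupCheck` class, the `Condition` type and the `tolerance` parameter are 
hypothetical stand-ins for the real Deployment objects, and the tolerance is an 
assumption added here in case the condition timestamp is recorded slightly 
after the creation timestamp.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical sketch; names and types are illustrative, not the operator's API.
final class JmStartupCheck {

    // Simplified stand-in for a Kubernetes Deployment condition.
    static final class Condition {
        final String type;
        final String status;
        final Instant lastTransitionTime;

        Condition(String type, String status, Instant lastTransitionTime) {
            this.type = type;
            this.status = status;
            this.lastTransitionTime = lastTransitionTime;
        }
    }

    // Returns true if the "Available" condition is False and has not
    // transitioned since the Deployment was created (within the tolerance).
    static boolean jmNeverStarted(
            Instant createdAt, List<Condition> conditions, Duration tolerance) {
        for (Condition c : conditions) {
            if ("Available".equals(c.type)) {
                boolean unavailable = "False".equals(c.status);
                boolean neverTransitioned =
                        !c.lastTransitionTime.isAfter(createdAt.plus(tolerance));
                return unavailable && neverTransitioned;
            }
        }
        // No Available condition reported: we cannot conclude anything.
        return false;
    }
}
```

   With this logic, a Deployment whose availability turned False at some later 
point (for example after a crash) is not treated as "never started", so the 
savepoint-based upgrade path is only taken when no HA metadata could have been 
created.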
   
   This also removes the slightly ad hoc logic we had in place for upgrades on 
initial deployments before stable state (which was basically intended to work 
around this limitation).
   
   ## Verifying this change
   
   Unit tests + manual testing on minikube
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: 
no
     - Core observer or reconciler logic that is regularly executed: yes
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
