asukawen opened a new pull request, #1145:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1145

   ## What is the purpose of the change
   
   Waiting for a Kubernetes deployment to be deleted can fail or time out before the old JobManager deployment is fully removed. The operator currently logs these failures and continues reconciliation. It can therefore submit a replacement cluster while the old deployment is still terminating, which results in `AlreadyExists` errors.
   
   ## Brief change log
   
   - Propagate non-404 errors while waiting for Kubernetes resources to be deleted.
   - Retry reconciliation instead of creating a replacement cluster before deletion completes.
   - Preserve the best-effort JobManager shutdown behavior before mandatory deployment deletion.
   - Update deletion error and timeout tests.
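   
   The intended behavior can be sketched roughly as follows. This is a minimal, self-contained illustration and not the operator's actual code: `DeletionWaitSketch`, `ApiException`, `DeletionTimeoutException`, `waitForDeletion`, and `reconcileOnce` are hypothetical names, and the Kubernetes client is replaced by a plain `BooleanSupplier`. A 404 is treated as "already deleted", any other error and a timeout are propagated, and the caller retries later instead of submitting a replacement.
   
   ```java
   import java.time.Duration;
   import java.util.function.BooleanSupplier;
   
   public class DeletionWaitSketch {
   
       /** Hypothetical stand-in for a Kubernetes API error carrying an HTTP status code. */
       static class ApiException extends RuntimeException {
           final int code;
   
           ApiException(int code, String message) {
               super(message);
               this.code = code;
           }
       }
   
       /** Hypothetical exception: the resource still existed when the timeout expired. */
       static class DeletionTimeoutException extends RuntimeException {
           DeletionTimeoutException(String message) {
               super(message);
           }
       }
   
       enum Outcome { SUBMITTED, RETRY_LATER }
   
       /**
        * Polls until the resource is gone. `exists` returns true while the resource
        * is still present and may throw ApiException.
        */
       static void waitForDeletion(BooleanSupplier exists, Duration timeout, Duration pollInterval) {
           long deadline = System.nanoTime() + timeout.toNanos();
           while (true) {
               try {
                   if (!exists.getAsBoolean()) {
                       return; // resource is gone
                   }
               } catch (ApiException e) {
                   if (e.code == 404) {
                       return; // not found means deletion has completed
                   }
                   throw e; // any other error is propagated, not just logged
               }
               if (System.nanoTime() - deadline >= 0) {
                   throw new DeletionTimeoutException("Resource still present after " + timeout);
               }
               try {
                   Thread.sleep(pollInterval.toMillis());
               } catch (InterruptedException ie) {
                   Thread.currentThread().interrupt();
                   throw new IllegalStateException("Interrupted while waiting for deletion", ie);
               }
           }
       }
   
       /** One reconcile pass: submit the replacement only after deletion is confirmed. */
       static Outcome reconcileOnce(BooleanSupplier oldDeploymentExists, Runnable submitReplacement) {
           try {
               // Timeout and poll interval are arbitrary values chosen for this sketch.
               waitForDeletion(oldDeploymentExists, Duration.ofMillis(100), Duration.ofMillis(10));
           } catch (ApiException | DeletionTimeoutException e) {
               return Outcome.RETRY_LATER; // do not create a cluster next to a terminating one
           }
           submitReplacement.run();
           return Outcome.SUBMITTED;
       }
   }
   ```
   
   In this sketch a failed or timed-out wait never reaches `submitReplacement`, which is the property that avoids the `AlreadyExists` error described above.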
   
   ## Verifying this change
   
   This change added and updated tests and was verified with:
   
   ```bash
   JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home \
     mvn -pl flink-kubernetes-operator -am \
     -DskipITs \
     -Dtest=AbstractFlinkServiceTest \
     -Dsurefire.failIfNoSpecifiedTests=false test
   ```
   
   Tests run: 40, failures: 0, errors: 0, skipped: 0.
   
   ## Does this pull request potentially affect one of the following parts:
   
   - Dependencies (does it add or upgrade a dependency): no
   - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no
   - Core observer or reconciler logic that is regularly executed: yes
   
   ## Documentation
   
   - Does this pull request introduce a new feature? no
   - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
