gyfora opened a new pull request, #550:
URL: https://github.com/apache/flink-kubernetes-operator/pull/550

   ## What is the purpose of the change
   
   While the current health probe mechanism can detect several types of 
errors, such as startup and informer issues, it is generally beneficial to 
also allow a simple canary mechanism that verifies that the operator receives 
updates and reconciles resources in a timely manner.
   
   This PR introduces a simple canary mechanism that allows platform 
admins/users to deploy simple canary resources identified by a special label. 
The canary resource reconciliation logic then feeds into the health probe 
through a configurable timeout, so the probe can detect when the operator has 
not reconciled a canary resource in time.
   
   The canary resource won't deploy any Flink resources. 
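   
   As a rough illustration of the idea (not the code in this PR), the sketch 
below tracks when each labeled canary was last reconciled and reports 
unhealthy once the timeout is exceeded. The class name `CanaryTracker`, the 
label key `example.org/canary`, and the method names are all hypothetical; the 
description above does not name the actual label or classes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: records when each canary resource was last reconciled. */
class CanaryTracker {
    // Assumed label key for illustration only; the real label is not stated here.
    static final String CANARY_LABEL = "example.org/canary";

    private final Duration timeout;
    private final Map<String, Instant> lastReconciled = new ConcurrentHashMap<>();

    CanaryTracker(Duration timeout) {
        this.timeout = timeout;
    }

    /** A resource counts as a canary if it carries the label with value "true". */
    static boolean isCanary(Map<String, String> labels) {
        return labels != null && "true".equals(labels.get(CANARY_LABEL));
    }

    /**
     * Called from the reconcile loop. Returns true if the resource was handled
     * as a canary, in which case the caller should skip deploying anything.
     */
    boolean onReconcile(String name, Map<String, String> labels, Instant now) {
        if (!isCanary(labels)) {
            return false;
        }
        lastReconciled.put(name, now);
        return true;
    }

    /**
     * Health check input: healthy only if every known canary was reconciled
     * within the timeout. Canaries that were never seen are not tracked, so
     * this sketch only detects stalls after the first reconciliation.
     */
    boolean isHealthy(Instant now) {
        return lastReconciled.values().stream()
                .allMatch(last -> Duration.between(last, now).compareTo(timeout) <= 0);
    }
}
```

   In this sketch a health probe would call `isHealthy(Instant.now())` on each 
check, while something external (for example a periodic update to the canary 
resource) would be needed to keep triggering reconciliation.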
   
   ## Brief change log
   
     - *Add canary resource logic and canary resource manager to handle 
timeouts*
     - *Wire canary logic into HealthProbe*
     - *Add unit tests*
   
   ## Verifying this change
   
   The HealthProbe test has been extended with canary-specific logic.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: 
no
     - Core observer or reconciler logic that is regularly executed: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented?  [TODO]
   

