[ 
https://issues.apache.org/jira/browse/FLINK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-38517.
------------------------------
    Fix Version/s: kubernetes-operator-1.14.0
         Assignee: Daniel Rossos
       Resolution: Fixed

merged to main c479a460b1d9f62d685d2d6208ec275fef8938ce

> Unable to Transition to Green for FlinkBlueGreenDeployment with "Ignored 
> Fields"
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-38517
>                 URL: https://issues.apache.org/jira/browse/FLINK-38517
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Daniel Rossos
>            Assignee: Daniel Rossos
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.14.0
>
>
> Found an issue with the newly added BlueGreen phase 1 deployment in the Flink 
> Kuberentes operator. Merged 
> [PR|https://github.com/apache/flink-kubernetes-operator/pull/969]
> TL;DR is when we clear the "Ignored Fields" (using 
> ReflectiveDiffBuilder.clearIgnoredFields()) when doing a check between our 
> FlinkBlueGreen template's spec and the spec of the Green-FlinkDeployment, we 
> strip these fields, I think unintentionally,  after we do the equality check. 
> This will cause equality check to fail if only difference is these ignored 
> fields.
> Longer breakdown of the trace / process we found in this loop:
> Flow and exact areas where this issue becomes evident:
> 1. You create a FlinkBlueGreenDeployment with fields that would be cleared by 
> ReflectiveDiffBuilder.clearIgnoredFields(). In our example 
> `spec.template.spec.podTemplate.apiVersion` and 
> `spec.template.spec.podTemplate.kind`. After then having your 
> Blue-FlinkDeployment enter a stable state (which will then set the 
> `lastReconciledSpec` json-string to the proper FlinkBlueGreenDeployment spec 
> including the additionalProperties), trigger a Green-FlinkDeployment to spin 
> up by making a spec change to the FlinkBlueGreenDeployment
> 2. As part of this transition, 
> [`TransitioningStateHandler.handle`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/bluegreen/handlers/TransitioningStateHandler.java#L38]
>  will then call 
> [`BlueGreenDeploymentService.handleSpecChangesDuringTransition`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/bluegreen/BlueGreenDeploymentService.java#L400]
>  which will then attempt to see if there is any spec changes by calling 
> [`FlinkBlueGreenDeploymentSpecDiff.compare`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/diff/FlinkBlueGreenDeploymentSpecDiff.java#L58].
> 3. `FlinkBlueGreenDeploymentSpecDiff.compare` will first check if the 
> FlinkBlueGreenDeployment spec from the CR is equal to the lastReconciledState 
> spec yaml which is stored as a field yaml-string and updated based on last 
> resolution attempt. This will return false because of the changes you just 
> made to the FlinkBlueGreenDeploymentSpec that triggered this Blue-Green 
> transition
> 4. This will then trigger a DiffResult to be generated using 
> `ReflectiveDiffBuilder`, which as part of the constructor calls 
> [`ReflectiveDiffBuilder.clearIgnoredFields()`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/diff/ReflectiveDiffBuilder.java#L192]
>  which removes all addititionalProperties from both your left and right 
> specs. Note this strip happens _after_ the equality check.
> 5. The spec that is supposed to now represent the FlinkBlueGreenDeployment 
> spec will now be written to the lastReconciledSpec but now is critically 
> missing the additionalProperties field.
> 6. n-th loop triggers of monitoring transition after the 
> Green-FlinkDeployment is in a totally healthy running state, but the 
> `FlinkBlueGreenDeploymentSpecDiff.compare` will constantly be comparing the 
> actual FlinkBlueGreenDeployment spec which contains additionalProperties and 
> the lastReconciledSpec that has had them stripped.
> 7. This causes infinite loop / never reconciling state on transition and 
> breaks the entire FlinkBlueGreenDeployment flow 
> A fix / way to show this is the case is by moving the call to create the 
> ReflectiveDiffBuilder to before the line:
> ```
> boolean specsEqual = leftSpec.equals(rightSpec);
> ```
> Then everything works as expected because we update the specs to remove 
> ignored fields before checking if they are equal. I am interested to hear if 
> there are other / more time efficient alternative solutions
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to