Daniel Rossos created FLINK-38517:
-------------------------------------

             Summary: Unable to Transition to Green for 
FlinkBlueGreenDeployment with "Ignored Fields"
                 Key: FLINK-38517
                 URL: https://issues.apache.org/jira/browse/FLINK-38517
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
            Reporter: Daniel Rossos


Found an issue with the newly added BlueGreen phase 1 deployment in the Flink 
Kuberentes operator. Merged 
[PR|https://github.com/apache/flink-kubernetes-operator/pull/969]

TL;DR is when we clear the "Ignored Fields" (using 
ReflectiveDiffBuilder.clearIgnoredFields()) when doing a check between our 
FlinkBlueGreen template's spec and the spec of the Green-FlinkDeployment, we 
strip these fields, I think unintentionally,  after we do the equality check. 
This will cause equality check to fail if only difference is these ignored 
fields.

Longer breakdown of the trace / process we found in this loop:

Flow and exact areas where this issue becomes evident:

1. You create a FlinkBlueGreenDeployment with fields that would be cleared by 
ReflectiveDiffBuilder.clearIgnoredFields(). In our example 
`spec.template.spec.podTemplate.apiVersion` and 
`spec.template.spec.podTemplate.kind`. After then having your 
Blue-FlinkDeployment enter a stable state (which will then set the 
`lastReconciledSpec` json-string to the proper FlinkBlueGreenDeployment spec 
including the additionalProperties), trigger a Green-FlinkDeployment to spin up 
by making a spec change to the FlinkBlueGreenDeployment
2. As part of this transition, 
[`TransitioningStateHandler.handle`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/bluegreen/handlers/TransitioningStateHandler.java#L38]
 will then call 
[`BlueGreenDeploymentService.handleSpecChangesDuringTransition`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/bluegreen/BlueGreenDeploymentService.java#L400]
 which will then attempt to see if there is any spec changes by calling 
[`FlinkBlueGreenDeploymentSpecDiff.compare`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/diff/FlinkBlueGreenDeploymentSpecDiff.java#L58].
3. `FlinkBlueGreenDeploymentSpecDiff.compare` will first check if the 
FlinkBlueGreenDeployment spec from the CR is equal to the lastReconciledState 
spec yaml which is stored as a field yaml-string and updated based on last 
resolution attempt. This will return false because of the changes you just made 
to the FlinkBlueGreenDeploymentSpec that triggered this Blue-Green transition
4. This will then trigger a DiffResult to be generated using 
`ReflectiveDiffBuilder`, which as part of the constructor calls 
[`ReflectiveDiffBuilder.clearIgnoredFields()`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/diff/ReflectiveDiffBuilder.java#L192]
 which removes all addititionalProperties from both your left and right specs. 
Note this strip happens _after_ the equality check.
5. The spec that is supposed to now represent the FlinkBlueGreenDeployment spec 
will now be written to the lastReconciledSpec but now is critically missing the 
additionalProperties field.
6. n-th loop triggers of monitoring transition after the Green-FlinkDeployment 
is in a totally healthy running state, but the 
`FlinkBlueGreenDeploymentSpecDiff.compare` will constantly be comparing the 
actual FlinkBlueGreenDeployment spec which contains additionalProperties and 
the lastReconciledSpec that has had them stripped.
7. This causes infinite loop / never reconciling state on transition and breaks 
the entire FlinkBlueGreenDeployment flow 

A fix / way to show this is the case is by moving the call to create the 
ReflectiveDiffBuilder to before the line:
```
boolean specsEqual = leftSpec.equals(rightSpec);
```
Then everything works as expected because we update the specs to remove ignored 
fields before checking if they are equal. I am interested to hear if there are 
other / more time efficient alternative solutions

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to