Yanis Djeridi created FLINK-39243:
-------------------------------------

             Summary: Include `observedGeneration` for Suspended Flink 
Deployments
                 Key: FLINK-39243
                 URL: https://issues.apache.org/jira/browse/FLINK-39243
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.14.0
            Reporter: Yanis Djeridi
             Fix For: kubernetes-operator-1.14.0


h3. *Problem 1: FlinkDeployment — {{observedGeneration}} not updated when 
suspended*

When a FlinkDeployment resource is created with {{spec.job.state: 
suspended}}, the Flink Kubernetes Operator does not update the 
{{status.observedGeneration}} field or other status fields. This violates 
Kubernetes API conventions and breaks integration with standard deployment 
tools like Kapp that rely on {{observedGeneration}} to determine when a 
controller has processed a spec change, leading such tools to hang indefinitely.
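
The check such a tool performs can be sketched roughly as follows. This is a minimal illustration, not Kapp's actual implementation: the class and method names are hypothetical, and the exact comparison rule a given tool applies is an assumption.

```java
// Hypothetical helper; not part of Kapp or the Flink Kubernetes Operator.
final class GenerationCheck {

    private GenerationCheck() {}

    /**
     * Returns true when the controller reports having processed the current spec.
     *
     * @param generation         value of metadata.generation
     * @param observedGeneration value of status.observedGeneration, or null if never set
     */
    static boolean isReconciled(long generation, Long observedGeneration) {
        // A missing observedGeneration means no spec has been acknowledged yet,
        // so a tool waiting on this condition keeps waiting.
        return observedGeneration != null && observedGeneration == generation;
    }

    public static void main(String[] args) {
        // Suspended FlinkDeployment as described above: the field is never written.
        System.out.println(isReconciled(1L, null));
        // Once the operator acknowledges the spec, the two values match.
        System.out.println(isReconciled(1L, 1L));
    }
}
```

With {{observedGeneration}} left unset, the first call never becomes true, which matches the indefinite hang described above.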
h3. *Problem 2: FlinkBlueGreenDeployment — no {{observedGeneration}} field at 
all*

The FlinkBlueGreenDeployment resource does not have an {{observedGeneration}} 
field in its status at all, meaning deployment tools can never determine 
whether the BlueGreen controller has processed a given spec generation, 
regardless of state.
h3. *Root Cause*

{+}FlinkDeployment{+}:

In the reconciliation logic for FlinkDeployment, when the operator detects a 
first deployment with spec.job.state: suspended, it returns early without 
updating any status fields.

This results in:
 * status.observedGeneration is never set
 * status.reconciliationStatus.lastReconciledSpec is never set
 * status.lifecycleState remains empty instead of showing SUSPENDED
 * isBeforeFirstDeployment() returns true on every reconciliation loop

 
{+}FlinkBlueGreenDeployment{+}:

FlinkBlueGreenDeploymentStatus does not have an observedGeneration field in its 
status class. Additionally, when InitializingBlueStateHandler blocks on a 
suspended initial state, it does not record lastReconciledSpec.
 
h3. *Expected Behaviour*
 
{+}FlinkDeployment{+}:

When a FlinkDeployment is created with spec.job.state: suspended, the operator 
should acknowledge the spec without deploying any Flink resources (no JM pods, 
no TM pods, no services). Specifically:
 * status.observedGeneration should be set to match metadata.generation, 
signaling that the operator has processed the spec
 * status.reconciliationStatus.lastReconciledSpec should be recorded with 
state: SUSPENDED
 * status.lifecycleState should show SUSPENDED
 * A subsequent change to spec.job.state: running should trigger a normal first 
deployment
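
The expected FlinkDeployment behaviour can be modelled with a small, self-contained sketch. None of the types below are the operator's real classes: Status, JobState and reconcile are hypothetical stand-ins, the lastReconciledSpec field is reduced to the reconciled job state, and the "DEPLOYED" value is only an illustrative lifecycle state for the running case.

```java
// Simplified, hypothetical model of the expected behaviour; not operator code.
final class SuspendedFirstDeploySketch {

    enum JobState { RUNNING, SUSPENDED }

    /** Stand-in for the FlinkDeployment status fields named in this issue. */
    static final class Status {
        Long observedGeneration;      // status.observedGeneration
        JobState lastReconciledState; // stands in for reconciliationStatus.lastReconciledSpec
        String lifecycleState = "";   // status.lifecycleState
        boolean deployed;             // true once JM/TM resources would have been created
    }

    /** One reconciliation pass for the given metadata.generation and desired job state. */
    static void reconcile(long generation, JobState desired, Status status) {
        boolean firstDeployment = !status.deployed;
        if (firstDeployment && desired == JobState.SUSPENDED) {
            // Acknowledge the spec, but create no Flink resources.
            status.observedGeneration = generation;
            status.lastReconciledState = JobState.SUSPENDED;
            status.lifecycleState = "SUSPENDED";
            return;
        }
        if (firstDeployment) {
            // Normal first deployment path (resource creation elided in this sketch).
            status.deployed = true;
        }
        status.observedGeneration = generation;
        status.lastReconciledState = desired;
        status.lifecycleState = desired == JobState.RUNNING ? "DEPLOYED" : "SUSPENDED";
    }

    public static void main(String[] args) {
        Status status = new Status();
        reconcile(1L, JobState.SUSPENDED, status); // created suspended
        System.out.println(status.observedGeneration + " " + status.lifecycleState);
        reconcile(2L, JobState.RUNNING, status);   // later switched to running
        System.out.println(status.observedGeneration + " deployed=" + status.deployed);
    }
}
```

The point of the sketch is that the suspended branch still writes the status before returning, so the later switch to running is treated as an ordinary first deployment.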

 
{+}FlinkBlueGreenDeployment{+}:

 * FlinkBlueGreenDeploymentStatus should include an observedGeneration field, 
set on every status update
 * lastReconciledSpec should be recorded when blocking on a suspended initial 
state
 * A subsequent change to spec.job.state: running should trigger deployment 
correctly
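
One possible shape for the BlueGreen change is to route every status write through a single helper, so that observedGeneration cannot be missed on any path. The sketch below is an assumption about how that could look: the Status class is a hypothetical stand-in for FlinkBlueGreenDeploymentStatus, the updateStatus helper does not exist in the operator, and the state string is illustrative only.

```java
// Hypothetical stand-in for FlinkBlueGreenDeploymentStatus; names are assumptions.
final class BlueGreenStatusSketch {

    static final class Status {
        Long observedGeneration;   // proposed new field
        String lastReconciledSpec; // serialized spec last acknowledged by the controller
        String blueGreenState;     // current state of the BlueGreen state machine
    }

    /** Single entry point for status writes: observedGeneration is set on every update. */
    static Status updateStatus(Status status, long generation, String state, String serializedSpec) {
        status.observedGeneration = generation;
        status.blueGreenState = state;
        if (serializedSpec != null) {
            // Recorded even when the handler is only blocking on a suspended initial state.
            status.lastReconciledSpec = serializedSpec;
        }
        return status;
    }

    public static void main(String[] args) {
        Status status = new Status();
        // Blocking on a suspended initial state still acknowledges generation 1.
        updateStatus(status, 1L, "INITIALIZING_BLUE", "{\"job\":{\"state\":\"suspended\"}}");
        System.out.println(status.observedGeneration + " " + status.lastReconciledSpec);
    }
}
```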



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
