Yanis Djeridi created FLINK-39243:
-------------------------------------
Summary: Include `observedGeneration` for Suspended Flink
Deployments
Key: FLINK-39243
URL: https://issues.apache.org/jira/browse/FLINK-39243
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.14.0
Reporter: Yanis Djeridi
Fix For: kubernetes-operator-1.14.0
h3. *Problem 1: FlinkDeployment — {{observedGeneration}} not updated when
suspended*
When a FlinkDeployment resource is created with {{spec.job.state:
suspended}}, the Flink Kubernetes Operator does not update the
{{status.observedGeneration}} field or other status fields. This violates
Kubernetes API conventions and breaks integration with standard deployment
tools like Kapp that rely on {{observedGeneration}} to determine when a
controller has processed a spec change, leading such tools to hang indefinitely.
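
To make the hang concrete, here is a minimal sketch of the kind of check a generation-aware deployment tool performs. The class and method names are hypothetical and this is not Kapp's actual implementation; it only illustrates why a status without {{observedGeneration}} never satisfies the wait condition.

```java
import java.util.Optional;

public class GenerationCheck {
    /**
     * Returns true once the controller reports that it has processed the
     * current spec. An empty Optional models a status in which
     * observedGeneration was never written.
     */
    public static boolean isReconciled(long metadataGeneration, Optional<Long> observedGeneration) {
        return observedGeneration.isPresent() && observedGeneration.get() >= metadataGeneration;
    }

    public static void main(String[] args) {
        // Suspended FlinkDeployment as reported: the field is absent, so the tool keeps waiting.
        System.out.println(isReconciled(1L, Optional.empty()));
        // Once the operator acknowledges generation 1, the wait can end.
        System.out.println(isReconciled(1L, Optional.of(1L)));
    }
}
```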
h3. *Problem 2: FlinkBlueGreenDeployment — no {{observedGeneration}} field at
all*
The FlinkBlueGreenDeployment resource does not have an {{observedGeneration}}
field in its status at all, meaning deployment tools can never determine
whether the BlueGreen controller has processed a given spec generation,
regardless of state.
h3. *Root Cause*
{+}FlinkDeployment{+}:
In the reconciliation logic for FlinkDeployment, when the operator detects a
first deployment with spec.job.state: suspended, it returns early without
updating any status fields as seen here.
This results in:
* status.observedGeneration is never set
* status.reconciliationStatus.lastReconciledSpec is never set
* status.lifecycleState remains empty instead of showing SUSPENDED
* isBeforeFirstDeployment() returns true on every reconciliation loop
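
The early return and its consequences can be modelled roughly as follows. This is a simplified sketch, not the operator's code: the flattened {{Status}} class and the {{reconcile}} method are hypothetical, and it assumes that "before first deployment" means no reconciled spec has been recorded yet.

```java
public class EarlyReturnSketch {
    /** Hypothetical, flattened stand-in for the FlinkDeployment status. */
    public static class Status {
        public Long observedGeneration;    // null: never written
        public String lastReconciledSpec;  // null: never written
        public String lifecycleState;      // null: empty
    }

    /** Assumption: "first deployment" means no spec has been recorded yet. */
    public static boolean isBeforeFirstDeployment(Status status) {
        return status.lastReconciledSpec == null;
    }

    /** Models the reported behaviour: a suspended first deployment returns before any status write. */
    public static void reconcile(long generation, String jobState, Status status) {
        if (isBeforeFirstDeployment(status) && "suspended".equals(jobState)) {
            return; // status is left untouched
        }
        // Placeholder for the normal deployment path.
        status.observedGeneration = generation;
        status.lastReconciledSpec = "state=" + jobState;
        status.lifecycleState = "DEPLOYED";
    }

    public static void main(String[] args) {
        Status status = new Status();
        // Every loop takes the same early return, so nothing ever changes.
        for (int loop = 0; loop < 3; loop++) {
            reconcile(1L, "suspended", status);
        }
        System.out.println(status.observedGeneration);
        System.out.println(isBeforeFirstDeployment(status));
    }
}
```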
{+}FlinkBlueGreenDeployment{+}:
FlinkBlueGreenDeploymentStatus does not have an observedGeneration field in its
status class. Additionally, when InitializingBlueStateHandler blocks on a
suspended initial state, it does not record lastReconciledSpec.
h3. *Expected Behaviour*
{+}FlinkDeployment{+}:
When a FlinkDeployment is created with spec.job.state: suspended, the operator
should acknowledge the spec without deploying any Flink resources (no JM pods,
no TM pods, no services). Specifically:
* status.observedGeneration should be set to match metadata.generation,
signaling that the operator has processed the spec
* status.reconciliationStatus.lastReconciledSpec should be recorded with
state: SUSPENDED
* status.lifecycleState should show SUSPENDED
* A subsequent change to spec.job.state: running should trigger a normal first
deployment
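
The expected behaviour listed above could look roughly like this. It is a sketch under stated assumptions, not a proposed patch: the {{Status}} class, its {{deployed}} flag, and the {{reconcile}} method are hypothetical, the state strings are placeholders, and only the two transitions from this report (suspended first, then running) are modelled.

```java
public class SuspendedAckSketch {
    /** Hypothetical, flattened stand-in for the FlinkDeployment status. */
    public static class Status {
        public Long observedGeneration;
        public String lastReconciledState; // stand-in for lastReconciledSpec's job state
        public String lifecycleState;
        public boolean deployed;           // true once Flink resources were created
    }

    public static void reconcile(long generation, String jobState, Status status) {
        if (!status.deployed && "suspended".equals(jobState)) {
            // Acknowledge the spec without creating JM pods, TM pods, or services.
            status.observedGeneration = generation;
            status.lastReconciledState = "SUSPENDED";
            status.lifecycleState = "SUSPENDED";
            return;
        }
        // Placeholder for the normal first deployment.
        status.deployed = true;
        status.observedGeneration = generation;
        status.lastReconciledState = "RUNNING";
        status.lifecycleState = "DEPLOYED";
    }

    public static void main(String[] args) {
        Status status = new Status();
        reconcile(1L, "suspended", status);
        System.out.println(status.observedGeneration + " " + status.lifecycleState + " " + status.deployed);
        // Switching spec.job.state to running bumps the generation and deploys.
        reconcile(2L, "running", status);
        System.out.println(status.observedGeneration + " " + status.lifecycleState + " " + status.deployed);
    }
}
```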
{+}FlinkBlueGreenDeployment{+}:
* FlinkBlueGreenDeploymentStatus should include an observedGeneration field,
set on every status update
* lastReconciledSpec should be recorded when blocking on a suspended initial
state
* A subsequent change to spec.job.state: running should trigger deployment
correctly
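
One way to get {{observedGeneration}} "set on every status update" is to route all status writes through a single helper. The sketch below assumes this design; the {{Status}} class is a hypothetical stand-in for FlinkBlueGreenDeploymentStatus, and the {{updateStatus}} helper and the state string are illustrative names, not existing operator APIs.

```java
public class BlueGreenStatusSketch {
    /** Hypothetical stand-in for FlinkBlueGreenDeploymentStatus with the proposed field. */
    public static class Status {
        public Long observedGeneration;   // proposed new field
        public String lastReconciledSpec;
        public String blueGreenState;
    }

    /**
     * Single entry point for status writes, so the generation is stamped every time.
     * A null serializedSpec leaves the previously recorded spec in place.
     */
    public static Status updateStatus(Status status, long generation, String state, String serializedSpec) {
        status.blueGreenState = state;
        if (serializedSpec != null) {
            status.lastReconciledSpec = serializedSpec;
        }
        status.observedGeneration = generation;
        return status;
    }

    public static void main(String[] args) {
        Status status = new Status();
        // Blocking on a suspended initial state still records the spec and the generation.
        updateStatus(status, 1L, "INITIALIZING_BLUE", "{\"job\":{\"state\":\"suspended\"}}");
        System.out.println(status.observedGeneration + " " + status.blueGreenState);
    }
}
```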