villebro opened a new pull request, #32:
URL: https://github.com/apache/superset-kubernetes-operator/pull/32

   ## Summary
   
   Fix a fundamental race condition between the parent Superset controller and the SupersetLifecycleTask controller that caused:
   
   - Migrate and init tasks running in parallel (they must run sequentially)
   - Components deploying before lifecycle tasks complete
   - Spurious task pod recreation after completion
   
   ### Root causes
   
   1. **Split authority**: The parent used `CreateOrUpdate` on task CRs while the task controller independently managed their status, causing `"object has been modified"` conflicts. When the task controller's status update failed, the completed state was lost.
   
   2. **Premature pod deletion**: `applyRetentionPolicy` deleted the succeeded pod in the same reconcile that marked completion. If the subsequent status update conflicted, the pod was gone but the completion state was never persisted, so the next reconcile found no pod and created a new one.
   
   3. **Drain fired on every reconcile**: With `strategy: Always` + `upgradeStrategy: Drain`, the drain condition triggered on every reconcile (not just on image changes), causing an infinite drain/recreate loop.
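
Root cause 2 can be reproduced with a small, self-contained Go simulation. This is a sketch, not the operator's code: `taskState`, `reconcile`, `simulate` and the in-memory conflict error are hypothetical stand-ins for the SupersetLifecycleTask controller and the API server, and only the ordering of pod deletion versus status persistence is modeled. With the original ordering, one conflicting status write leads to a second pod being created; when pod cleanup waits until completion is already persisted, the task keeps a single pod.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's optimistic-concurrency error.
// Hypothetical: this is not the client-go error value.
var errConflict = errors.New("the object has been modified")

// taskState is a hypothetical in-memory model of one lifecycle task.
type taskState struct {
	phase          string // persisted status phase: "", "Running" or "Complete"
	podExists      bool
	podSucceeded   bool
	podsCreated    int
	conflictOnNext bool // make the next status write fail once
}

// updateStatus persists a phase unless a conflict is being simulated.
func (t *taskState) updateStatus(phase string) error {
	if t.conflictOnNext {
		t.conflictOnNext = false
		return errConflict
	}
	t.phase = phase
	return nil
}

// reconcile is one task-controller pass. deferRetention selects the fixed
// ordering; false reproduces the original delete-then-persist ordering.
func reconcile(t *taskState, deferRetention bool) error {
	if t.phase == "Complete" {
		// Terminal state: completion is persisted, so cleanup is now safe.
		if deferRetention {
			t.podExists = false
		}
		return nil
	}
	if !t.podExists {
		t.podExists, t.podSucceeded = true, false
		t.podsCreated++
		return t.updateStatus("Running")
	}
	if !t.podSucceeded {
		return nil // pod still running
	}
	if !deferRetention {
		t.podExists = false // original bug: pod deleted before status is saved
	}
	return t.updateStatus("Complete")
}

// simulate runs a task to success with one conflicting status write and
// returns how many pods were created for it.
func simulate(deferRetention bool) int {
	t := &taskState{}
	_ = reconcile(t, deferRetention) // first pass creates the pod
	t.podSucceeded = true            // the pod finishes successfully
	t.conflictOnNext = true          // ...but the next status write conflicts
	for i := 0; i < 3; i++ {
		_ = reconcile(t, deferRetention)
	}
	return t.podsCreated
}

func main() {
	fmt.Println("delete-then-persist pods created:", simulate(false))
	fmt.Println("deferred retention pods created:", simulate(true))
}
```

Running it reports 2 pods for the delete-then-persist ordering and 1 for the deferred ordering.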
   
   ### Design changes
   
   - **Parent orchestrates, task controller executes**: The parent is now the sole authority on lifecycle sequencing. It uses a Get+Create/Delete pattern (never `CreateOrUpdate`) for task CRs, eliminating concurrent writes to the same object.
   
   - **Task controller simplified**: Removed `resetForConfigChange`; the task controller never resets tasks on its own. Terminal states (Complete/Failed) return early unconditionally. The parent handles re-runs by deleting and recreating the CR.
   
   - **Retention deferred**: Pod cleanup only runs when the completion state is already persisted (in a subsequent reconcile), preventing the lost-state race.
   
   - **Drain scoped to image changes**: Drain only fires when `imageChanged` is true, preventing infinite loops with `strategy: Always`.
   
   - **Pod-based drain verification**: `drainComponents` now verifies all component pods are terminated (not just Deployments deleted), closing the window where tasks could start while old pods were still running.
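
The orchestration described in these design changes can be sketched in Go against an in-memory cluster. Everything here is hypothetical (`cluster`, `task`, `shouldDrain`, `reconcileLifecycle`, `upgrade`, the fake kubelet and task controller, and the assumption that a task is re-run when its recorded image differs from the desired one); it only illustrates the ordering: drain gated on `imageChanged` and verified by counting pods, then migrate and init created one at a time with Get plus Create/Delete, never updated in place.

```go
package main

import "fmt"

// task is a hypothetical in-memory model of a SupersetLifecycleTask CR.
type task struct {
	image string
	phase string // "" (pending), "Complete" or "Failed"
}

// cluster is a hypothetical stand-in for the API server.
type cluster struct {
	tasks         map[string]*task
	deployments   int      // component Deployments
	componentPods int      // component pods not yet terminated
	created       []string // task CRs in creation order
	earlyStarts   int      // tasks created while component pods still ran
}

// shouldDrain gates draining on an actual image change, so `strategy: Always`
// alone cannot retrigger the drain on every reconcile.
func shouldDrain(upgradeStrategy string, imageChanged bool) bool {
	return upgradeStrategy == "Drain" && imageChanged
}

// reconcileLifecycle is one parent pass. It returns true once every task is
// Complete for the desired image, i.e. components may be deployed.
func reconcileLifecycle(c *cluster, image, upgradeStrategy string, imageChanged bool) bool {
	if shouldDrain(upgradeStrategy, imageChanged) {
		c.deployments = 0 // delete component Deployments (idempotent)
		if c.componentPods > 0 {
			return false // drained only when the pods are gone
		}
	}
	for _, name := range []string{"migrate", "init"} { // strictly sequential
		t, found := c.tasks[name] // Get; never CreateOrUpdate
		switch {
		case !found:
			if c.componentPods > 0 {
				c.earlyStarts++
			}
			c.tasks[name] = &task{image: image} // Create
			c.created = append(c.created, name)
			return false
		case t.image != image:
			delete(c.tasks, name) // Delete; recreated on a later pass
			return false
		case t.phase != "Complete":
			return false // the next task must not start yet
		}
	}
	return true
}

// newUpgradeScenario returns a cluster still running the old image.
func newUpgradeScenario() *cluster {
	return &cluster{
		tasks: map[string]*task{
			"migrate": {image: "superset:old", phase: "Complete"},
			"init":    {image: "superset:old", phase: "Complete"},
		},
		deployments:   2,
		componentPods: 2,
	}
}

// upgrade drives parent passes, a fake kubelet and a fake task controller
// until the parent reports that the lifecycle is done.
func upgrade(c *cluster, image string) int {
	passes := 0
	for !reconcileLifecycle(c, image, "Drain", true) {
		passes++
		// Fake kubelet: one component pod terminates per pass after drain.
		if c.deployments == 0 && c.componentPods > 0 {
			c.componentPods--
		}
		// Fake task controller: runs pending tasks to completion.
		for _, t := range c.tasks {
			if t.phase == "" {
				t.phase = "Complete"
			}
		}
	}
	return passes
}

func main() {
	c := newUpgradeScenario()
	passes := upgrade(c, "superset:new")
	fmt.Println("passes:", passes, "created:", c.created, "early starts:", c.earlyStarts)
}
```

Running it creates `migrate` before `init` and starts no task while component pods are still running. In this sketch the parent never modifies an existing task object in place, which mirrors the goal of avoiding concurrent writes to the same CR.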
