villebro opened a new pull request, #32: URL: https://github.com/apache/superset-kubernetes-operator/pull/32
## Summary

Fix a fundamental race condition between the parent Superset controller and the SupersetLifecycleTask controller that caused:

- Migrate and init tasks running in parallel (must be sequential)
- Components deploying before lifecycle tasks complete
- Spurious task pod recreation after completion

### Root causes

1. **Split authority**: The parent used `CreateOrUpdate` on task CRs while the task controller independently managed status, causing `"object has been modified"` conflicts. When the task controller's status update failed, completed state was lost.
2. **Premature pod deletion**: `applyRetentionPolicy` deleted the succeeded pod in the same reconcile that marked completion. If the subsequent status update conflicted, the pod was gone but state wasn't persisted, so the next reconcile found no pod and created a new one.
3. **Drain fired on every reconcile**: With `strategy: Always` + `upgradeStrategy: Drain`, the drain condition triggered on every reconcile (not just image changes), causing an infinite drain/recreate loop.

### Design changes

- **Parent orchestrates, task controller executes**: The parent is now the sole authority on lifecycle sequencing. It uses a Get+Create/Delete pattern (never `CreateOrUpdate`) for task CRs, eliminating concurrent writes to the same object.
- **Task controller simplified**: Removed `resetForConfigChange`; the task controller never autonomously resets tasks. Terminal states (Complete/Failed) return early unconditionally. The parent handles re-runs by deleting and recreating the CR.
- **Retention deferred**: Pod cleanup only runs when completion state is already persisted (in a subsequent reconcile), preventing the lost-state race.
- **Drain scoped to image changes**: Drain only fires when `imageChanged` is true, preventing infinite loops with `strategy: Always`.
- **Pod-based drain verification**: `drainComponents` now verifies all component pods are terminated (not just Deployments deleted), closing the window where tasks could start while old pods were still running.
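
To illustrate the deferred-retention ordering described under "Retention deferred", here is a minimal sketch in plain Go. It is not the operator's code: `cluster`, `reconcile`, `updateStatus`, and `failNextWrite` are hypothetical names, the API server is replaced by an in-memory struct, and no controller-runtime calls are used. It assumes a single task with a single pod and injects one status-update conflict to show that the succeeded pod survives until completion has actually been persisted.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a simplified stand-in for the task CR's status phase.
type Phase string

const (
	PhaseRunning  Phase = "Running"
	PhaseComplete Phase = "Complete"
)

// errConflict mimics an "object has been modified" status-update conflict.
var errConflict = errors.New("object has been modified")

// cluster is a toy in-memory model (hypothetical, for illustration only).
type cluster struct {
	persisted     Phase // phase as stored in the API server
	podExists     bool
	podSucceeded  bool
	failNextWrite bool // inject exactly one status-update conflict
	podsCreated   int
}

// updateStatus persists the phase unless a conflict has been injected.
func (c *cluster) updateStatus(p Phase) error {
	if c.failNextWrite {
		c.failNextWrite = false
		return errConflict
	}
	c.persisted = p
	return nil
}

// reconcile sketches one pass of a task controller with deferred retention.
func reconcile(c *cluster) error {
	// Terminal state is already persisted: only now is pod cleanup safe.
	if c.persisted == PhaseComplete {
		c.podExists = false
		return nil
	}
	// No pod and no persisted completion: start the task pod.
	if !c.podExists {
		c.podExists = true
		c.podSucceeded = false
		c.podsCreated++
		return nil
	}
	// Pod succeeded: persist completion but keep the pod in place. If the
	// write conflicts, the next reconcile still sees the succeeded pod.
	if c.podSucceeded {
		return c.updateStatus(PhaseComplete)
	}
	return nil
}

func main() {
	c := &cluster{
		persisted:     PhaseRunning,
		podExists:     true,
		podSucceeded:  true,
		failNextWrite: true,
		podsCreated:   1,
	}
	for i := 1; i <= 3; i++ {
		err := reconcile(c)
		fmt.Printf("reconcile %d: err=%v phase=%s podExists=%t podsCreated=%d\n",
			i, err, c.persisted, c.podExists, c.podsCreated)
	}
}
```

In this model the first reconcile hits the injected conflict and leaves the pod untouched, the second persists `Complete`, and the third performs cleanup. Deleting the pod in the same pass as the status write would instead leave a `Running` task with no pod after the conflict, which is the state that triggered the spurious recreation.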
