jykae commented on code in PR #68074:
URL: https://github.com/apache/airflow/pull/68074#discussion_r3387219415


##########
chart/templates/jobs/migrate-database-job.yaml:
##########
@@ -43,7 +43,7 @@ metadata:
     {{- end }}
   {{- $annotations := dict }}
   {{- if .Values.migrateDatabaseJob.useHelmHooks }}
-    {{- $_ := set $annotations "helm.sh/hook" "post-install,post-upgrade" }}
+    {{- $_ := set $annotations "helm.sh/hook" "pre-install,pre-upgrade" }}
     {{- $_ := set $annotations "helm.sh/hook-weight" "1" }}
     {{- $_ := set $annotations "helm.sh/hook-delete-policy" 
"before-hook-creation,hook-succeeded" }}
   {{- end }}

Review Comment:
   The ServiceAccount, Role, and RoleBinding are now hook resources 
(`helm.sh/hook: post-install,pre-upgrade`) with `hook-weight: "-5"`, while the 
Job runs at `hook-weight: "1"`. Helm applies lower-weighted hook resources 
first, so the SA and RBAC exist before the Job hook runs on both fresh installs 
and the first downgrade after adopting this version.
   
   ---
   Drafted-by: GitHub Copilot (Claude Opus 4.8) (no human review before posting)



##########
chart/files/db_migrate.py:
##########
@@ -0,0 +1,326 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Bidirectional Airflow metadata DB reconciliation for the helm chart.
+
+Decides at runtime whether the helm release wants a forward migrate, a
+downgrade, or a no-op, and runs the right command:
+
+* target == current  -> no-op (idempotent check)
+* target  > current  -> ``airflow db migrate`` inside this job's container
+  (uses the TARGET image, which ships forward scripts).
+* target  < current  -> ``airflow db downgrade --to-version <target>``
+  executed inside the still-running api-server pod (the OLD image still
+  ships the reverse scripts), followed by scaling every DB-touching
+  workload (api-server, scheduler, triggerer, dag-processor, worker) to
+  zero so that no OLD pod keeps talking to the now-downgraded schema. Helm
+  then patches those workloads back to ``replicas: N`` with the TARGET
+  image as the upgrade proceeds, so the cluster comes back up cleanly on
+  the target version. This means a downgrade trades the otherwise-broken
+  rolling-update window for a brief outage (which is unavoidable when the
+  schema goes backwards).
+
+Required env:
+
+* ``AIRFLOW_TARGET_VERSION`` - the version the chart is being 
upgraded/installed to.
+* ``POD_NAMESPACE`` - release namespace, injected via downward API.
+* ``RELEASE_NAME`` - the helm release name, used to scope the scale-down to
+  only the workloads owned by this release.
+
+Optional env:
+
+* ``MIGRATE_JOB_DRAIN_TIMEOUT_SECONDS`` - how long to wait for DB-touching
+  pods to terminate after scale-to-zero on a downgrade. Defaults to 300.
+  Increase when long-running worker tasks need more time to wind down.
+"""
+
+from __future__ import annotations
+
+import os
+import subprocess
+import sys
+import time
+
+from alembic.migration import MigrationContext
+from kubernetes import client, config as k8s_config
+from kubernetes.stream import stream
+from packaging.version import Version

Review Comment:
   Done. The `kubernetes` import is now lazy: it's wrapped in a module-level 
`try/except ImportError` guard, and only the downgrade branch calls 
`_require_kubernetes_client()`, which raises a clear `SystemExit` if the client 
is missing. The forward/no-op/fresh paths no longer require the `kubernetes` 
package.
   
   ---
   Drafted-by: GitHub Copilot (Claude Opus 4.8) (no human review before posting)



##########
chart/newsfragments/68074.significant.rst:
##########
@@ -0,0 +1,37 @@
+Helm chart now reconciles the Airflow metadata DB bidirectionally on ``helm 
upgrade``
+
+The ``migrate-database-job`` is now run as a ``post-install,pre-upgrade`` hook
+(was ``post-install,post-upgrade``) and its default args run a reconciliation
+script that picks one of four actions depending on the relationship between
+the chart's ``airflowVersion`` and the alembic head currently in the database:
+
+* ``fresh install`` (no alembic row, or DB unreachable) — ``airflow db 
migrate``.

Review Comment:
   Good catch. Updated the newsfragment to match the implementation: the action 
is only decided once the DB is reachable, and if it's still unreachable after 
the connection-wait window (`DB_CONNECT_MAX_WAIT_SECONDS`, default 120) the job 
exits non-zero and is retried via the Job `backoffLimit` rather than proceeding 
with a migrate.
   
   ---
   Drafted-by: GitHub Copilot (Claude Opus 4.8) (no human review before posting)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to