Miretpl commented on code in PR #67012:
URL: https://github.com/apache/airflow/pull/67012#discussion_r3306949503
##########
dev/breeze/src/airflow_breeze/utils/selective_checks.py:
##########
@@ -246,6 +247,19 @@ def __hash__(self):
r"^airflow-core/src/airflow/kubernetes",
r"^airflow-core/tests/unit/kubernetes",
],
+ # Narrow trigger for the `breeze k8s smoke-test-overlay` CI job. Kept
+ # focused: only the overlay manifests, the per-overlay pytest dir,
+ # the prek hook that builds them, the breeze command that runs the
+ # smoke test, and the workflow file itself. `^chart/` (under
+ # HELM_FILES) is intentionally NOT reused — the overlays don't
+ # care about every chart-template edit, only their own files.
Review Comment:
```suggestion
# `^chart/` (under HELM_FILES) is intentionally NOT reused
# — the overlays don't care about every chart-template edit,
# only their own files.
```
##########
dev/breeze/src/airflow_breeze/utils/selective_checks.py:
##########
@@ -246,6 +247,19 @@ def __hash__(self):
r"^airflow-core/src/airflow/kubernetes",
r"^airflow-core/tests/unit/kubernetes",
],
+ # Narrow trigger for the `breeze k8s smoke-test-overlay` CI job. Kept
+ # focused: only the overlay manifests, the per-overlay pytest dir,
+ # the prek hook that builds them, the breeze command that runs the
+ # smoke test, and the workflow file itself. `^chart/` (under
+ # HELM_FILES) is intentionally NOT reused — the overlays don't
+ # care about every chart-template edit, only their own files.
+ FileGroupForCi.KUSTOMIZE_OVERLAYS_FILES: [
+ r"^chart/kustomize-overlays/",
+ r"^chart/tests/overlay_tests/",
Review Comment:
```suggestion
r"^chart/tests/overlay_tests/",
r"^chart/templates/",
r"^chart/files/",
```
I believe some customisations, and their content, can depend on how the
chart templates are constructed, so edits to the templates should trigger
this job as well.
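To illustrate the effect of the two extra patterns, here is a minimal sketch of regex-based file-group matching. It assumes the group is evaluated by searching each changed path against each pattern; `matches_any` and the sample paths are hypothetical and not taken from `selective_checks.py`, and only the patterns visible in this hunk are listed.

```python
import re

# The two patterns visible in the hunk plus the two suggested additions.
KUSTOMIZE_OVERLAYS_PATTERNS = [
    r"^chart/kustomize-overlays/",
    r"^chart/tests/overlay_tests/",
    r"^chart/templates/",
    r"^chart/files/",
]


def matches_any(changed_files: list[str], patterns: list[str]) -> list[str]:
    """Return the changed files matching at least one pattern (hypothetical helper)."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [path for path in changed_files if any(c.search(path) for c in compiled)]


# Illustrative paths only.
changed = [
    "chart/templates/workers/worker-deployment.yaml",
    "chart/values.yaml",
    "chart/kustomize-overlays/kerberos/kustomization.yaml",
]
print(matches_any(changed, KUSTOMIZE_OVERLAYS_PATTERNS))
```

With the suggested patterns a template edit triggers the overlay job, while an edit to `chart/values.yaml` alone still does not.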
##########
dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py:
##########
@@ -2450,3 +2463,642 @@ def deploy_cluster(
)
if return_code != 0:
sys.exit(return_code)
+
+
+# ---------------------------------------------------------------------------
+# `breeze k8s smoke-test-overlay` — functional smoke test for a single
+# kustomize overlay under chart/kustomize-overlays/.
+#
+# Counterpart to the structural `build_kustomize_overlays` prek hook:
+# the prek hook validates that an overlay builds and that its STATUS.yaml
+# parses, while this command applies the overlay to a running kind
+# cluster, waits for every resource declared in the STATUS.yaml `verify:`
+# block, and optionally runs a per-overlay pytest module for behavioural
+# checks. An overlay's STATUS may only advance to `tested` once this
+# command exits 0.
+# ---------------------------------------------------------------------------
+
+KUSTOMIZE_OVERLAYS_PATH = CHART_PATH / "kustomize-overlays"
+# Behavioural overlay tests live under chart/tests/overlay_tests/ (NOT
+# kubernetes-tests/) so they sit next to the overlay manifests and the
+# rest of chart-adjacent pytest content. They are NOT discovered by
+# `breeze testing helm-tests --test-type all` because `overlay_tests` is
+# in chart/pyproject.toml's norecursedirs — only this command (which
+# invokes pytest by explicit path) sees them.
Review Comment:
```suggestion
# Behavioural overlay tests live under chart/tests/overlay_tests/.
# They are NOT discovered by `breeze testing helm-tests --test-type all`
# because `overlay_tests` is in chart/pyproject.toml's norecursedirs
# — only this command (which invokes pytest by explicit path) sees them.
```
Maybe it could be a bit shorter.
##########
dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py:
##########
@@ -714,6 +716,19 @@ def _upload_k8s_image(python: str, kubernetes_version: str, output: Output | Non
# CI runs from Docker Hub anonymous-pull rate limits, which intermittently
# turn the scheduled K8s test job red. Auto-bumped by
# scripts/ci/prek/upgrade_important_versions.py.
+#
+# Scope: ONLY images referenced by the regular K8s system tests under
+# kubernetes-tests/tests/kubernetes_tests/ (the suite `breeze k8s tests`
+# runs against the deployed chart). Images that appear in a kustomize
+# overlay under chart/kustomize-overlays/<name>/ must NOT be added here:
+# `breeze k8s smoke-test-overlay` auto-discovers them from the rendered
+# manifest via _discover_overlay_images() and preloads them with the same
+# pull-and-kind-load pattern, so adding a new overlay image is literally
+# "edit the overlay manifest, done" — no second list to maintain. If a
+# per-overlay pytest module needs to spawn an ad-hoc client pod, prefer
+# reusing an image already declared by the overlay (so it inherits the
+# auto-preload for free); add to this list only as a last resort and only
+# if the image is also useful to the non-overlay K8s tests.
Review Comment:
```suggestion
# Scope: ONLY images referenced by the regular K8S system tests under
# kubernetes-tests/tests/kubernetes_tests/ (the suite `breeze k8s tests`
# runs against the deployed chart). Images that appear in a kustomize
# overlay under chart/kustomize-overlays/<name>/ must NOT be added here:
# `breeze k8s smoke-test-overlay` auto-discovers them from the rendered
# manifest via _discover_overlay_images() and preloads them with the same
# pull-and-kind-load pattern. If a per-overlay pytest module needs to spawn
# an ad-hoc client pod, prefer reusing an image already declared by the
# overlay (inherits the auto-preload by default); add to this list only
# if the image is also useful to the non-overlay K8S tests.
```
I think the message can be a little shorter.
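For readers unfamiliar with the auto-discovery the comment refers to, a rough sketch of the idea follows. It is not the PR's `_discover_overlay_images()`: the helper name, the regex approach, and the `alpine/k8s` tag are assumptions made for illustration, and a real implementation would more likely parse the YAML documents.

```python
import re

# Matches `image: repo/name:tag` lines in rendered YAML, allowing optional
# quotes and an optional leading "- " when `image` is the first key of a list item.
IMAGE_LINE = re.compile(r"""^\s*(?:-\s+)?image:\s*["']?([^\s"'#]+)["']?""", re.MULTILINE)


def discover_images(rendered_manifest: str) -> list[str]:
    """Return unique image references in first-seen order (hypothetical helper)."""
    seen: dict[str, None] = {}
    for match in IMAGE_LINE.finditer(rendered_manifest):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Trimmed-down stand-in for `kubectl kustomize` output; the alpine/k8s tag is a placeholder.
rendered = """
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: kdc
          image: gcavalcante8808/krb5-server:latest
          imagePullPolicy: IfNotPresent
---
kind: Job
spec:
  template:
    spec:
      containers:
        - image: "alpine/k8s:placeholder-tag"
          name: bootstrap
"""
print(discover_images(rendered))
```

Because the list is derived from the rendered manifest, adding an image to an overlay needs no second list, which is the point the comment makes.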
##########
dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py:
##########
@@ -22,11 +22,12 @@
import shutil
Review Comment:
Much of the code added here has no blank lines separating logical blocks,
which makes it harder to read in places. Personally, I would appreciate
more blank lines here 😄.
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
+Usage
+-----
+
+Reference this overlay from your own kustomization and substitute the
+release name and namespace. A minimal example:
+
+.. code-block:: yaml
+
+ # my-overlay/kustomization.yaml
+ apiVersion: kustomize.config.k8s.io/v1beta1
+ kind: Kustomization
+ namespace: airflow
+
+ resources:
+ - github.com/apache/airflow/chart/kustomize-overlays/kerberos?ref=helm-chart/1.22.0
+
+Apply with:
+
+.. code-block:: bash
+
+ kubectl apply -k my-overlay/
+
+For a quick test, you can also just substitute the placeholders inline:
+
+.. code-block:: bash
+
+ kubectl kustomize chart/kustomize-overlays/kerberos | \
+ sed -e 's/RELEASE-NAME/airflow/g' -e 's/NAMESPACE/airflow/g' | \
+ kubectl apply -n airflow -f -
+
+This is exactly what ``breeze k8s smoke-test-overlay kerberos`` does
+during the local and CI smoke test.
+
+Wiring the keytab into the chart's sidecar
+------------------------------------------
+
+The chart's kerberos sidecar (``workers.kerberosSidecar``,
+``workers.celery.kerberosSidecar``) mounts a Secret named in
+``kerberos.keytab``. Point that at the Secret produced by this overlay:
+
+.. code-block:: yaml
+
+ # values.yaml fragment
+ kerberos:
+ enabled: true
+ ccacheMountPath: /var/kerberos-ccache
+ keytabPath: /etc/airflow.keytab
+ principal: airflow/[email protected]
+
+ workers:
+ kerberosSidecar:
+ enabled: true
Review Comment:
```suggestion
workers:
celery:
kerberosSidecar:
enabled: true
```
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
Review Comment:
```suggestion
```
IMHO this is too detailed for the documentation itself, and it will be hard
to keep up to date with changes to the overlay.
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
+Usage
+-----
+
+Reference this overlay from your own kustomization and substitute the
+release name and namespace. A minimal example:
+
+.. code-block:: yaml
+
+ # my-overlay/kustomization.yaml
+ apiVersion: kustomize.config.k8s.io/v1beta1
+ kind: Kustomization
+ namespace: airflow
+
+ resources:
+ - github.com/apache/airflow/chart/kustomize-overlays/kerberos?ref=helm-chart/1.22.0
+
+Apply with:
+
+.. code-block:: bash
+
+ kubectl apply -k my-overlay/
+
+For a quick test, you can also just substitute the placeholders inline:
+
+.. code-block:: bash
+
+ kubectl kustomize chart/kustomize-overlays/kerberos | \
+ sed -e 's/RELEASE-NAME/airflow/g' -e 's/NAMESPACE/airflow/g' | \
+ kubectl apply -n airflow -f -
+
+This is exactly what ``breeze k8s smoke-test-overlay kerberos`` does
+during the local and CI smoke test.
+
+Wiring the keytab into the chart's sidecar
+------------------------------------------
+
+The chart's kerberos sidecar (``workers.kerberosSidecar``,
+``workers.celery.kerberosSidecar``) mounts a Secret named in
+``kerberos.keytab``. Point that at the Secret produced by this overlay:
+
+.. code-block:: yaml
+
+ # values.yaml fragment
+ kerberos:
+ enabled: true
+ ccacheMountPath: /var/kerberos-ccache
+ keytabPath: /etc/airflow.keytab
+ principal: airflow/[email protected]
+
+ workers:
+ kerberosSidecar:
+ enabled: true
+
+ extraSecrets:
+ airflow-kerberos-keytab: {} # exists from this overlay
+
+Migration guide from the chart
+------------------------------
+
+What the chart currently does
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When ``kerberos.enabled=true`` and ``kerberos.keytabBase64Content`` is
+provided, the chart renders a ``Secret`` carrying the user-supplied
+keytab and a ``ConfigMap`` with the user-supplied ``krb5.conf``. The
+user is expected to bring their own KDC.
+
+What this overlay provides
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* A working test KDC in the same namespace, so a developer can exercise
+ the chart's kerberos sidecar end-to-end without standing up an
+ external Kerberos service.
+* A bootstrap Job that materialises the keytab Secret automatically,
+ so no base64-encoded blob ends up in ``values.yaml`` or a developer's
+ shell history.
+
+How to switch
+^^^^^^^^^^^^^
+
+1. Install or upgrade the chart with ``kerberos.enabled`` set as you
+ want, but **without** ``kerberos.keytabBase64Content``.
+2. Apply this overlay against the same namespace.
+3. Wait for ``Job/<release>-keytab-bootstrap`` to complete.
+4. Confirm the Secret exists and reference it from the chart's sidecar
+ config as shown above.
+
+Status
+------
+
+This overlay is ``tested``: the ``verify:`` block in ``STATUS.yaml``
+is the smoke-test contract (KDC Deployment Ready, Service exists,
+bootstrap Job Complete, keytab Secret exists), and the
+``test_kerberos.py`` module under
+``chart/tests/overlay_tests/`` adds the
+behavioural assertion: a throwaway client pod ``kinit``\ ing against
+the in-cluster KDC and confirming the principal in ``klist`` output.
+``last-verified`` in ``STATUS.yaml`` records the most recent green
+local run; re-run the smoke test with ``--promote-status`` to refresh
+it whenever you re-verify against your cluster.
+
+To run the smoke test locally:
+
+.. code-block:: bash
+
+ breeze k8s deploy-cluster --rebuild-base-image
+ breeze k8s deploy-airflow
+ breeze k8s smoke-test-overlay kerberos --promote-status
+
Review Comment:
```suggestion
.. note::
   The ``--rebuild-base-image`` flag is only needed for the first run.
```
I forgot to include this note in the previous suggestion 😕.
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
+Usage
+-----
+
+Reference this overlay from your own kustomization and substitute the
+release name and namespace. A minimal example:
+
+.. code-block:: yaml
+
+ # my-overlay/kustomization.yaml
+ apiVersion: kustomize.config.k8s.io/v1beta1
+ kind: Kustomization
+ namespace: airflow
+
+ resources:
+ - github.com/apache/airflow/chart/kustomize-overlays/kerberos?ref=helm-chart/1.22.0
+
+Apply with:
+
+.. code-block:: bash
+
+ kubectl apply -k my-overlay/
+
+For a quick test, you can also just substitute the placeholders inline:
+
+.. code-block:: bash
+
+ kubectl kustomize chart/kustomize-overlays/kerberos | \
+ sed -e 's/RELEASE-NAME/airflow/g' -e 's/NAMESPACE/airflow/g' | \
+ kubectl apply -n airflow -f -
+
+This is exactly what ``breeze k8s smoke-test-overlay kerberos`` does
+during the local and CI smoke test.
+
+Wiring the keytab into the chart's sidecar
+------------------------------------------
+
+The chart's kerberos sidecar (``workers.kerberosSidecar``,
+``workers.celery.kerberosSidecar``) mounts a Secret named in
+``kerberos.keytab``. Point that at the Secret produced by this overlay:
+
+.. code-block:: yaml
+
+ # values.yaml fragment
+ kerberos:
+ enabled: true
+ ccacheMountPath: /var/kerberos-ccache
+ keytabPath: /etc/airflow.keytab
+ principal: airflow/[email protected]
+
+ workers:
+ kerberosSidecar:
+ enabled: true
+
+ extraSecrets:
+ airflow-kerberos-keytab: {} # exists from this overlay
+
+Migration guide from the chart
+------------------------------
+
+What the chart currently does
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When ``kerberos.enabled=true`` and ``kerberos.keytabBase64Content`` is
+provided, the chart renders a ``Secret`` carrying the user-supplied
+keytab and a ``ConfigMap`` with the user-supplied ``krb5.conf``. The
+user is expected to bring their own KDC.
+
+What this overlay provides
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* A working test KDC in the same namespace, so a developer can exercise
+ the chart's kerberos sidecar end-to-end without standing up an
+ external Kerberos service.
+* A bootstrap Job that materialises the keytab Secret automatically,
+ so no base64-encoded blob ends up in ``values.yaml`` or a developer's
+ shell history.
+
+How to switch
+^^^^^^^^^^^^^
+
+1. Install or upgrade the chart with ``kerberos.enabled`` set as you
+ want, but **without** ``kerberos.keytabBase64Content``.
+2. Apply this overlay against the same namespace.
+3. Wait for ``Job/<release>-keytab-bootstrap`` to complete.
+4. Confirm the Secret exists and reference it from the chart's sidecar
+ config as shown above.
+
+Status
+------
+
+This overlay is ``tested``: the ``verify:`` block in ``STATUS.yaml``
+is the smoke-test contract (KDC Deployment Ready, Service exists,
+bootstrap Job Complete, keytab Secret exists), and the
+``test_kerberos.py`` module under
+``chart/tests/overlay_tests/`` adds the
+behavioural assertion: a throwaway client pod ``kinit``\ ing against
+the in-cluster KDC and confirming the principal in ``klist`` output.
+``last-verified`` in ``STATUS.yaml`` records the most recent green
+local run; re-run the smoke test with ``--promote-status`` to refresh
+it whenever you re-verify against your cluster.
+
+To run the smoke test locally:
+
+.. code-block:: bash
+
+ breeze k8s deploy-cluster --rebuild-base-image
+ breeze k8s deploy-airflow
+ breeze k8s smoke-test-overlay kerberos --promote-status
+
+The ``--promote-status`` flag rewrites this overlay's ``STATUS.yaml``
+in place on a green run (chart-version from ``chart/Chart.yaml`` plus
+today's date). Without it the smoke test still runs and verifies, it
+just leaves ``STATUS.yaml`` untouched.
+
+The ``build-k8s-image`` + ``upload-k8s-image`` pair is required locally
+because the chart's default image lives on ghcr.io behind CI auth;
+without those steps ``deploy-airflow`` will fail with ImagePullBackOff
+(HTTP 403). CI itself runs ``breeze k8s run-complete-tests`` which
+chains all of the above.
+
+The smoke test runner takes care of this overlay's own images itself:
+``gcavalcante8808/krb5-server`` (KDC pod) and ``alpine/k8s`` (bootstrap
+Job) are auto-discovered from the rendered manifest and pre-loaded into
+every kind node before apply, so the test does not depend on a live
+Docker Hub pull at run time. See ``chart/kustomize-overlays/CONTRIBUTING.rst``
+"What ``smoke-test-overlay`` does for every overlay" for the generic
+machinery.
+
+See ``CONTRIBUTING.rst`` in this directory's parent for the lifecycle
+and the difference between the structural ``build_kustomize_overlays``
+prek hook (which runs on every commit) and this functional smoke test.
Review Comment:
```suggestion
```
This is a long read, and I think it repeats information about how image
discovery works. I don't think we need a copy of that information in every
overlay's README; I would instead leave a link to the CONTRIBUTING doc, which
already explains how it works.
##########
chart/kustomize-overlays/CONTRIBUTING.rst:
##########
@@ -95,6 +95,92 @@ For an overlay scheduled for removal:
status: deprecated
message: "Replaced by <overlay-name>. Will be removed in chart 3.0.0."
+The optional ``verify:`` block is the smoke-test contract and is also
+**the discovery key for CI**:
+
+.. code-block:: yaml
+
+ verify:
+ timeout_seconds: 300 # optional; default 300, max 3600
+ # `name` is the SUFFIX only - the runner auto-prepends
+ # `<release-name>-` so the same overlay works under any release.
+ # Write `foo`, not `RELEASE-NAME-foo`. The legacy `RELEASE-NAME-foo`
+ # form is still tolerated for older overlays but the short form
+ # is the new convention.
Review Comment:
```suggestion
# `name` is the SUFFIX only - the runner auto-prepends
# `<release-name>-` so the same overlay works under any release.
# Write `foo`, not `RELEASE-NAME-foo`.
```
I don't believe we ever introduced the convention mentioned here.
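For reference, the resolution rule under discussion can be sketched as below. The logic mirrors `_resolve_verify_resource_name` as it appears in this PR's diff; the constant name and the sample values are illustrative. If the legacy form is dropped as suggested, the prefix-stripping branch would simply go away.

```python
LEGACY_PREFIX = "RELEASE-NAME-"


def resolve_verify_resource_name(name: str, release_name: str) -> str:
    """Prepend ``<release_name>-`` to a verify-block suffix.

    A literal ``RELEASE-NAME-`` prefix is stripped first, so both the short
    and the long spelling resolve to the same resource name.
    """
    if name.startswith(LEGACY_PREFIX):
        name = name[len(LEGACY_PREFIX):]
    return f"{release_name}-{name}"


print(resolve_verify_resource_name("kerberos-kdc", "airflow"))
# -> airflow-kerberos-kdc
print(resolve_verify_resource_name("RELEASE-NAME-kerberos-kdc", "airflow"))
# -> airflow-kerberos-kdc
```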
##########
dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py:
##########
@@ -2450,3 +2463,642 @@ def deploy_cluster(
)
if return_code != 0:
sys.exit(return_code)
+
+
+# ---------------------------------------------------------------------------
+# `breeze k8s smoke-test-overlay` — functional smoke test for a single
+# kustomize overlay under chart/kustomize-overlays/.
+#
+# Counterpart to the structural `build_kustomize_overlays` prek hook:
+# the prek hook validates that an overlay builds and that its STATUS.yaml
+# parses, while this command applies the overlay to a running kind
+# cluster, waits for every resource declared in the STATUS.yaml `verify:`
+# block, and optionally runs a per-overlay pytest module for behavioural
+# checks. An overlay's STATUS may only advance to `tested` once this
+# command exits 0.
+# ---------------------------------------------------------------------------
+
+KUSTOMIZE_OVERLAYS_PATH = CHART_PATH / "kustomize-overlays"
+# Behavioural overlay tests live under chart/tests/overlay_tests/ (NOT
+# kubernetes-tests/) so they sit next to the overlay manifests and the
+# rest of chart-adjacent pytest content. They are NOT discovered by
+# `breeze testing helm-tests --test-type all` because `overlay_tests` is
+# in chart/pyproject.toml's norecursedirs — only this command (which
+# invokes pytest by explicit path) sees them.
+OVERLAY_TESTS_DIR = CHART_PATH / "tests" / "overlay_tests"
+
+
+def _substitute_overlay_placeholders(rendered: str, release_name: str, namespace: str) -> str:
+ return rendered.replace("RELEASE-NAME", release_name).replace("NAMESPACE", namespace)
+
+
+def _resolve_verify_resource_name(name: str, release_name: str) -> str:
+ """Auto-prepend ``<release_name>-`` to a verify-block resource name.
+
+ STATUS.yaml's ``verify:`` entries name resources by their **suffix
+ only** (e.g. ``kerberos-kdc``) - the runner prepends the release name
+ so the same overlay can be installed under any release. A legacy
+ ``RELEASE-NAME-foo`` form is still tolerated so older overlays keep
+ working after the schema change: the literal ``RELEASE-NAME-`` prefix
+ is stripped before the auto-prepend, leaving the suffix to be
+ re-prefixed with the actual release name.
+ """
+ if name.startswith("RELEASE-NAME-"):
+ name = name[len("RELEASE-NAME-") :]
+ return f"{release_name}-{name}"
+
+
+def _load_overlay_verify_block(overlay_dir: Path) -> dict[str, Any]:
+ status_path = overlay_dir / "STATUS.yaml"
+ if not status_path.exists():
+ console_print(f"[error]Overlay {overlay_dir.name} is missing STATUS.yaml")
+ sys.exit(1)
+ status_doc = yaml.safe_load(status_path.read_text()) or {}
+ verify = status_doc.get("verify")
+ if not verify:
+ console_print(
+ f"[error]Overlay {overlay_dir.name} has no `verify:` block in STATUS.yaml; "
+ "add one before running the smoke test (see chart/kustomize-overlays/CONTRIBUTING.rst)."
+ )
+ sys.exit(1)
+ return verify
+
+
+# Container `state.waiting.reason` values that indicate the pod is not going
+# to recover without operator intervention (image is missing, image name is
+# malformed, container is in a tight crash loop). Treated as immediate
+# verify-block failure rather than waited out to the configured timeout.
Review Comment:
```suggestion
# Container `state.waiting.reason` values that indicate the pod is not going
# to recover without human intervention. Treated as an immediate
# verify-block failure rather than waiting for the configured timeout.
```
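For context, here is a minimal sketch of how this set is consumed by the early-failure check, assuming the pipe-delimited jsonpath output that `_has_terminal_pod_failure` requests further down in this diff. `first_terminal_reason` is a hypothetical helper name used only for illustration; it is not part of the PR.

```python
from typing import Optional

# Copied from the diff under review.
TERMINAL_POD_WAITING_REASONS = frozenset(
    {
        "ImagePullBackOff",
        "ErrImagePull",
        "InvalidImageName",
        "ImageInspectError",
        "CrashLoopBackOff",
        "CreateContainerConfigError",
        "CreateContainerError",
    }
)


def first_terminal_reason(jsonpath_output: str) -> Optional[str]:
    """Return the first terminal waiting reason found, or None.

    The input is assumed to be one waiting reason per container, each
    followed by "|". Containers that are not waiting contribute an
    empty field, which never matches the set.
    """
    for reason in jsonpath_output.split("|"):
        if reason in TERMINAL_POD_WAITING_REASONS:
            return reason
    return None
```

A reason such as `ContainerCreating` is not in the set, so the caller keeps polling until the timeout instead of failing early.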
##########
dev/breeze/doc/05_test_commands.rst:
##########
@@ -650,6 +650,61 @@ output during test execution.
breeze k8s tests -- test_kubernetes_executor.py -s
+Smoke-testing a kustomize overlay
+.................................
+
+You can run ``breeze k8s smoke-test-overlay <name>`` to apply one of the
+overlays in ``chart/kustomize-overlays/`` to the current KinD cluster,
+wait for every resource declared in that overlay's ``STATUS.yaml``
+``verify:`` block, and run the optional per-overlay pytest module under
+``chart/tests/overlay_tests/``. It is the
+functional counterpart of the structural ``build_kustomize_overlays``
+prek hook; an overlay's ``STATUS`` may only advance to ``tested`` once
+this command exits 0.
Review Comment:
```suggestion
``chart/tests/overlay_tests/``. An overlay's ``STATUS`` may only advance to
``tested`` once this command exits 0.
```
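As a rough illustration of what "wait for every resource declared in the ``verify:`` block" amounts to, the sketch below maps one verify entry to the kubectl arguments polled for it, following the branch matrix and name resolution in this PR's `kubernetes_commands.py`. `resolve_name`, `check_args`, and the sample entries in the comments are hypothetical; the entry keys (`kind`, `name`, `ready`, `complete`) are the ones the PR code reads.

```python
ROLLOUT_KINDS = ("Deployment", "StatefulSet", "DaemonSet")
LEGACY_PREFIX = "RELEASE-NAME-"


def resolve_name(name: str, release_name: str) -> str:
    """Prefix a verify-block suffix with the release name.

    A legacy "RELEASE-NAME-" prefix is stripped first, mirroring the
    behaviour described in the PR's docstring.
    """
    if name.startswith(LEGACY_PREFIX):
        name = name[len(LEGACY_PREFIX):]
    return f"{release_name}-{name}"


def check_args(resource: dict, release_name: str, namespace: str) -> list:
    """Return the kubectl arguments (without the binary) for one entry."""
    kind = resource["kind"]
    name = resolve_name(resource["name"], release_name)
    tail = [f"{kind.lower()}/{name}", "-n", namespace, "--timeout=1s"]
    if resource.get("ready") and kind in ROLLOUT_KINDS:
        return ["rollout", "status", *tail]
    if resource.get("complete") and kind == "Job":
        return ["wait", "--for=condition=complete", *tail]
    # Neither flag set (or flag does not apply to this kind):
    # only wait for the object to exist.
    return ["wait", "--for=create", *tail]
```

Under these assumptions, an entry like `{"kind": "Deployment", "name": "kerberos-kdc", "ready": True}` with release name `airflow` is polled through `rollout status deployment/airflow-kerberos-kdc`.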
##########
dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py:
##########
@@ -2450,3 +2463,642 @@ def deploy_cluster(
)
if return_code != 0:
sys.exit(return_code)
+
+
+# ---------------------------------------------------------------------------
+# `breeze k8s smoke-test-overlay` — functional smoke test for a single
+# kustomize overlay under chart/kustomize-overlays/.
+#
+# Counterpart to the structural `build_kustomize_overlays` prek hook:
+# the prek hook validates that an overlay builds and that its STATUS.yaml
+# parses, while this command applies the overlay to a running kind
+# cluster, waits for every resource declared in the STATUS.yaml `verify:`
+# block, and optionally runs a per-overlay pytest module for behavioural
+# checks. An overlay's STATUS may only advance to `tested` once this
+# command exits 0.
+# ---------------------------------------------------------------------------
+
+KUSTOMIZE_OVERLAYS_PATH = CHART_PATH / "kustomize-overlays"
+# Behavioural overlay tests live under chart/tests/overlay_tests/ (NOT
+# kubernetes-tests/) so they sit next to the overlay manifests and the
+# rest of chart-adjacent pytest content. They are NOT discovered by
+# `breeze testing helm-tests --test-type all` because `overlay_tests` is
+# in chart/pyproject.toml's norecursedirs — only this command (which
+# invokes pytest by explicit path) sees them.
+OVERLAY_TESTS_DIR = CHART_PATH / "tests" / "overlay_tests"
+
+
+def _substitute_overlay_placeholders(rendered: str, release_name: str, namespace: str) -> str:
+ return rendered.replace("RELEASE-NAME", release_name).replace("NAMESPACE", namespace)
+
+
+def _resolve_verify_resource_name(name: str, release_name: str) -> str:
+ """Auto-prepend ``<release_name>-`` to a verify-block resource name.
+
+ STATUS.yaml's ``verify:`` entries name resources by their **suffix
+ only** (e.g. ``kerberos-kdc``) - the runner prepends the release name
+ so the same overlay can be installed under any release. A legacy
+ ``RELEASE-NAME-foo`` form is still tolerated so older overlays keep
+ working after the schema change: the literal ``RELEASE-NAME-`` prefix
+ is stripped before the auto-prepend, leaving the suffix to be
+ re-prefixed with the actual release name.
+ """
+ if name.startswith("RELEASE-NAME-"):
+ name = name[len("RELEASE-NAME-") :]
+ return f"{release_name}-{name}"
+
+
+def _load_overlay_verify_block(overlay_dir: Path) -> dict[str, Any]:
+ status_path = overlay_dir / "STATUS.yaml"
+ if not status_path.exists():
+ console_print(f"[error]Overlay {overlay_dir.name} is missing STATUS.yaml")
+ sys.exit(1)
+ status_doc = yaml.safe_load(status_path.read_text()) or {}
+ verify = status_doc.get("verify")
+ if not verify:
+ console_print(
+ f"[error]Overlay {overlay_dir.name} has no `verify:` block in STATUS.yaml; "
+ "add one before running the smoke test (see chart/kustomize-overlays/CONTRIBUTING.rst)."
+ )
+ sys.exit(1)
+ return verify
+
+
+# Container `state.waiting.reason` values that indicate the pod is not going
+# to recover without operator intervention (image is missing, image name is
+# malformed, container is in a tight crash loop). Treated as immediate
+# verify-block failure rather than waited out to the configured timeout.
+_TERMINAL_POD_WAITING_REASONS: frozenset[str] = frozenset(
+ {
+ "ImagePullBackOff",
+ "ErrImagePull",
+ "InvalidImageName",
+ "ImageInspectError",
+ "CrashLoopBackOff",
+ "CreateContainerConfigError",
+ "CreateContainerError",
+ }
+)
+
+
+def _has_terminal_pod_failure(
+ kind: str,
+ name: str,
+ namespace: str,
+ env: dict[str, str],
+ kubectl: str,
+) -> tuple[bool, str]:
+ """Return (failed, reason) for pods backing this verify resource.
+
+ Inspects waiting reasons on every regular and init container of every
+ pod selected by the resource's controller. The selector lookup mirrors
+ what the controller itself uses: ``spec.selector.matchLabels`` for
+ Deployment / StatefulSet / DaemonSet, the auto-applied
+ ``job-name=<name>`` label for Job. Resources without backing pods
+ (Service, Secret, ConfigMap, CRDs, …) are always treated as not-failed
+ here; their progress is observed via the success condition alone.
+ """
+ if kind not in ("Deployment", "StatefulSet", "DaemonSet", "Job"):
+ return False, ""
+ if kind == "Job":
+ selector = f"job-name={name}"
+ else:
+ sel = run_command(
+ [
+ kubectl,
+ "get",
+ kind.lower(),
+ name,
+ "-n",
+ namespace,
+ "-o",
+ "go-template={{range $k,$v := .spec.selector.matchLabels}}{{$k}}={{$v}},{{end}}",
+ ],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if sel.returncode != 0 or not sel.stdout.strip():
+ return False, ""
+ selector = sel.stdout.strip().rstrip(",")
+ pods = run_command(
+ [
+ kubectl,
+ "get",
+ "pods",
+ "-n",
+ namespace,
+ "-l",
+ selector,
+ "-o",
+ (
+ "jsonpath={range .items[*]}"
+ "{range .status.containerStatuses[*]}{.state.waiting.reason}|{end}"
+ "{range .status.initContainerStatuses[*]}{.state.waiting.reason}|{end}"
+ "{end}"
+ ),
+ ],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if pods.returncode != 0:
+ return False, ""
+ for reason in (r for r in pods.stdout.split("|") if r):
+ if reason in _TERMINAL_POD_WAITING_REASONS:
+ return True, reason
+ return False, ""
+
+
+def _wait_for_verify_resource(
+ resource: dict[str, Any],
+ namespace: str,
+ timeout_seconds: int,
+ env: dict[str, str],
+) -> int:
+ """Poll a verify-block resource to success or terminal failure.
+
+ Each iteration runs a one-second kubectl check for the success
+ condition and, in the same cycle, inspects backing pods (if any) for
+ terminal waiting reasons (ImagePullBackOff/CrashLoopBackOff/…). The
+ moment a terminal reason appears the loop aborts with a non-zero
+ return; otherwise it sleeps and retries up to the configured deadline.
+
+ Branch matrix:
+ * ``ready: true`` -> rollout status (Deployment/StatefulSet/DaemonSet)
+ * ``complete: true`` -> wait for condition=complete (Job)
+ * neither -> wait for the resource to be created (handles Secrets /
+ ConfigMaps / CRD children that an overlay's own Job or controller
+ materialises asynchronously; returns immediately for synchronously
+ applied resources)
+ """
+ kind = resource["kind"]
+ name = resource["name"]
+ kubectl = str(KUBECTL_BIN_PATH)
+ ns_args = ["-n", namespace]
+ target = f"{kind.lower()}/{name}"
+ if resource.get("ready") and kind in ("Deployment", "StatefulSet", "DaemonSet"):
+ check_cmd = [kubectl, "rollout", "status", target, *ns_args, "--timeout=1s"]
+ readiness = "ready"
+ elif resource.get("complete") and kind == "Job":
+ check_cmd = [kubectl, "wait", "--for=condition=complete", target, *ns_args, "--timeout=1s"]
+ readiness = "complete"
+ else:
+ check_cmd = [kubectl, "wait", "--for=create", target, *ns_args, "--timeout=1s"]
+ readiness = "created"
+ console_print(f"[info]verify: waiting for {kind}/{name} to be {readiness} (timeout={timeout_seconds}s)")
+ deadline = time.monotonic() + timeout_seconds
+ poll_interval = 5
+ last_err = ""
+ while time.monotonic() < deadline:
+ success = run_command(check_cmd, env=env, check=False, capture_output=True, text=True)
+ if success.returncode == 0:
+ console_print(f"[success]verify: {kind}/{name} is {readiness}")
+ return 0
+ last_err = (success.stderr or "").strip()
+ failed, reason = _has_terminal_pod_failure(kind, name, namespace, env, kubectl)
+ if failed:
+ console_print(
+ f"[error]verify failed early for {kind}/{name}: backing pod in {reason}; "
+ "not waiting out the full timeout."
+ )
+ run_command([kubectl, "describe", kind.lower(), name, *ns_args], env=env, check=False)
+ run_command([kubectl, "get", "pods", *ns_args, "-o", "wide"], env=env, check=False)
+ return 1
+ time.sleep(poll_interval)
+ console_print(
+ f"[error]verify timed out for {kind}/{name} after {timeout_seconds}s. Last error: {last_err}"
+ )
+ run_command([kubectl, "describe", kind.lower(), name, *ns_args], env=env, check=False)
+ return 1
+
+
+def _render_overlay(
+ overlay_dir: Path,
+ release_name: str,
+ namespace: str,
+ env: dict[str, str],
+) -> str | None:
+ kubectl = str(KUBECTL_BIN_PATH)
+ render = run_command(
+ [kubectl, "kustomize", str(overlay_dir)],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if render.returncode != 0:
+ console_print(f"[error]kubectl kustomize failed:\n{render.stderr}")
+ return None
+ return _substitute_overlay_placeholders(render.stdout, release_name, namespace)
+
+
+def _discover_overlay_images(manifest: str) -> list[str]:
+ """Extract every container image referenced by the rendered manifest.
+
+ Walks every loaded YAML doc and collects every ``image:`` string value,
+ regardless of nesting depth, so it picks up containers, initContainers,
+ sidecars under any pod-spec-bearing kind (Deployment, StatefulSet,
+ DaemonSet, Job, CronJob, Pod) without needing per-kind logic.
+ """
+ images: set[str] = set()
+
+ def _walk(node: Any) -> None:
+ if isinstance(node, dict):
+ for k, v in node.items():
+ if k == "image" and isinstance(v, str):
+ images.add(v)
+ else:
+ _walk(v)
+ elif isinstance(node, list):
+ for item in node:
+ _walk(item)
+
+ for doc in yaml.safe_load_all(manifest):
+ if doc:
+ _walk(doc)
+ return sorted(images)
+
+
+def _preload_overlay_images(
+ manifest: str,
+ python: str,
+ kubernetes_version: str,
+) -> int:
+ """Pre-pull every image the overlay references and ``kind load`` it.
+
+ Same pattern as ``_preload_test_images_to_kind`` but driven by what the
+ overlay actually declares, so it stays in sync as overlays evolve and
+ works for any overlay without a per-overlay images list. With
+ imagePullPolicy=IfNotPresent set on the overlay's pods (the convention),
+ kubelet never reaches out to a registry once these are loaded — so the
+ smoke test does not flake on Docker Hub rate limits or registry outages.
+ """
+ images = _discover_overlay_images(manifest)
+ if not images:
+ return 0
+ cluster_name = get_kind_cluster_name(python=python, kubernetes_version=kubernetes_version)
+ console_print(
+ f"[info]Preloading {len(images)} overlay image(s) into kind cluster {cluster_name}: {images}"
+ )
+ for image in images:
+ pull_rc = _docker_pull_with_429_retry(image, output=None)
+ if pull_rc != 0:
+ console_print(f"[error]docker pull {image} failed")
+ return pull_rc
+ kind_load = run_command_with_k8s_env(
+ ["kind", "load", "docker-image", "--name", cluster_name, image],
+ python=python,
+ kubernetes_version=kubernetes_version,
+ check=False,
+ )
+ if kind_load.returncode != 0:
+ console_print(f"[error]kind load docker-image {image} into {cluster_name} failed")
+ return kind_load.returncode
+ return 0
+
+
+def _apply_or_delete_overlay(
+ action: Literal["apply", "delete"],
+ manifest: str,
+ namespace: str,
+ env: dict[str, str],
+) -> int:
+ kubectl = str(KUBECTL_BIN_PATH)
+ extra: list[str] = ["--ignore-not-found=true"] if action == "delete" else []
+ result = run_command(
+ [kubectl, action, "-n", namespace, *extra, "-f", "-"],
+ env=env,
+ check=False,
+ input=manifest,
+ text=True,
+ )
+ return result.returncode
+
+
+class _SequenceIndentingDumper(yaml.SafeDumper):
+ """yaml.SafeDumper variant that indents sequence items under their key.
+
+ PyYAML's default safe_dump output emits ``resources:\\n- kind: …`` which
+ yamllint (with ``indent-sequences: true``, the repo's default) rejects
+ with "expected 4 but found 2". Overriding ``increase_indent`` to pass
+ ``indentless=False`` produces ``resources:\\n - kind: …`` instead, which
+ matches the rest of the YAML in the repo and keeps the yamllint hook
+ green on the auto-promoted STATUS.yaml.
+ """
+
+ def increase_indent(self, flow: bool = False, indentless: bool = False) -> None:
+ return super().increase_indent(flow, False)
+
+
+def _promote_overlay_status(overlay_dir: Path) -> int:
+ """Rewrite STATUS.yaml in-place to ``status: tested``.
+
+ Preserves everything above the YAML document separator ``---``
+ verbatim (license header + any explanatory comments). Re-emits the
+ document body with status fields refreshed and the existing
+ ``verify:`` block carried over.
+
+ Idempotent: if the overlay is already ``tested``, ``chart-version``
+ and ``last-verified`` are refreshed to current values. ``deprecated``
+ overlays are refused.
+ """
+ import datetime
+
+ status_path = overlay_dir / "STATUS.yaml"
+ original = status_path.read_text()
+ sep_idx = original.find("\n---")
+ if sep_idx >= 0:
+ header = original[:sep_idx] + "\n---\n"
+ body = original[sep_idx + len("\n---") :]
+ else:
+ header = ""
+ body = original
+ doc = yaml.safe_load(body) or {}
+ if doc.get("status") == "deprecated":
+ console_print(
+ f"[error]Refusing to promote {status_path.relative_to(AIRFLOW_ROOT_PATH)}: "
+ "status is `deprecated`. Remove the deprecation first."
+ )
+ return 1
+ chart_meta = yaml.safe_load((CHART_PATH / "Chart.yaml").read_text())
+ promoted: dict[str, Any] = {
+ "status": "tested",
+ "chart-version": str(chart_meta["version"]),
+ "last-verified": datetime.date.today().isoformat(),
+ }
+ verify = doc.get("verify")
+ if verify:
+ promoted["verify"] = verify
+ rendered = yaml.dump(
+ promoted,
+ Dumper=_SequenceIndentingDumper,
+ sort_keys=False,
+ default_flow_style=False,
+ )
+ status_path.write_text(header + rendered)
+ console_print(
+ f"[success]Promoted {status_path.relative_to(AIRFLOW_ROOT_PATH)}: "
+ f"status=tested chart-version={promoted['chart-version']} "
+ f"last-verified={promoted['last-verified']}"
+ )
+ console_print(f"[info]Review with `git diff {status_path.relative_to(AIRFLOW_ROOT_PATH)}` and commit.")
+ return 0
+
+
+def _run_overlay_pytest(
+ overlay_name: str,
+ namespace: str,
+ release_name: str,
+ python: str,
+ kubernetes_version: str,
+ executor: str,
+) -> int:
+ test_file = OVERLAY_TESTS_DIR / f"test_{overlay_name}.py"
+ if not test_file.exists():
+ console_print(
+ f"[info]No behavioural test module at
{test_file.relative_to(AIRFLOW_ROOT_PATH)} — "
+ "verify-block checks are the only assertions for this overlay."
+ )
+ return 0
+ env = get_k8s_env(python=python, kubernetes_version=kubernetes_version, executor=executor)
+ env["OVERLAY_UNDER_TEST"] = overlay_name
+ env["OVERLAY_NAMESPACE"] = namespace
+ env["OVERLAY_RELEASE_NAME"] = release_name
+ pytest_cmd = ["uv", "run", "pytest", str(test_file.relative_to(CHART_PATH)), "-xvs"]
+ console_print(f"[info]Running behavioural tests: {' '.join(pytest_cmd)}")
+ result = run_command(pytest_cmd, env=env, check=False, cwd=CHART_PATH.as_posix())
+ return result.returncode
+
+
+def _is_ci() -> bool:
+ """Detect GitHub Actions / generic CI via the conventional CI=true env var.
+
+ Matches the existing pattern used elsewhere in breeze (e.g.
+ ``sync_virtualenv`` in kubernetes_utils.py), so the behaviour is
+ consistent: anything keyed off "are we in CI" reads the same signal.
+ """
+ return os.environ.get("CI", "").lower() == "true"
+
+
+def _smoke_test_overlay_impl(
+ overlay_name: str,
+ overlay_dir: Path,
+ verify: dict[str, Any],
+ python: str,
+ kubernetes_version: str,
+ executor: str,
+ release_name: str,
+ namespace: str,
+ skip_cleanup: bool,
+ no_pytest: bool,
+) -> int:
+ """Run the smoke test and return a single exit code.
+
+ Kept return-based (not ``sys.exit``-based) so the click wrapper can
+ decide whether to swallow the failure (CI + not-tested) or surface
+ it as a non-zero process exit.
+ """
+ env = get_k8s_env(python=python, kubernetes_version=kubernetes_version, executor=executor)
+ console_print(f"\n[info]Rendering overlay {overlay_name}...")
+ manifest = _render_overlay(overlay_dir, release_name, namespace, env)
+ if manifest is None:
+ return 1
+ console_print("\n[info]Preloading overlay images into kind cluster...")
+ if _preload_overlay_images(manifest, python, kubernetes_version) != 0:
+ console_print("[error]Image preload failed.")
+ return 1
+ console_print(f"\n[info]Applying overlay {overlay_name} to namespace {namespace}...")
+ if _apply_or_delete_overlay("apply", manifest, namespace, env) != 0:
+ console_print("[error]Overlay apply failed.")
+ return 1
+ try:
+ console_print("\n[info]Walking STATUS.yaml verify block...")
+ timeout = int(verify.get("timeout_seconds", 300))
+ for resource in verify["resources"]:
+ substituted = {**resource, "name": _resolve_verify_resource_name(resource["name"], release_name)}
+ if _wait_for_verify_resource(substituted, namespace, timeout, env) != 0:
+ console_print("[error]verify block failed.")
+ return 1
+ console_print("\n[success]verify block passed.")
+ if not no_pytest:
+ rc = _run_overlay_pytest(
+ overlay_name=overlay_name,
+ namespace=namespace,
+ release_name=release_name,
+ python=python,
+ kubernetes_version=kubernetes_version,
+ executor=executor,
+ )
+ if rc != 0:
+ console_print("[error]Per-overlay pytest module failed.")
+ return rc
+ finally:
+ if skip_cleanup:
+ console_print("[warning]--skip-cleanup set; overlay left in place.")
+ else:
+ console_print(f"\n[info]Cleaning up overlay {overlay_name}...")
+ _apply_or_delete_overlay("delete", manifest, namespace, env)
+ return 0
+
+
+@kubernetes_group.command(
+ name="smoke-test-overlay",
+ help="Apply a kustomize overlay to the current KinD cluster, wait for its STATUS.yaml "
+ "`verify:` resources, and run the optional per-overlay pytest module.",
+)
+@click.argument("overlay_name", type=str)
+@option_python
+@option_kubernetes_version
+@option_executor
+@click.option(
+ "--release-name",
+ default="airflow",
+ show_default=True,
+ help="Substitute for the RELEASE-NAME placeholder in the overlay.",
+)
+@click.option(
+ "--namespace",
+ default="airflow",
+ show_default=True,
+ help="Namespace to apply into and substitute for the NAMESPACE placeholder.",
+)
+@click.option(
+ "--skip-cleanup",
+ is_flag=True,
+ help="Leave the overlay applied after the run (useful for debugging).",
+)
+@click.option(
+ "--no-pytest",
+ is_flag=True,
+ help="Skip the per-overlay pytest module even if it exists.",
+)
+@click.option(
+ "--promote-status",
+ is_flag=True,
+ help=(
+ "If the run is green (verify block + per-overlay pytest both pass), rewrite the "
+ "overlay's STATUS.yaml in place to `status: tested` with chart-version from "
+ "chart/Chart.yaml and today's date as last-verified. Opt-in because it modifies "
+ "a checked-in file; review with `git diff` and commit. Refused when CI=true."
+ ),
+)
+@option_verbose
+@option_dry_run
+def smoke_test_overlay(
+ overlay_name: str,
+ python: str,
+ kubernetes_version: str,
+ executor: str,
+ release_name: str,
+ namespace: str,
+ skip_cleanup: bool,
+ no_pytest: bool,
+ promote_status: bool,
+):
+ in_ci = _is_ci()
+ if promote_status and in_ci:
+ console_print(
+ "[error]Refusing --promote-status when CI=true is set. "
+ "STATUS.yaml is a checked-in file: it must be flipped locally by a developer "
+ "(the deliberate human claim 'I verified this against my cluster') and committed. "
+ "CI re-runs the smoke test to verify the existing STATUS, not to mutate it."
+ )
+ sys.exit(1)
+ make_sure_kubernetes_tools_are_installed()
+ overlay_dir = KUSTOMIZE_OVERLAYS_PATH / overlay_name
+ if not (overlay_dir / "kustomization.yaml").is_file():
+ console_print(
+ f"[error]No overlay at {overlay_dir.relative_to(AIRFLOW_ROOT_PATH)} "
+ f"(expected a kustomization.yaml there)."
+ )
+ sys.exit(1)
+ # Two-step pre-flight. The kubeconfig file persists across docker
+ # restarts even though the kind cluster is gone, so checking only
+ # the file leads to the next step (`kind load docker-image`) hanging
+ # silently while it tries to reach a cluster that no longer exists.
+ # `kind get clusters` returns the current list of live clusters and
+ # is the actual source of truth.
+ kubeconfig = get_kubeconfig_file(python=python, kubernetes_version=kubernetes_version)
+ cluster_name = get_kind_cluster_name(python=python, kubernetes_version=kubernetes_version)
+ bootstrap_hint = (
+ f"[error]Run first:\n"
+ f" breeze k8s setup-env\n"
+ f" breeze k8s create-cluster --python {python} --kubernetes-version {kubernetes_version}\n"
+ f" breeze k8s configure-cluster\n"
+ f" breeze k8s build-k8s-image --rebuild-base-image # first time only\n"
+ f" breeze k8s upload-k8s-image\n"
Review Comment:
```suggestion
f" breeze k8s deploy-cluster --python {python} --kubernetes-version {kubernetes_version} --rebuild-base-image\n"
f" # --rebuild-base-image flag only needed during first run"
```
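Since this hint is only shown when the pre-flight check fails, a minimal sketch of the liveness half of that check may be useful here. It assumes `kind get clusters` prints one cluster name per line on stdout; `parse_cluster_names` and `kind_cluster_is_live` are hypothetical names and are not part of the PR.

```python
import subprocess


def parse_cluster_names(kind_output: str) -> set:
    """Split `kind get clusters` stdout into a set of cluster names."""
    return {line.strip() for line in kind_output.splitlines() if line.strip()}


def kind_cluster_is_live(cluster_name: str) -> bool:
    """Return True only if kind itself reports the cluster.

    A kubeconfig file left on disk is not consulted, because it can
    outlive the cluster it points at.
    """
    result = subprocess.run(
        ["kind", "get", "clusters"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return False
    return cluster_name in parse_cluster_names(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the logic testable without a running Docker daemon.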
##########
dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py:
##########
@@ -2450,3 +2463,642 @@ def deploy_cluster(
)
if return_code != 0:
sys.exit(return_code)
+
+
+# ---------------------------------------------------------------------------
+# `breeze k8s smoke-test-overlay` — functional smoke test for a single
+# kustomize overlay under chart/kustomize-overlays/.
+#
+# Counterpart to the structural `build_kustomize_overlays` prek hook:
+# the prek hook validates that an overlay builds and that its STATUS.yaml
+# parses, while this command applies the overlay to a running kind
+# cluster, waits for every resource declared in the STATUS.yaml `verify:`
+# block, and optionally runs a per-overlay pytest module for behavioural
+# checks. An overlay's STATUS may only advance to `tested` once this
+# command exits 0.
+# ---------------------------------------------------------------------------
+
+KUSTOMIZE_OVERLAYS_PATH = CHART_PATH / "kustomize-overlays"
+# Behavioural overlay tests live under chart/tests/overlay_tests/ (NOT
+# kubernetes-tests/) so they sit next to the overlay manifests and the
+# rest of chart-adjacent pytest content. They are NOT discovered by
+# `breeze testing helm-tests --test-type all` because `overlay_tests` is
+# in chart/pyproject.toml's norecursedirs — only this command (which
+# invokes pytest by explicit path) sees them.
+OVERLAY_TESTS_DIR = CHART_PATH / "tests" / "overlay_tests"
+
+
+def _substitute_overlay_placeholders(rendered: str, release_name: str, namespace: str) -> str:
+ return rendered.replace("RELEASE-NAME", release_name).replace("NAMESPACE", namespace)
+
+
+def _resolve_verify_resource_name(name: str, release_name: str) -> str:
+ """Auto-prepend ``<release_name>-`` to a verify-block resource name.
+
+ STATUS.yaml's ``verify:`` entries name resources by their **suffix
+ only** (e.g. ``kerberos-kdc``) - the runner prepends the release name
+ so the same overlay can be installed under any release. A legacy
+ ``RELEASE-NAME-foo`` form is still tolerated so older overlays keep
+ working after the schema change: the literal ``RELEASE-NAME-`` prefix
+ is stripped before the auto-prepend, leaving the suffix to be
+ re-prefixed with the actual release name.
+ """
+ if name.startswith("RELEASE-NAME-"):
+ name = name[len("RELEASE-NAME-") :]
+ return f"{release_name}-{name}"
+
+
+def _load_overlay_verify_block(overlay_dir: Path) -> dict[str, Any]:
+ status_path = overlay_dir / "STATUS.yaml"
+ if not status_path.exists():
+ console_print(f"[error]Overlay {overlay_dir.name} is missing STATUS.yaml")
+ sys.exit(1)
+ status_doc = yaml.safe_load(status_path.read_text()) or {}
+ verify = status_doc.get("verify")
+ if not verify:
+ console_print(
+ f"[error]Overlay {overlay_dir.name} has no `verify:` block in STATUS.yaml; "
+ "add one before running the smoke test (see chart/kustomize-overlays/CONTRIBUTING.rst)."
+ )
+ sys.exit(1)
+ return verify
+
+
+# Container `state.waiting.reason` values that indicate the pod is not going
+# to recover without operator intervention (image is missing, image name is
+# malformed, container is in a tight crash loop). Treated as immediate
+# verify-block failure rather than waited out to the configured timeout.
+_TERMINAL_POD_WAITING_REASONS: frozenset[str] = frozenset(
+ {
+ "ImagePullBackOff",
+ "ErrImagePull",
+ "InvalidImageName",
+ "ImageInspectError",
+ "CrashLoopBackOff",
+ "CreateContainerConfigError",
+ "CreateContainerError",
+ }
+)
+
+
+def _has_terminal_pod_failure(
+ kind: str,
+ name: str,
+ namespace: str,
+ env: dict[str, str],
+ kubectl: str,
+) -> tuple[bool, str]:
+ """Return (failed, reason) for pods backing this verify resource.
+
+ Inspects waiting reasons on every regular and init container of every
+ pod selected by the resource's controller. The selector lookup mirrors
+ what the controller itself uses: ``spec.selector.matchLabels`` for
+ Deployment / StatefulSet / DaemonSet, the auto-applied
+ ``job-name=<name>`` label for Job. Resources without backing pods
+ (Service, Secret, ConfigMap, CRDs, …) are always treated as not-failed
+ here; their progress is observed via the success condition alone.
+ """
+ if kind not in ("Deployment", "StatefulSet", "DaemonSet", "Job"):
+ return False, ""
+ if kind == "Job":
+ selector = f"job-name={name}"
+ else:
+ sel = run_command(
+ [
+ kubectl,
+ "get",
+ kind.lower(),
+ name,
+ "-n",
+ namespace,
+ "-o",
+ "go-template={{range $k,$v := .spec.selector.matchLabels}}{{$k}}={{$v}},{{end}}",
+ ],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if sel.returncode != 0 or not sel.stdout.strip():
+ return False, ""
+ selector = sel.stdout.strip().rstrip(",")
+ pods = run_command(
+ [
+ kubectl,
+ "get",
+ "pods",
+ "-n",
+ namespace,
+ "-l",
+ selector,
+ "-o",
+ (
+ "jsonpath={range .items[*]}"
+ "{range .status.containerStatuses[*]}{.state.waiting.reason}|{end}"
+ "{range .status.initContainerStatuses[*]}{.state.waiting.reason}|{end}"
+ "{end}"
+ ),
+ ],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if pods.returncode != 0:
+ return False, ""
+ for reason in (r for r in pods.stdout.split("|") if r):
+ if reason in _TERMINAL_POD_WAITING_REASONS:
+ return True, reason
+ return False, ""
+
+
+def _wait_for_verify_resource(
+ resource: dict[str, Any],
+ namespace: str,
+ timeout_seconds: int,
+ env: dict[str, str],
+) -> int:
+ """Poll a verify-block resource to success or terminal failure.
+
+ Each iteration runs a one-second kubectl check for the success
+ condition and, in the same cycle, inspects backing pods (if any) for
+ terminal waiting reasons (ImagePullBackOff/CrashLoopBackOff/…). The
+ moment a terminal reason appears the loop aborts with a non-zero
+ return; otherwise it sleeps and retries up to the configured deadline.
+
+ Branch matrix:
+ * ``ready: true`` -> rollout status (Deployment/StatefulSet/DaemonSet)
+ * ``complete: true`` -> wait for condition=complete (Job)
+ * neither -> wait for the resource to be created (handles Secrets /
+ ConfigMaps / CRD children that an overlay's own Job or controller
+ materialises asynchronously; returns immediately for synchronously
+ applied resources)
+ """
+ kind = resource["kind"]
+ name = resource["name"]
+ kubectl = str(KUBECTL_BIN_PATH)
+ ns_args = ["-n", namespace]
+ target = f"{kind.lower()}/{name}"
+ if resource.get("ready") and kind in ("Deployment", "StatefulSet", "DaemonSet"):
+ check_cmd = [kubectl, "rollout", "status", target, *ns_args, "--timeout=1s"]
+ readiness = "ready"
+ elif resource.get("complete") and kind == "Job":
+ check_cmd = [kubectl, "wait", "--for=condition=complete", target, *ns_args, "--timeout=1s"]
+ readiness = "complete"
+ else:
+ check_cmd = [kubectl, "wait", "--for=create", target, *ns_args, "--timeout=1s"]
+ readiness = "created"
+ console_print(f"[info]verify: waiting for {kind}/{name} to be {readiness} (timeout={timeout_seconds}s)")
+ deadline = time.monotonic() + timeout_seconds
+ poll_interval = 5
+ last_err = ""
+ while time.monotonic() < deadline:
+ success = run_command(check_cmd, env=env, check=False, capture_output=True, text=True)
+ if success.returncode == 0:
+ console_print(f"[success]verify: {kind}/{name} is {readiness}")
+ return 0
+ last_err = (success.stderr or "").strip()
+ failed, reason = _has_terminal_pod_failure(kind, name, namespace, env, kubectl)
+ if failed:
+ console_print(
+ f"[error]verify failed early for {kind}/{name}: backing pod in {reason}; "
+ "not waiting out the full timeout."
+ )
+ run_command([kubectl, "describe", kind.lower(), name, *ns_args], env=env, check=False)
+ run_command([kubectl, "get", "pods", *ns_args, "-o", "wide"], env=env, check=False)
+ return 1
+ time.sleep(poll_interval)
+ console_print(
+ f"[error]verify timed out for {kind}/{name} after {timeout_seconds}s. Last error: {last_err}"
+ )
+ run_command([kubectl, "describe", kind.lower(), name, *ns_args], env=env, check=False)
+ return 1
+
+
+def _render_overlay(
+ overlay_dir: Path,
+ release_name: str,
+ namespace: str,
+ env: dict[str, str],
+) -> str | None:
+ kubectl = str(KUBECTL_BIN_PATH)
+ render = run_command(
+ [kubectl, "kustomize", str(overlay_dir)],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if render.returncode != 0:
+ console_print(f"[error]kubectl kustomize failed:\n{render.stderr}")
+ return None
+ return _substitute_overlay_placeholders(render.stdout, release_name, namespace)
+
+
+def _discover_overlay_images(manifest: str) -> list[str]:
+ """Extract every container image referenced by the rendered manifest.
+
+ Walks every loaded YAML doc and collects every ``image:`` string value,
+ regardless of nesting depth, so it picks up containers, initContainers,
+ sidecars under any pod-spec-bearing kind (Deployment, StatefulSet,
+ DaemonSet, Job, CronJob, Pod) without needing per-kind logic.
+ """
+ images: set[str] = set()
+
+ def _walk(node: Any) -> None:
+ if isinstance(node, dict):
+ for k, v in node.items():
+ if k == "image" and isinstance(v, str):
+ images.add(v)
+ else:
+ _walk(v)
+ elif isinstance(node, list):
+ for item in node:
+ _walk(item)
+
+ for doc in yaml.safe_load_all(manifest):
+ if doc:
+ _walk(doc)
+ return sorted(images)
+
+
+def _preload_overlay_images(
+ manifest: str,
+ python: str,
+ kubernetes_version: str,
+) -> int:
+ """Pre-pull every image the overlay references and ``kind load`` it.
+
+ Same pattern as ``_preload_test_images_to_kind`` but driven by what the
+ overlay actually declares, so it stays in sync as overlays evolve and
+ works for any overlay without a per-overlay images list. With
+ imagePullPolicy=IfNotPresent set on the overlay's pods (the convention),
+ kubelet never reaches out to a registry once these are loaded — so the
+ smoke test does not flake on Docker Hub rate limits or registry outages.
+ """
+ images = _discover_overlay_images(manifest)
+ if not images:
+ return 0
+ cluster_name = get_kind_cluster_name(python=python, kubernetes_version=kubernetes_version)
+ console_print(
+ f"[info]Preloading {len(images)} overlay image(s) into kind cluster {cluster_name}: {images}"
+ )
+ for image in images:
+ pull_rc = _docker_pull_with_429_retry(image, output=None)
+ if pull_rc != 0:
+ console_print(f"[error]docker pull {image} failed")
+ return pull_rc
+ kind_load = run_command_with_k8s_env(
+ ["kind", "load", "docker-image", "--name", cluster_name, image],
+ python=python,
+ kubernetes_version=kubernetes_version,
+ check=False,
+ )
+ if kind_load.returncode != 0:
+ console_print(f"[error]kind load docker-image {image} into {cluster_name} failed")
+ return kind_load.returncode
+ return 0
+
+
+def _apply_or_delete_overlay(
+ action: Literal["apply", "delete"],
+ manifest: str,
+ namespace: str,
+ env: dict[str, str],
+) -> int:
+ kubectl = str(KUBECTL_BIN_PATH)
+ extra: list[str] = ["--ignore-not-found=true"] if action == "delete" else []
+ result = run_command(
+ [kubectl, action, "-n", namespace, *extra, "-f", "-"],
+ env=env,
+ check=False,
+ input=manifest,
+ text=True,
+ )
+ return result.returncode
+
+
+class _SequenceIndentingDumper(yaml.SafeDumper):
+ """yaml.SafeDumper variant that indents sequence items under their key.
+
+ PyYAML's default safe_dump output emits ``resources:\\n- kind: …`` which
+ yamllint (with ``indent-sequences: true``, the repo's default) rejects
+ with "expected 4 but found 2". Overriding ``increase_indent`` to pass
+ ``indentless=False`` produces ``resources:\\n - kind: …`` instead, which
+ matches the rest of the YAML in the repo and keeps the yamllint hook
+ green on the auto-promoted STATUS.yaml.
+ """
+
+ def increase_indent(self, flow: bool = False, indentless: bool = False) -> None:
+ return super().increase_indent(flow, False)
+
+
+def _promote_overlay_status(overlay_dir: Path) -> int:
+ """Rewrite STATUS.yaml in-place to ``status: tested``.
+
+ Preserves everything above the YAML document separator ``---``
+ verbatim (license header + any explanatory comments). Re-emits the
+ document body with status fields refreshed and the existing
+ ``verify:`` block carried over.
+
+ Idempotent: if the overlay is already ``tested``, ``chart-version``
+ and ``last-verified`` are refreshed to current values. ``deprecated``
+ overlays are refused.
+ """
+ import datetime
+
+ status_path = overlay_dir / "STATUS.yaml"
+ original = status_path.read_text()
+ sep_idx = original.find("\n---")
+ if sep_idx >= 0:
+ header = original[:sep_idx] + "\n---\n"
+ body = original[sep_idx + len("\n---") :]
+ else:
+ header = ""
+ body = original
+ doc = yaml.safe_load(body) or {}
+ if doc.get("status") == "deprecated":
+ console_print(
+ f"[error]Refusing to promote {status_path.relative_to(AIRFLOW_ROOT_PATH)}: "
+ "status is `deprecated`. Remove the deprecation first."
+ )
+ return 1
+ chart_meta = yaml.safe_load((CHART_PATH / "Chart.yaml").read_text())
+ promoted: dict[str, Any] = {
+ "status": "tested",
+ "chart-version": str(chart_meta["version"]),
+ "last-verified": datetime.date.today().isoformat(),
+ }
+ verify = doc.get("verify")
+ if verify:
+ promoted["verify"] = verify
+ rendered = yaml.dump(
+ promoted,
+ Dumper=_SequenceIndentingDumper,
+ sort_keys=False,
+ default_flow_style=False,
+ )
+ status_path.write_text(header + rendered)
+ console_print(
+ f"[success]Promoted {status_path.relative_to(AIRFLOW_ROOT_PATH)}: "
+ f"status=tested chart-version={promoted['chart-version']} "
+ f"last-verified={promoted['last-verified']}"
+ )
+ console_print(f"[info]Review with `git diff {status_path.relative_to(AIRFLOW_ROOT_PATH)}` and commit.")
+ return 0
+
+
+def _run_overlay_pytest(
+ overlay_name: str,
+ namespace: str,
+ release_name: str,
+ python: str,
+ kubernetes_version: str,
+ executor: str,
+) -> int:
+ test_file = OVERLAY_TESTS_DIR / f"test_{overlay_name}.py"
+ if not test_file.exists():
+ console_print(
+ f"[info]No behavioural test module at
{test_file.relative_to(AIRFLOW_ROOT_PATH)} — "
+ "verify-block checks are the only assertions for this overlay."
+ )
+ return 0
+ env = get_k8s_env(python=python, kubernetes_version=kubernetes_version, executor=executor)
+ env["OVERLAY_UNDER_TEST"] = overlay_name
+ env["OVERLAY_NAMESPACE"] = namespace
+ env["OVERLAY_RELEASE_NAME"] = release_name
+ pytest_cmd = ["uv", "run", "pytest", str(test_file.relative_to(CHART_PATH)), "-xvs"]
+ console_print(f"[info]Running behavioural tests: {' '.join(pytest_cmd)}")
+ result = run_command(pytest_cmd, env=env, check=False, cwd=CHART_PATH.as_posix())
+ return result.returncode
+
+
+def _is_ci() -> bool:
+ """Detect GitHub Actions / generic CI via the conventional CI=true env var.
+
+ Matches the existing pattern used elsewhere in breeze (e.g.
+ ``sync_virtualenv`` in kubernetes_utils.py), so the behaviour is
+ consistent: anything keyed off "are we in CI" reads the same signal.
+ """
Review Comment:
```suggestion
"""Detect GitHub Actions / generic CI."""
```
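For context, the shorter docstring still describes the whole behaviour. Below is a minimal sketch of such a check. The body of `_is_ci` is not part of this hunk, so the name `is_ci`, the injectable `environ` parameter, and the case-insensitive comparison are illustrative assumptions rather than the PR's actual code:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def is_ci(environ: Mapping[str, str] | None = None) -> bool:
    """Detect GitHub Actions / generic CI."""
    # Hypothetical helper: falls back to the real process environment
    # when no mapping is injected (the injection only eases testing).
    env = os.environ if environ is None else environ
    # Assumption: the conventional signal is CI=true, compared case-insensitively.
    return env.get("CI", "").lower() == "true"
```

With a one-line body like this, the function name and the short docstring carry all the information a reader needs.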
##########
dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py:
##########
@@ -2450,3 +2463,642 @@ def deploy_cluster(
)
if return_code != 0:
sys.exit(return_code)
+
+
+# ---------------------------------------------------------------------------
+# `breeze k8s smoke-test-overlay` — functional smoke test for a single
+# kustomize overlay under chart/kustomize-overlays/.
+#
+# Counterpart to the structural `build_kustomize_overlays` prek hook:
+# the prek hook validates that an overlay builds and that its STATUS.yaml
+# parses, while this command applies the overlay to a running kind
+# cluster, waits for every resource declared in the STATUS.yaml `verify:`
+# block, and optionally runs a per-overlay pytest module for behavioural
+# checks. An overlay's STATUS may only advance to `tested` once this
+# command exits 0.
+# ---------------------------------------------------------------------------
+
+KUSTOMIZE_OVERLAYS_PATH = CHART_PATH / "kustomize-overlays"
+# Behavioural overlay tests live under chart/tests/overlay_tests/ (NOT
+# kubernetes-tests/) so they sit next to the overlay manifests and the
+# rest of chart-adjacent pytest content. They are NOT discovered by
+# `breeze testing helm-tests --test-type all` because `overlay_tests` is
+# in chart/pyproject.toml's norecursedirs — only this command (which
+# invokes pytest by explicit path) sees them.
+OVERLAY_TESTS_DIR = CHART_PATH / "tests" / "overlay_tests"
+
+
+def _substitute_overlay_placeholders(rendered: str, release_name: str, namespace: str) -> str:
+ return rendered.replace("RELEASE-NAME", release_name).replace("NAMESPACE", namespace)
+
+
+def _resolve_verify_resource_name(name: str, release_name: str) -> str:
+ """Auto-prepend ``<release_name>-`` to a verify-block resource name.
+
+ STATUS.yaml's ``verify:`` entries name resources by their **suffix
+ only** (e.g. ``kerberos-kdc``) - the runner prepends the release name
+ so the same overlay can be installed under any release. A legacy
+ ``RELEASE-NAME-foo`` form is still tolerated so older overlays keep
+ working after the schema change: the literal ``RELEASE-NAME-`` prefix
+ is stripped before the auto-prepend, leaving the suffix to be
+ re-prefixed with the actual release name.
+ """
+ if name.startswith("RELEASE-NAME-"):
+ name = name[len("RELEASE-NAME-") :]
+ return f"{release_name}-{name}"
+
+
+def _load_overlay_verify_block(overlay_dir: Path) -> dict[str, Any]:
+ status_path = overlay_dir / "STATUS.yaml"
+ if not status_path.exists():
+ console_print(f"[error]Overlay {overlay_dir.name} is missing STATUS.yaml")
+ sys.exit(1)
+ status_doc = yaml.safe_load(status_path.read_text()) or {}
+ verify = status_doc.get("verify")
+ if not verify:
+ console_print(
+ f"[error]Overlay {overlay_dir.name} has no `verify:` block in STATUS.yaml; "
+ "add one before running the smoke test (see chart/kustomize-overlays/CONTRIBUTING.rst)."
+ )
+ sys.exit(1)
+ return verify
+
+
+# Container `state.waiting.reason` values that indicate the pod is not going
+# to recover without operator intervention (image is missing, image name is
+# malformed, container is in a tight crash loop). Treated as immediate
+# verify-block failure rather than waited out to the configured timeout.
+_TERMINAL_POD_WAITING_REASONS: frozenset[str] = frozenset(
+ {
+ "ImagePullBackOff",
+ "ErrImagePull",
+ "InvalidImageName",
+ "ImageInspectError",
+ "CrashLoopBackOff",
+ "CreateContainerConfigError",
+ "CreateContainerError",
+ }
+)
+
+
+def _has_terminal_pod_failure(
+ kind: str,
+ name: str,
+ namespace: str,
+ env: dict[str, str],
+ kubectl: str,
+) -> tuple[bool, str]:
+ """Return (failed, reason) for pods backing this verify resource.
+
+ Inspects waiting reasons on every regular and init container of every
+ pod selected by the resource's controller. The selector lookup mirrors
+ what the controller itself uses: ``spec.selector.matchLabels`` for
+ Deployment / StatefulSet / DaemonSet, the auto-applied
+ ``job-name=<name>`` label for Job. Resources without backing pods
+ (Service, Secret, ConfigMap, CRDs, …) are always treated as not-failed
+ here; their progress is observed via the success condition alone.
+ """
+ if kind not in ("Deployment", "StatefulSet", "DaemonSet", "Job"):
+ return False, ""
+ if kind == "Job":
+ selector = f"job-name={name}"
+ else:
+ sel = run_command(
+ [
+ kubectl,
+ "get",
+ kind.lower(),
+ name,
+ "-n",
+ namespace,
+ "-o",
+ "go-template={{range $k,$v := .spec.selector.matchLabels}}{{$k}}={{$v}},{{end}}",
+ ],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if sel.returncode != 0 or not sel.stdout.strip():
+ return False, ""
+ selector = sel.stdout.strip().rstrip(",")
+ pods = run_command(
+ [
+ kubectl,
+ "get",
+ "pods",
+ "-n",
+ namespace,
+ "-l",
+ selector,
+ "-o",
+ (
+ "jsonpath={range .items[*]}"
+ "{range .status.containerStatuses[*]}{.state.waiting.reason}|{end}"
+ "{range .status.initContainerStatuses[*]}{.state.waiting.reason}|{end}"
+ "{end}"
+ ),
+ ],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if pods.returncode != 0:
+ return False, ""
+ for reason in (r for r in pods.stdout.split("|") if r):
+ if reason in _TERMINAL_POD_WAITING_REASONS:
+ return True, reason
+ return False, ""
+
+
+def _wait_for_verify_resource(
+ resource: dict[str, Any],
+ namespace: str,
+ timeout_seconds: int,
+ env: dict[str, str],
+) -> int:
+ """Poll a verify-block resource to success or terminal failure.
+
+ Each iteration runs a one-second kubectl check for the success
+ condition and, in the same cycle, inspects backing pods (if any) for
+ terminal waiting reasons (ImagePullBackOff/CrashLoopBackOff/…). The
+ moment a terminal reason appears the loop aborts with a non-zero
+ return; otherwise it sleeps and retries up to the configured deadline.
+
+ Branch matrix:
+ * ``ready: true`` -> rollout status (Deployment/StatefulSet/DaemonSet)
+ * ``complete: true`` -> wait for condition=complete (Job)
+ * neither -> wait for the resource to be created (handles Secrets /
+ ConfigMaps / CRD children that an overlay's own Job or controller
+ materialises asynchronously; returns immediately for synchronously
+ applied resources)
+ """
+ kind = resource["kind"]
+ name = resource["name"]
+ kubectl = str(KUBECTL_BIN_PATH)
+ ns_args = ["-n", namespace]
+ target = f"{kind.lower()}/{name}"
+ if resource.get("ready") and kind in ("Deployment", "StatefulSet", "DaemonSet"):
+ check_cmd = [kubectl, "rollout", "status", target, *ns_args, "--timeout=1s"]
+ readiness = "ready"
+ elif resource.get("complete") and kind == "Job":
+ check_cmd = [kubectl, "wait", "--for=condition=complete", target, *ns_args, "--timeout=1s"]
+ readiness = "complete"
+ else:
+ check_cmd = [kubectl, "wait", "--for=create", target, *ns_args, "--timeout=1s"]
+ readiness = "created"
+ console_print(f"[info]verify: waiting for {kind}/{name} to be {readiness} (timeout={timeout_seconds}s)")
+ deadline = time.monotonic() + timeout_seconds
+ poll_interval = 5
+ last_err = ""
+ while time.monotonic() < deadline:
+ success = run_command(check_cmd, env=env, check=False, capture_output=True, text=True)
+ if success.returncode == 0:
+ console_print(f"[success]verify: {kind}/{name} is {readiness}")
+ return 0
+ last_err = (success.stderr or "").strip()
+ failed, reason = _has_terminal_pod_failure(kind, name, namespace, env, kubectl)
+ if failed:
+ console_print(
+ f"[error]verify failed early for {kind}/{name}: backing pod in {reason}; "
+ "not waiting out the full timeout."
+ )
+ run_command([kubectl, "describe", kind.lower(), name, *ns_args], env=env, check=False)
+ run_command([kubectl, "get", "pods", *ns_args, "-o", "wide"], env=env, check=False)
+ return 1
+ time.sleep(poll_interval)
+ console_print(
+ f"[error]verify timed out for {kind}/{name} after {timeout_seconds}s. Last error: {last_err}"
+ )
+ run_command([kubectl, "describe", kind.lower(), name, *ns_args], env=env, check=False)
+ return 1
+
+
+def _render_overlay(
+ overlay_dir: Path,
+ release_name: str,
+ namespace: str,
+ env: dict[str, str],
+) -> str | None:
+ kubectl = str(KUBECTL_BIN_PATH)
+ render = run_command(
+ [kubectl, "kustomize", str(overlay_dir)],
+ env=env,
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ if render.returncode != 0:
+ console_print(f"[error]kubectl kustomize failed:\n{render.stderr}")
+ return None
+ return _substitute_overlay_placeholders(render.stdout, release_name, namespace)
+
+
+def _discover_overlay_images(manifest: str) -> list[str]:
+ """Extract every container image referenced by the rendered manifest.
+
+ Walks every loaded YAML doc and collects every ``image:`` string value,
+ regardless of nesting depth, so it picks up containers, initContainers,
+ sidecars under any pod-spec-bearing kind (Deployment, StatefulSet,
+ DaemonSet, Job, CronJob, Pod) without needing per-kind logic.
+ """
+ images: set[str] = set()
+
+ def _walk(node: Any) -> None:
+ if isinstance(node, dict):
+ for k, v in node.items():
+ if k == "image" and isinstance(v, str):
+ images.add(v)
+ else:
+ _walk(v)
+ elif isinstance(node, list):
+ for item in node:
+ _walk(item)
+
+ for doc in yaml.safe_load_all(manifest):
+ if doc:
+ _walk(doc)
+ return sorted(images)
+
+
+def _preload_overlay_images(
+ manifest: str,
+ python: str,
+ kubernetes_version: str,
+) -> int:
+ """Pre-pull every image the overlay references and ``kind load`` it.
+
+ Same pattern as ``_preload_test_images_to_kind`` but driven by what the
+ overlay actually declares, so it stays in sync as overlays evolve and
+ works for any overlay without a per-overlay images list. With
+ imagePullPolicy=IfNotPresent set on the overlay's pods (the convention),
+ kubelet never reaches out to a registry once these are loaded — so the
+ smoke test does not flake on Docker Hub rate limits or registry outages.
+ """
+ images = _discover_overlay_images(manifest)
+ if not images:
+ return 0
+ cluster_name = get_kind_cluster_name(python=python, kubernetes_version=kubernetes_version)
+ console_print(
+ f"[info]Preloading {len(images)} overlay image(s) into kind cluster {cluster_name}: {images}"
+ )
+ for image in images:
+ pull_rc = _docker_pull_with_429_retry(image, output=None)
+ if pull_rc != 0:
+ console_print(f"[error]docker pull {image} failed")
+ return pull_rc
+ kind_load = run_command_with_k8s_env(
+ ["kind", "load", "docker-image", "--name", cluster_name, image],
+ python=python,
+ kubernetes_version=kubernetes_version,
+ check=False,
+ )
+ if kind_load.returncode != 0:
+ console_print(f"[error]kind load docker-image {image} into {cluster_name} failed")
+ return kind_load.returncode
+ return 0
+
+
+def _apply_or_delete_overlay(
+ action: Literal["apply", "delete"],
+ manifest: str,
+ namespace: str,
+ env: dict[str, str],
+) -> int:
+ kubectl = str(KUBECTL_BIN_PATH)
+ extra: list[str] = ["--ignore-not-found=true"] if action == "delete" else []
+ result = run_command(
+ [kubectl, action, "-n", namespace, *extra, "-f", "-"],
+ env=env,
+ check=False,
+ input=manifest,
+ text=True,
+ )
+ return result.returncode
+
+
+class _SequenceIndentingDumper(yaml.SafeDumper):
+ """yaml.SafeDumper variant that indents sequence items under their key.
+
+ PyYAML's default safe_dump output emits ``resources:\\n- kind: …`` which
+ yamllint (with ``indent-sequences: true``, the repo's default) rejects
+ with "expected 4 but found 2". Overriding ``increase_indent`` to pass
+ ``indentless=False`` produces ``resources:\\n - kind: …`` instead, which
+ matches the rest of the YAML in the repo and keeps the yamllint hook
+ green on the auto-promoted STATUS.yaml.
+ """
+
+ def increase_indent(self, flow: bool = False, indentless: bool = False) -> None:
+ return super().increase_indent(flow, False)
+
+
+def _promote_overlay_status(overlay_dir: Path) -> int:
+ """Rewrite STATUS.yaml in-place to ``status: tested``.
+
+ Preserves everything above the YAML document separator ``---``
+ verbatim (license header + any explanatory comments). Re-emits the
+ document body with status fields refreshed and the existing
+ ``verify:`` block carried over.
Review Comment:
```suggestion
Preserves everything above the YAML document separator ``---``.
Re-emits the document body with status fields refreshed and the
existing ``verify:`` block carried over.
```
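For reference, the header-preserving split that this docstring describes can be sketched on its own. This is a simplified illustration that mirrors the `find("\n---")` logic quoted above; `split_header` is a hypothetical name, not a function in the PR:

```python
from __future__ import annotations


def split_header(original: str) -> tuple[str, str]:
    """Split text at the first ``---`` separator that follows a newline.

    Returns ``(header, body)``; the header is empty when no separator is found.
    """
    marker = "\n---"
    sep_idx = original.find(marker)
    if sep_idx < 0:
        # No separator: nothing to preserve, the whole text is the body.
        return "", original
    # Keep everything up to and including the separator line verbatim.
    header = original[:sep_idx] + marker + "\n"
    body = original[sep_idx + len(marker):]
    return header, body


text = "# license header\n# explanatory comment\n---\nstatus: not-tested\n"
header, body = split_header(text)
print(repr(header))
print(repr(body))
```

One thing this sketch makes visible: a file whose very first line is `---` has no preceding newline, so it takes the no-separator branch and the whole text is treated as the body.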
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
+Usage
+-----
+
+Reference this overlay from your own kustomization and substitute the
+release name and namespace. A minimal example:
+
+.. code-block:: yaml
+
+ # my-overlay/kustomization.yaml
+ apiVersion: kustomize.config.k8s.io/v1beta1
+ kind: Kustomization
+ namespace: airflow
+
+ resources:
+ - github.com/apache/airflow/chart/kustomize-overlays/kerberos?ref=helm-chart/1.22.0
+
+Apply with:
+
+.. code-block:: bash
+
+ kubectl apply -k my-overlay/
+
+For a quick test, you can also just substitute the placeholders inline:
+
+.. code-block:: bash
+
+ kubectl kustomize chart/kustomize-overlays/kerberos | \
+ sed -e 's/RELEASE-NAME/airflow/g' -e 's/NAMESPACE/airflow/g' | \
+ kubectl apply -n airflow -f -
+
+This is exactly what ``breeze k8s smoke-test-overlay kerberos`` does
+during the local and CI smoke test.
+
+Wiring the keytab into the chart's sidecar
+------------------------------------------
+
+The chart's kerberos sidecar (``workers.kerberosSidecar``,
+``workers.celery.kerberosSidecar``) mounts a Secret named in
+``kerberos.keytab``. Point that at the Secret produced by this overlay:
+
+.. code-block:: yaml
+
+ # values.yaml fragment
+ kerberos:
+ enabled: true
+ ccacheMountPath: /var/kerberos-ccache
+ keytabPath: /etc/airflow.keytab
+ principal: airflow/airflow.airflow.svc.cluster.local@EXAMPLE.COM
+
+ workers:
+ kerberosSidecar:
+ enabled: true
+
+ extraSecrets:
+ airflow-kerberos-keytab: {} # exists from this overlay
+
+Migration guide from the chart
+------------------------------
+
+What the chart currently does
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When ``kerberos.enabled=true`` and ``kerberos.keytabBase64Content`` is
+provided, the chart renders a ``Secret`` carrying the user-supplied
+keytab and a ``ConfigMap`` with the user-supplied ``krb5.conf``. The
+user is expected to bring their own KDC.
+
+What this overlay provides
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* A working test KDC in the same namespace, so a developer can exercise
+ the chart's kerberos sidecar end-to-end without standing up an
+ external Kerberos service.
+* A bootstrap Job that materialises the keytab Secret automatically,
+ so no base64-encoded blob ends up in ``values.yaml`` or a developer's
+ shell history.
+
+How to switch
+^^^^^^^^^^^^^
+
+1. Install or upgrade the chart with ``kerberos.enabled`` set as you
+ want, but **without** ``kerberos.keytabBase64Content``.
+2. Apply this overlay against the same namespace.
+3. Wait for ``Job/<release>-keytab-bootstrap`` to complete.
+4. Confirm the Secret exists and reference it from the chart's sidecar
+ config as shown above.
+
+Status
+------
+
+This overlay is ``tested``: the ``verify:`` block in ``STATUS.yaml``
+is the smoke-test contract (KDC Deployment Ready, Service exists,
+bootstrap Job Complete, keytab Secret exists), and the
+``test_kerberos.py`` module under
+``chart/tests/overlay_tests/`` adds the
+behavioural assertion: a throwaway client pod ``kinit``\ ing against
+the in-cluster KDC and confirming the principal in ``klist`` output.
+``last-verified`` in ``STATUS.yaml`` records the most recent green
+local run; re-run the smoke test with ``--promote-status`` to refresh
+it whenever you re-verify against your cluster.
+
+To run the smoke test locally:
+
+.. code-block:: bash
+
+ breeze k8s deploy-cluster --rebuild-base-image
+ breeze k8s deploy-airflow
+ breeze k8s smoke-test-overlay kerberos --promote-status
+
+The ``--promote-status`` flag rewrites this overlay's ``STATUS.yaml``
+in place on a green run (chart-version from ``chart/Chart.yaml`` plus
+today's date). Without it the smoke test still runs and verifies, it
+just leaves ``STATUS.yaml`` untouched.
+
+The ``build-k8s-image`` + ``upload-k8s-image`` pair is required locally
+because the chart's default image lives on ghcr.io behind CI auth;
+without those steps ``deploy-airflow`` will fail with ImagePullBackOff
+(HTTP 403). CI itself runs ``breeze k8s run-complete-tests`` which
+chains all of the above.
+
Review Comment:
```suggestion
```
This paragraph can be removed, since the steps above use the `deploy-cluster` command.
##########
dev/breeze/doc/05_test_commands.rst:
##########
@@ -650,6 +650,61 @@ output during test execution.
breeze k8s tests -- test_kubernetes_executor.py -s
+Smoke-testing a kustomize overlay
+.................................
+
+You can run ``breeze k8s smoke-test-overlay <name>`` to apply one of the
+overlays in ``chart/kustomize-overlays/`` to the current KinD cluster,
+wait for every resource declared in that overlay's ``STATUS.yaml``
+``verify:`` block, and run the optional per-overlay pytest module under
+``chart/tests/overlay_tests/``. It is the
+functional counterpart of the structural ``build_kustomize_overlays``
+prek hook; an overlay's ``STATUS`` may only advance to ``tested`` once
+this command exits 0.
+
+The runner is overlay-agnostic. For every overlay it:
+
+* renders the overlay and substitutes ``RELEASE-NAME`` / ``NAMESPACE``,
+* **auto-preloads every ``image:`` referenced by the rendered manifest**
+ into the kind nodes via ``docker pull`` (with retry on Docker Hub
+ rate limits) + ``kind load docker-image``, so the test does not flake
+ on registry availability,
+* applies the overlay,
+* polls each ``verify:`` resource for its declared success state while
+ **failing fast on terminal pod waiting reasons**
+ (``ImagePullBackOff``, ``ErrImagePull``, ``CrashLoopBackOff``,
+ ``CreateContainerConfigError``, …) rather than waiting out the full
+ ``timeout_seconds``,
+* runs the optional per-overlay pytest module,
+* deletes the overlay (skip with ``--skip-cleanup``).
+
+See ``chart/kustomize-overlays/CONTRIBUTING.rst`` for the full
+lifecycle and how an overlay's ``STATUS`` advances from ``not-tested``
+to ``tested``.
+
+.. code-block:: bash
+
+ breeze k8s setup-env
+ breeze k8s create-cluster
+ breeze k8s configure-cluster
+ breeze k8s build-k8s-image --rebuild-base-image # first time only
+ breeze k8s upload-k8s-image
+ breeze k8s deploy-airflow
+ breeze k8s smoke-test-overlay kerberos
+
+The ``build-k8s-image`` + ``upload-k8s-image`` pair is required locally
+because the chart's default image is the CI-private
+``ghcr.io/apache/airflow/main/prod/python<X>-kubernetes:latest``;
+without those steps ``deploy-airflow`` will fail with ImagePullBackOff
+(HTTP 403).
+
Review Comment:
```suggestion
.. code-block:: bash

    breeze k8s deploy-cluster --rebuild-base-image
    breeze k8s deploy-airflow
    breeze k8s smoke-test-overlay kerberos

.. note::

    The ``--rebuild-base-image`` flag is only required the first time
    the command is run.
```
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
+Usage
+-----
+
+Reference this overlay from your own kustomization and substitute the
+release name and namespace. A minimal example:
+
+.. code-block:: yaml
+
+ # my-overlay/kustomization.yaml
+ apiVersion: kustomize.config.k8s.io/v1beta1
+ kind: Kustomization
+ namespace: airflow
+
+ resources:
+ - github.com/apache/airflow/chart/kustomize-overlays/kerberos?ref=helm-chart/1.22.0
+
+Apply with:
+
+.. code-block:: bash
+
+ kubectl apply -k my-overlay/
+
+For a quick test, you can also just substitute the placeholders inline:
+
+.. code-block:: bash
+
+ kubectl kustomize chart/kustomize-overlays/kerberos | \
+ sed -e 's/RELEASE-NAME/airflow/g' -e 's/NAMESPACE/airflow/g' | \
+ kubectl apply -n airflow -f -
+
+This is exactly what ``breeze k8s smoke-test-overlay kerberos`` does
+during the local and CI smoke test.
+
+Wiring the keytab into the chart's sidecar
+------------------------------------------
+
+The chart's kerberos sidecar (``workers.kerberosSidecar``,
+``workers.celery.kerberosSidecar``) mounts a Secret named in
+``kerberos.keytab``. Point that at the Secret produced by this overlay:
+
+.. code-block:: yaml
+
+ # values.yaml fragment
+ kerberos:
+ enabled: true
+ ccacheMountPath: /var/kerberos-ccache
+ keytabPath: /etc/airflow.keytab
+ principal: airflow/airflow.airflow.svc.cluster.local@EXAMPLE.COM
+
+ workers:
+ kerberosSidecar:
+ enabled: true
+
+ extraSecrets:
+ airflow-kerberos-keytab: {} # exists from this overlay
+
+Migration guide from the chart
+------------------------------
+
+What the chart currently does
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When ``kerberos.enabled=true`` and ``kerberos.keytabBase64Content`` is
+provided, the chart renders a ``Secret`` carrying the user-supplied
+keytab and a ``ConfigMap`` with the user-supplied ``krb5.conf``. The
+user is expected to bring their own KDC.
+
+What this overlay provides
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* A working test KDC in the same namespace, so a developer can exercise
+ the chart's kerberos sidecar end-to-end without standing up an
+ external Kerberos service.
+* A bootstrap Job that materialises the keytab Secret automatically,
+ so no base64-encoded blob ends up in ``values.yaml`` or a developer's
+ shell history.
+
+How to switch
+^^^^^^^^^^^^^
+
+1. Install or upgrade the chart with ``kerberos.enabled`` set as you
+ want, but **without** ``kerberos.keytabBase64Content``.
+2. Apply this overlay against the same namespace.
+3. Wait for ``Job/<release>-keytab-bootstrap`` to complete.
+4. Confirm the Secret exists and reference it from the chart's sidecar
+ config as shown above.
+
+Status
+------
+
+This overlay is ``tested``: the ``verify:`` block in ``STATUS.yaml``
+is the smoke-test contract (KDC Deployment Ready, Service exists,
+bootstrap Job Complete, keytab Secret exists), and the
+``test_kerberos.py`` module under
+``chart/tests/overlay_tests/`` adds the
+behavioural assertion: a throwaway client pod ``kinit``\ ing against
+the in-cluster KDC and confirming the principal in ``klist`` output.
+``last-verified`` in ``STATUS.yaml`` records the most recent green
+local run; re-run the smoke test with ``--promote-status`` to refresh
+it whenever you re-verify against your cluster.
Review Comment:
I think that this paragraph is just a summary of the content at the top of
this file and the STATUS.yaml file. Maybe it would be enough to just point to
the STATUS.yaml here 🤔?
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
+Usage
+-----
+
+Reference this overlay from your own kustomization and substitute the
+release name and namespace. A minimal example:
+
+.. code-block:: yaml
+
+ # my-overlay/kustomization.yaml
+ apiVersion: kustomize.config.k8s.io/v1beta1
+ kind: Kustomization
+ namespace: airflow
+
+ resources:
+ - github.com/apache/airflow/chart/kustomize-overlays/kerberos?ref=helm-chart/1.22.0
+
+Apply with:
+
+.. code-block:: bash
+
+ kubectl apply -k my-overlay/
+
+For a quick test, you can also just substitute the placeholders inline:
+
+.. code-block:: bash
+
+ kubectl kustomize chart/kustomize-overlays/kerberos | \
+ sed -e 's/RELEASE-NAME/airflow/g' -e 's/NAMESPACE/airflow/g' | \
+ kubectl apply -n airflow -f -
+
+This is exactly what ``breeze k8s smoke-test-overlay kerberos`` does
+during the local and CI smoke test.
+
+Wiring the keytab into the chart's sidecar
+------------------------------------------
+
+The chart's kerberos sidecar (``workers.kerberosSidecar``,
+``workers.celery.kerberosSidecar``) mounts a Secret named in
+``kerberos.keytab``. Point that at the Secret produced by this overlay:
Review Comment:
```suggestion
The chart's kerberos sidecar (``workers.celery.kerberosInitContainer``,
``workers.celery.kerberosSidecar``,
``workers.kubernetes.kerberosInitContainer``,
``workers.kubernetes.kerberosSidecar``) mounts a Secret named in
``kerberos.keytab``. Point that at the Secret produced by this overlay:
```
##########
chart/kustomize-overlays/kerberos/README.rst:
##########
@@ -0,0 +1,196 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Kerberos Test KDC Overlay
+=========================
+
+This overlay stands up a throwaway in-cluster MIT Kerberos KDC, creates
+the ``airflow/airflow.<namespace>.svc.cluster.local`` service principal,
+and stores its keytab in a Secret named ``<release>-kerberos-keytab``.
+It is a standalone addition; no resource produced by the Helm chart is
+modified.
+
+It is intended as a proof-of-concept of how a non-Airflow component
+(in this case Kerberos infrastructure) can be expressed as a Kustomize
+overlay alongside the chart, rather than baked into the chart itself.
+The keytab Secret it produces is consumable as-is by the chart's
+existing kerberos sidecar (``kerberos.enabled=true``,
+``kerberos.keytab=/etc/airflow.keytab``,
+``extraSecrets.<release>-kerberos-keytab: {}``).
+
+.. warning::
+
+ The KDC pod uses a fixed admin password and stores its database in
+ an ``emptyDir``. Do not connect production workloads to it. Treat it
+ as a test fixture only.
+
+Prerequisites
+-------------
+
+* The Airflow chart installed in the same namespace (any executor).
+* ``kubectl`` access sufficient to apply Deployments, Services,
+ ConfigMaps, Secrets, ServiceAccounts/Roles/RoleBindings, and Jobs in
+ that namespace.
+
+Resources produced
+------------------
+
+* ``ConfigMap/<release>-krb5-conf`` - ``krb5.conf`` with the test realm
+ ``EXAMPLE.COM`` and the in-cluster KDC service as ``kdc``/``admin_server``.
+* ``Deployment/<release>-kerberos-kdc`` - single-replica MIT Kerberos
+ KDC + kadmind, image ``gcavalcante8808/krb5-server:latest``.
+* ``Service/<release>-kerberos-kdc`` - exposes 88 TCP+UDP and 749 TCP.
+* ``ServiceAccount`` + ``Role`` + ``RoleBinding`` named
+ ``<release>-kerberos-bootstrap`` - minimum permissions for the
+ bootstrap Job (pod exec + secret create/update in the same namespace).
+* ``Job/<release>-keytab-bootstrap`` - waits for the KDC to be Ready,
+ runs ``kadmin.local`` against it to create the principal and write
+ the keytab, then stores the keytab in:
+* ``Secret/<release>-kerberos-keytab`` (created by the Job) - holds
+ ``airflow.keytab`` under that key.
+
+Usage
+-----
+
+Reference this overlay from your own kustomization and substitute the
+release name and namespace. A minimal example:
+
+.. code-block:: yaml
+
+ # my-overlay/kustomization.yaml
+ apiVersion: kustomize.config.k8s.io/v1beta1
+ kind: Kustomization
+ namespace: airflow
+
+ resources:
+ - github.com/apache/airflow/chart/kustomize-overlays/kerberos?ref=helm-chart/1.22.0
+
+Apply with:
+
+.. code-block:: bash
+
+ kubectl apply -k my-overlay/
+
+For a quick test, you can also just substitute the placeholders inline:
+
+.. code-block:: bash
+
+ kubectl kustomize chart/kustomize-overlays/kerberos | \
+ sed -e 's/RELEASE-NAME/airflow/g' -e 's/NAMESPACE/airflow/g' | \
+ kubectl apply -n airflow -f -
+
+This is exactly what ``breeze k8s smoke-test-overlay kerberos`` does
+during the local and CI smoke test.
+
+Wiring the keytab into the chart's sidecar
+------------------------------------------
+
+The chart's kerberos sidecar (``workers.kerberosSidecar``,
+``workers.celery.kerberosSidecar``) mounts a Secret named in
+``kerberos.keytab``. Point that at the Secret produced by this overlay:
+
+.. code-block:: yaml
+
+ # values.yaml fragment
+ kerberos:
+ enabled: true
+ ccacheMountPath: /var/kerberos-ccache
+ keytabPath: /etc/airflow.keytab
+ principal: airflow/[email protected]
+
+ workers:
+ kerberosSidecar:
+ enabled: true
+
+ extraSecrets:
+ airflow-kerberos-keytab: {} # exists from this overlay
+
+Migration guide from the chart
+------------------------------
+
+What the chart currently does
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When ``kerberos.enabled=true`` and ``kerberos.keytabBase64Content`` is
+provided, the chart renders a ``Secret`` carrying the user-supplied
+keytab and a ``ConfigMap`` with the user-supplied ``krb5.conf``. The
+user is expected to bring their own KDC.
+
+What this overlay provides
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* A working test KDC in the same namespace, so a developer can exercise
+ the chart's kerberos sidecar end-to-end without standing up an
+ external Kerberos service.
+* A bootstrap Job that materialises the keytab Secret automatically,
+ so no base64-encoded blob ends up in ``values.yaml`` or a developer's
+ shell history.
+
+How to switch
+^^^^^^^^^^^^^
+
+1. Install or upgrade the chart with ``kerberos.enabled`` set as you
+ want, but **without** ``kerberos.keytabBase64Content``.
+2. Apply this overlay against the same namespace.
+3. Wait for ``Job/<release>-keytab-bootstrap`` to complete.
+4. Confirm the Secret exists and reference it from the chart's sidecar
+ config as shown above.
+
+Status
+------
+
+This overlay is ``tested``: the ``verify:`` block in ``STATUS.yaml``
+is the smoke-test contract (KDC Deployment Ready, Service exists,
+bootstrap Job Complete, keytab Secret exists), and the
+``test_kerberos.py`` module under
+``chart/tests/overlay_tests/`` adds the
+behavioural assertion: a throwaway client pod ``kinit``\ ing against
+the in-cluster KDC and confirming the principal in ``klist`` output.
+``last-verified`` in ``STATUS.yaml`` records the most recent green
+local run; re-run the smoke test with ``--promote-status`` to refresh
+it whenever you re-verify against your cluster.
+
+To run the smoke test locally:
+
+.. code-block:: bash
+
+ breeze k8s deploy-cluster --rebuild-base-image
+ breeze k8s deploy-airflow
+ breeze k8s smoke-test-overlay kerberos --promote-status
+
+The ``--promote-status`` flag rewrites this overlay's ``STATUS.yaml``
+in place on a green run (chart-version from ``chart/Chart.yaml`` plus
+today's date). Without it the smoke test still runs and verifies, it
+just leaves ``STATUS.yaml`` untouched.
Review Comment:
```suggestion
```
This applies to all overlays, so I think repeating it in every overlay's
README is unnecessary.
##########
chart/kustomize-overlays/CONTRIBUTING.rst:
##########
@@ -95,6 +95,92 @@ For an overlay scheduled for removal:
status: deprecated
message: "Replaced by <overlay-name>. Will be removed in chart 3.0.0."
+The optional ``verify:`` block is the smoke-test contract and is also
+**the discovery key for CI**:
+
+.. code-block:: yaml
+
+ verify:
+ timeout_seconds: 300 # optional; default 300, max 3600
+ # `name` is the SUFFIX only - the runner auto-prepends
+ # `<release-name>-` so the same overlay works under any release.
+ # Write `foo`, not `RELEASE-NAME-foo`. The legacy `RELEASE-NAME-foo`
+ # form is still tolerated for older overlays but the short form
+ # is the new convention.
+ resources:
+ - kind: Deployment
+ name: foo # -> matches <release-name>-foo
+ ready: true # waits for rollout to complete
+ - kind: Job
+ name: bootstrap
+ complete: true # waits for condition=complete
+ - kind: Secret
+ name: foo # neither flag = waits for create
+
+How discovery works:
+
+* ``SelectiveChecks.kustomize_overlay_names`` scans
+ ``chart/kustomize-overlays/*/STATUS.yaml`` at CI time and emits the
+ list of overlay directory names whose ``STATUS.yaml`` contains a
+ ``verify:`` block. An overlay **without** a ``verify:`` block is
+ invisible to CI - the smoke-test workflow's matrix never sees it,
+ and the workflow is skipped entirely when the list is empty.
+* The same workflow is gated by
+ ``SelectiveChecks.run_kustomize_overlays_tests``, which only trips
+ on changes under ``chart/kustomize-overlays/`` and the narrow set
+ of files that drive the runner (the prek hook, the breeze command,
+ the workflow file). Unrelated chart edits do not pull in a
+ 30-40 minute kind cluster spin-up.
+
+Practical rule: as soon as an overlay has a ``verify:`` block, CI
+starts running its smoke test on every relevant change. Until then,
+the prek hook's structural check is the only automation that touches
+it.
+
+Where things live (quick reference)
+-----------------------------------
+
+A declarative map of the moving parts in an overlay and its smoke test,
+so authors can answer "where does X go?" without grepping. Everything
+in this table is auto-wired by the framework once it sits in the right
+place - there is no central registry to also update.
+
++--------------------------+-----------------------------------------------------------------+
+| Thing | Where it lives |
++==========================+=================================================================+
+| Kubernetes resources the | ``chart/kustomize-overlays/<name>/*.yaml`` referenced from |
+| overlay produces | the overlay's ``kustomization.yaml``. |
++--------------------------+-----------------------------------------------------------------+
+| Container images the | Inline ``image:`` fields on containers / initContainers / |
Review Comment:
Autodiscovery, pulling, etc., is really nice, but I wonder about the
security context within the Airflow CI. What is the risk of an untrustworthy
image gaining access to CI-sensitive data such as credentials? Should we have
validation rules for which images can be added and tested, or will we allow
anything from Docker Hub?
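One possible shape for such a rule, as a rough sketch only: walk the rendered manifest, collect every `image:` reference, and reject anything outside an allowlist of registries. `ALLOWED_PREFIXES`, `iter_images`, and `disallowed_images` are hypothetical names, nothing like this exists in the PR, and the sketch assumes the rendered manifest has already been parsed into Python dicts (for example with `yaml.safe_load_all`).

```python
from collections.abc import Iterator

# Hypothetical allowlist; the actual policy would have to be agreed on by the project.
ALLOWED_PREFIXES = ("ghcr.io/apache/", "registry.k8s.io/")


def iter_images(node: object) -> Iterator[str]:
    """Yield every string found under an ``image`` key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image" and isinstance(value, str):
                yield value
            else:
                yield from iter_images(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_images(item)


def disallowed_images(docs: list[dict]) -> list[str]:
    """Return the sorted image references that match no allowlisted prefix."""
    found = {image for doc in docs for image in iter_images(doc)}
    return sorted(image for image in found if not image.startswith(ALLOWED_PREFIXES))
```

A check like this could run before the `docker pull` step and fail the job when the returned list is non-empty. A prefix allowlist is only one option; pinning images by digest would be a stricter alternative.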
##########
chart/kustomize-overlays/CONTRIBUTING.rst:
##########
@@ -95,6 +95,92 @@ For an overlay scheduled for removal:
status: deprecated
message: "Replaced by <overlay-name>. Will be removed in chart 3.0.0."
+The optional ``verify:`` block is the smoke-test contract and is also
+**the discovery key for CI**:
+
+.. code-block:: yaml
+
+ verify:
+ timeout_seconds: 300 # optional; default 300, max 3600
+ # `name` is the SUFFIX only - the runner auto-prepends
+ # `<release-name>-` so the same overlay works under any release.
+ # Write `foo`, not `RELEASE-NAME-foo`. The legacy `RELEASE-NAME-foo`
+ # form is still tolerated for older overlays but the short form
+ # is the new convention.
+ resources:
+ - kind: Deployment
+ name: foo # -> matches <release-name>-foo
+ ready: true # waits for rollout to complete
+ - kind: Job
+ name: bootstrap
+ complete: true # waits for condition=complete
+ - kind: Secret
+ name: foo # neither flag = waits for create
+
+How discovery works:
+
+* ``SelectiveChecks.kustomize_overlay_names`` scans
+ ``chart/kustomize-overlays/*/STATUS.yaml`` at CI time and emits the
+ list of overlay directory names whose ``STATUS.yaml`` contains a
+ ``verify:`` block. An overlay **without** a ``verify:`` block is
+ invisible to CI - the smoke-test workflow's matrix never sees it,
+ and the workflow is skipped entirely when the list is empty.
+* The same workflow is gated by
+ ``SelectiveChecks.run_kustomize_overlays_tests``, which only trips
+ on changes under ``chart/kustomize-overlays/`` and the narrow set
+ of files that drive the runner (the prek hook, the breeze command,
+ the workflow file). Unrelated chart edits do not pull in a
+ 30-40 minute kind cluster spin-up.
+
+Practical rule: as soon as an overlay has a ``verify:`` block, CI
+starts running its smoke test on every relevant change. Until then,
+the prek hook's structural check is the only automation that touches
+it.
+
+Where things live (quick reference)
+-----------------------------------
+
+A declarative map of the moving parts in an overlay and its smoke test,
+so authors can answer "where does X go?" without grepping. Everything
+in this table is auto-wired by the framework once it sits in the right
+place - there is no central registry to also update.
+
++--------------------------+-----------------------------------------------------------------+
+| Thing | Where it lives |
++==========================+=================================================================+
+| Kubernetes resources the | ``chart/kustomize-overlays/<name>/*.yaml`` referenced from |
+| overlay produces | the overlay's ``kustomization.yaml``. |
++--------------------------+-----------------------------------------------------------------+
+| Container images the | Inline ``image:`` fields on containers / initContainers / |
+| overlay uses | sidecars in the overlay YAMLs above. **No second list.** |
+| | ``breeze k8s smoke-test-overlay`` discovers them by walking |
+| | the rendered manifest and ``docker pull`` + ``kind load``\ s |
+| | each one before applying. Always pair with |
+| | ``imagePullPolicy: IfNotPresent``. |
++--------------------------+-----------------------------------------------------------------+
+| Resources to wait for | ``verify:`` block in |
+| after apply | ``chart/kustomize-overlays/<name>/STATUS.yaml`` (see "STATUS |
+| | file format" above). The smoke-test runner walks this list. |
++--------------------------+-----------------------------------------------------------------+
+| Behavioural assertions | ``chart/tests/overlay_tests/test_<name>``|
+| beyond "resource exists" | ``.py``. Auto-discovered by the smoke-test runner if present. |
Review Comment:
```suggestion
| Behavioural assertions | ``chart/tests/overlay_tests/test_<name>.py``|
| beyond "resource exists" | Auto-discovered by the smoke-test runner if present.|
```
I think that something went wrong here 🤔
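The row in question appears to be narrower than the table border, which is the kind of thing that breaks an RST grid table. A quick way to catch it is to compare every line's width against the first border line. This is only an illustrative width check, not an RST parser; `misaligned_rows` is a hypothetical helper that is not part of the PR, and it assumes the table is passed in as plain text without the leading diff `+`.

```python
def misaligned_rows(table: str) -> list[int]:
    """Return 1-based numbers of lines whose width differs from the first line."""
    lines = table.splitlines()
    if not lines:
        return []
    width = len(lines[0])
    return [number for number, line in enumerate(lines, start=1) if len(line) != width]
```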
##########
chart/kustomize-overlays/CONTRIBUTING.rst:
##########
@@ -95,6 +95,92 @@ For an overlay scheduled for removal:
status: deprecated
message: "Replaced by <overlay-name>. Will be removed in chart 3.0.0."
+The optional ``verify:`` block is the smoke-test contract and is also
+**the discovery key for CI**:
+
+.. code-block:: yaml
+
+ verify:
+ timeout_seconds: 300 # optional; default 300, max 3600
+ # `name` is the SUFFIX only - the runner auto-prepends
+ # `<release-name>-` so the same overlay works under any release.
+ # Write `foo`, not `RELEASE-NAME-foo`. The legacy `RELEASE-NAME-foo`
+ # form is still tolerated for older overlays but the short form
+ # is the new convention.
+ resources:
+ - kind: Deployment
+ name: foo # -> matches <release-name>-foo
+ ready: true # waits for rollout to complete
+ - kind: Job
+ name: bootstrap
+ complete: true # waits for condition=complete
+ - kind: Secret
+ name: foo # neither flag = waits for create
+
+How discovery works:
+
+* ``SelectiveChecks.kustomize_overlay_names`` scans
+ ``chart/kustomize-overlays/*/STATUS.yaml`` at CI time and emits the
+ list of overlay directory names whose ``STATUS.yaml`` contains a
+ ``verify:`` block. An overlay **without** a ``verify:`` block is
+ invisible to CI - the smoke-test workflow's matrix never sees it,
+ and the workflow is skipped entirely when the list is empty.
+* The same workflow is gated by
+ ``SelectiveChecks.run_kustomize_overlays_tests``, which only trips
+ on changes under ``chart/kustomize-overlays/`` and the narrow set
+ of files that drive the runner (the prek hook, the breeze command,
+ the workflow file). Unrelated chart edits do not pull in a
+ 30-40 minute kind cluster spin-up.
Review Comment:
```suggestion
of files that drive the runner (the prek hook, the breeze command,
the workflow file, the chart template files). Unrelated chart edits do
not pull in a 30-40 minute kind cluster spin-up.
```
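To illustrate what adding the template paths changes, here is a small sketch of prefix-regex matching against changed files. It only mirrors the idea of a file group: `KUSTOMIZE_OVERLAYS_PATTERNS` and `triggers_overlay_tests` are hypothetical names, not the actual `selective_checks.py` API, and the list holds only the two patterns visible in the diff plus the two proposed in this review.

```python
import re

# Two patterns visible in the diff, plus the two proposed in this review.
KUSTOMIZE_OVERLAYS_PATTERNS = [
    r"^chart/kustomize-overlays/",
    r"^chart/tests/overlay_tests/",
    r"^chart/templates/",  # proposed
    r"^chart/files/",  # proposed
]


def triggers_overlay_tests(changed_files: list[str], patterns: list[str]) -> bool:
    """Return True if any changed file matches any of the regex patterns."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return any(rx.search(path) for path in changed_files for rx in compiled)
```

With the two proposed patterns, a template edit would start the overlay smoke test, while an edit to an unrelated chart file such as `chart/values.yaml` still would not.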
##########
chart/tests/overlay_tests/conftest.py:
##########
@@ -0,0 +1,153 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Shared fixtures and helpers for kustomize-overlay smoke tests.
+
+Run from `breeze k8s smoke-test-overlay <name>` which sets the
+``OVERLAY_UNDER_TEST`` / ``OVERLAY_NAMESPACE`` / ``OVERLAY_RELEASE_NAME``
+env vars and ensures kubectl is on PATH against the right kind cluster.
+The helpers below are deliberately small and synchronous so a new
+overlay's behavioural test can be written without reinventing them.
+
+Conventions for new per-overlay tests:
+
+* Inherit from nothing - use the fixtures here and module-level
+ functions. There is no need for ``BaseK8STest`` because overlay tests
+ do not need the airflow REST API.
Review Comment:
```suggestion
do not need the Airflow REST API.
```
##########
chart/kustomize-overlays/CONTRIBUTING.rst:
##########
@@ -123,13 +296,59 @@ Lifecycle steps:
overlay needs invariants beyond that (for example a cross-reference
between resources), they belong in the integration test, not in the prek
hook.
-* A follow-up PR adds a functional integration test for the overlay. Once
- that test passes, ``STATUS`` is flipped to ``tested``.
+* The same PR (or a follow-up) adds a ``verify:`` block to
+ ``STATUS.yaml`` and, optionally, a per-overlay pytest module under
+ ``chart/tests/overlay_tests/``. Once
+ ``breeze k8s smoke-test-overlay <name>`` runs green locally **and** in
+ CI, ``STATUS`` is flipped to ``tested`` with ``chart-version`` and
+ ``last-verified`` filled in.
* An overlay is deprecated by setting ``status: deprecated`` together with a
``message`` field pointing to the replacement.
* Deprecated overlays remain for one chart major version before they are
removed, so users always have a window to migrate.
+Running the smoke test locally
+------------------------------
+
+You can advance an overlay's ``STATUS`` to ``tested`` yourself, in the
+same PR that introduces the overlay, by running the smoke test against
+a local kind cluster. The full sequence mirrors what
+``breeze k8s run-complete-tests`` does in CI:
+
+.. code-block:: bash
+
+ breeze k8s setup-env
+ breeze k8s create-cluster
+ breeze k8s configure-cluster
+ # Build the prod image locally and load it into the kind nodes. The
+ # chart's default image lives on ghcr.io behind CI auth, so without
+ # the build + upload pair, `deploy-airflow` will hang with
+ # ImagePullBackOff (HTTP 403). Drop --rebuild-base-image on
+ # subsequent iterations to speed up rebuilds.
+ breeze k8s build-k8s-image --rebuild-base-image
+ breeze k8s upload-k8s-image
Review Comment:
```suggestion
breeze k8s deploy-cluster --rebuild-base-image
```
I think this is the fourth time I have seen this sequence. Wouldn't a link
to a single place be enough 🤔?