Re: [PR] ci: schedule a weekly Docker image rebuild against the latest release [superset]

via GitHub Tue, 23 Jun 2026 16:54:27 -0700


sadpandajoe commented on code in PR #40426:
URL: https://github.com/apache/superset/pull/40426#discussion_r3463652367



##########
.github/workflows/scheduled-docker-image-refresh.yml:
##########
@@ -0,0 +1,130 @@
+name: Scheduled Docker image refresh
+
+# Re-runs the Docker image build against the latest published release on a
+# weekly cadence. The code being built doesn't change — but the base image
+# layers (python:*-slim-trixie and its OS packages) DO get upstream
+# security patches between Superset releases, and those patches don't
+# reach our published images unless we rebuild.
+#
+# Without this workflow, `apache/superset:<latest>` lags behind upstream
+# Debian/Python base patches by whatever interval falls between Superset
+# releases (typically 3–6 weeks). With it, the lag drops to at most one
+# week regardless of release cadence.
+#
+# This is a security-hygiene cron, not a release. It overwrites the
+# existing tags for the most recent release (e.g. `apache/superset:5.0.0`
+# and `apache/superset:latest`) with bit-for-bit-equivalent contents
+# layered on a refreshed base. Image digests change; everything users
+# actually pin against (image content, code, deps) does not.
+
+on:
+  schedule:
+    # Mondays at 06:00 UTC — gives the weekend for upstream patches to
+    # settle and surfaces failures at the start of the work week so a
+    # human can react.
+    - cron: "0 6 * * 1"
+
+  # Manual trigger so operators can force a refresh on demand (e.g.
+  # immediately after a high-severity base-image CVE drops).
+  workflow_dispatch: {}
+
+permissions:
+  contents: read
+
+# Serialize with itself and with the release publisher (tag-release.yml) —
+# both push to the same Docker Hub tags, so a race could end with stale
+# layers winning. Both workflows must declare this group for the lock to work.
+concurrency:
+  group: docker-publish-latest-release
+  cancel-in-progress: false
+
+jobs:
+  config:
+    runs-on: ubuntu-24.04
+    outputs:
+      has-secrets: ${{ steps.check.outputs.has-secrets }}
+      latest-release: ${{ steps.latest.outputs.tag }}
+    steps:
+      - name: Check for Docker Hub secrets
+        id: check
+        shell: bash
+        run: |
+          if [ -n "${DOCKERHUB_USER}" ]; then
+            echo "has-secrets=1" >> "$GITHUB_OUTPUT"
+          fi
+        env:
+          DOCKERHUB_USER: ${{ (secrets.DOCKERHUB_USER != '' && secrets.DOCKERHUB_TOKEN != '') || '' }}
+
+      - name: Look up latest published release
+        id: latest
+        shell: bash
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REPOSITORY: ${{ github.repository }}
+        run: |
+          # `releases/latest` returns the latest non-prerelease, non-draft
+          # release — which is exactly what `apache/superset:latest`
+          # should reflect.
+          TAG=$(gh api "repos/${REPOSITORY}/releases/latest" --jq .tag_name)
+          if [ -z "$TAG" ] || [ "$TAG" = "null" ]; then
+            echo "::error::Could not determine latest release tag"
+            exit 1
+          fi
+          echo "Latest release: $TAG"
+          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
+
+  docker-rebuild:
+    needs: config
+    if: needs.config.outputs.has-secrets
+    name: docker-rebuild
+    runs-on: ubuntu-24.04
+    strategy:
+      # Mirror the same matrix the release publisher uses so every variant
+      # operators consume from Docker Hub gets the refreshed base.
+      matrix:
+        build_preset: ["dev", "lean", "py310", "websocket", "dockerize", "py311", "py312"]
+      fail-fast: false
+    steps:
+      - name: "Checkout release tag: ${{ needs.config.outputs.latest-release }}"
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+        with:
+          ref: ${{ needs.config.outputs.latest-release }}
+          fetch-depth: 0
+          persist-credentials: false
+
+      - name: Setup Docker Environment
+        uses: ./.github/actions/setup-docker
+        with:
+          dockerhub-user: ${{ secrets.DOCKERHUB_USER }}
+          dockerhub-token: ${{ secrets.DOCKERHUB_TOKEN }}
+          install-docker-compose: "false"
+          build: "true"
+
+      - name: Use Node.js 20
+        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6
+        with:
+          node-version: 20
+
+      - name: Setup supersetbot
+        uses: ./.github/actions/setup-supersetbot/
+
+      - name: Rebuild and push
+        env:
+          DOCKERHUB_USER: ${{ secrets.DOCKERHUB_USER }}
+          DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          BUILD_PRESET: ${{ matrix.build_preset }}
+          LATEST_RELEASE: ${{ needs.config.outputs.latest-release }}
+        run: |
+          # Reuses the same supersetbot invocation as the release
+          # publisher (`tag-release.yml`), so the resulting tags are
+          # identical to what a manual release dispatch would produce —
+          # just with a freshly-pulled base image layer underneath.
+          supersetbot docker \
+            --push \
+            --preset "$BUILD_PRESET" \
+            --context release \
+            --context-ref "$LATEST_RELEASE" \
+            --force-latest \

Review Comment:
   `--force-latest` is unconditional here, so whatever `gh api 
.../releases/latest` returns every Monday gets stamped onto 
`apache/superset:latest`. That endpoint returns the release flagged "latest" in 
the GitHub UI, which is usually what you want, but if a 5.x maintenance patch 
ships after a 6.0 GA and someone leaves it marked as the latest release (an 
easy mis-click during a backport), the next cron run silently rolls `:latest` 
back a major version. `tag-release.yml` sidesteps this by leaving 
`--force-latest` opt-in via `workflow_dispatch`. Could you gate it on the 
scheduled path too, e.g. only pass `--force-latest` when the fetched tag is 
semver `>=` the tag currently behind `:latest`, and log + skip otherwise?
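
   A minimal sketch of such a gate, under stated assumptions: `semver_ge` and 
`latest_flag` are hypothetical helper names, the comparison relies on GNU 
`sort -V` (available on the Ubuntu runners), and prerelease suffixes are not 
handled. How the tag currently behind `:latest` is looked up is left open 
here; the sketch just takes it as the second argument.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 when version $1 is >= version $2.
# A leading "v" is ignored. Relies on GNU `sort -V`; prerelease
# suffixes such as "-rc1" are NOT handled correctly.
semver_ge() {
  local candidate="${1#v}" current="${2#v}"
  local highest
  highest="$(printf '%s\n%s\n' "$candidate" "$current" | sort -V | tail -n 1)"
  [ "$highest" = "$candidate" ]
}

# Hypothetical helper: prints "--force-latest" only when the fetched
# release ($1) is not older than the tag currently behind :latest ($2).
# Prints nothing otherwise, so the caller can log and skip the flag.
latest_flag() {
  if semver_ge "$1" "$2"; then
    echo "--force-latest"
  fi
}
```

   In the `Rebuild and push` step, the output of 
`latest_flag "$LATEST_RELEASE" "$CURRENT_LATEST"` could replace the 
hard-coded flag, where `CURRENT_LATEST` is an assumed variable holding the 
version currently published as `:latest`.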



##########
.github/workflows/scheduled-docker-image-refresh.yml:
##########
@@ -0,0 +1,130 @@
+name: Scheduled Docker image refresh
+
+# Re-runs the Docker image build against the latest published release on a
+# weekly cadence. The code being built doesn't change — but the base image
+# layers (python:*-slim-trixie and its OS packages) DO get upstream
+# security patches between Superset releases, and those patches don't
+# reach our published images unless we rebuild.
+#
+# Without this workflow, `apache/superset:<latest>` lags behind upstream
+# Debian/Python base patches by whatever interval falls between Superset
+# releases (typically 3–6 weeks). With it, the lag drops to at most one
+# week regardless of release cadence.
+#
+# This is a security-hygiene cron, not a release. It overwrites the
+# existing tags for the most recent release (e.g. `apache/superset:5.0.0`
+# and `apache/superset:latest`) with bit-for-bit-equivalent contents
+# layered on a refreshed base. Image digests change; everything users
+# actually pin against (image content, code, deps) does not.
+
+on:
+  schedule:
+    # Mondays at 06:00 UTC — gives the weekend for upstream patches to
+    # settle and surfaces failures at the start of the work week so a
+    # human can react.
+    - cron: "0 6 * * 1"
+
+  # Manual trigger so operators can force a refresh on demand (e.g.
+  # immediately after a high-severity base-image CVE drops).
+  workflow_dispatch: {}
+
+permissions:
+  contents: read
+
+# Serialize with itself and with the release publisher (tag-release.yml) —
+# both push to the same Docker Hub tags, so a race could end with stale
+# layers winning. Both workflows must declare this group for the lock to work.
+concurrency:
+  group: docker-publish-latest-release
+  cancel-in-progress: false
+
+jobs:
+  config:
+    runs-on: ubuntu-24.04
+    outputs:
+      has-secrets: ${{ steps.check.outputs.has-secrets }}
+      latest-release: ${{ steps.latest.outputs.tag }}
+    steps:
+      - name: Check for Docker Hub secrets
+        id: check
+        shell: bash
+        run: |
+          if [ -n "${DOCKERHUB_USER}" ]; then
+            echo "has-secrets=1" >> "$GITHUB_OUTPUT"
+          fi
+        env:
+          DOCKERHUB_USER: ${{ (secrets.DOCKERHUB_USER != '' && secrets.DOCKERHUB_TOKEN != '') || '' }}
+
+      - name: Look up latest published release
+        id: latest
+        shell: bash
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REPOSITORY: ${{ github.repository }}
+        run: |
+          # `releases/latest` returns the latest non-prerelease, non-draft
+          # release — which is exactly what `apache/superset:latest`
+          # should reflect.
+          TAG=$(gh api "repos/${REPOSITORY}/releases/latest" --jq .tag_name)
+          if [ -z "$TAG" ] || [ "$TAG" = "null" ]; then
+            echo "::error::Could not determine latest release tag"
+            exit 1
+          fi
+          echo "Latest release: $TAG"
+          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
+
+  docker-rebuild:
+    needs: config
+    if: needs.config.outputs.has-secrets

Review Comment:
   This gate works today because the output is the empty string when secrets 
are absent (falsy), but it's fragile: if anyone later changes the script to 
write `has-secrets=0` on the missing-secret path (the natural defensive edit), 
the gate breaks, since step outputs are strings and any non-empty string, 
including `'0'`, is truthy in GHA expressions. Suggest making it explicit: 
`if: needs.config.outputs.has-secrets == '1'`, which also matches the pattern 
already used in `tag-release.yml`.
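
   For illustration, a sketch of the defensive edit described above, which is 
only safe once the job gate compares against `'1'` explicitly. The function 
name `write_has_secrets` is hypothetical; it takes the same value the step 
currently receives in `DOCKERHUB_USER`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: records whether Docker Hub secrets are present.
# $1 is non-empty when both secrets are set, empty otherwise.
# Writes an explicit 1 or 0 to the file named by GITHUB_OUTPUT.
write_has_secrets() {
  local flag="${1:-}"
  if [ -n "$flag" ]; then
    echo "has-secrets=1" >> "$GITHUB_OUTPUT"
  else
    # With a bare `if: needs.config.outputs.has-secrets`, this "0"
    # would still be truthy; an `== '1'` comparison treats it as off.
    echo "has-secrets=0" >> "$GITHUB_OUTPUT"
  fi
}
```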



##########
.github/workflows/scheduled-docker-image-refresh.yml:
##########
@@ -0,0 +1,130 @@
+name: Scheduled Docker image refresh
+
+# Re-runs the Docker image build against the latest published release on a
+# weekly cadence. The code being built doesn't change — but the base image
+# layers (python:*-slim-trixie and its OS packages) DO get upstream
+# security patches between Superset releases, and those patches don't
+# reach our published images unless we rebuild.
+#
+# Without this workflow, `apache/superset:<latest>` lags behind upstream
+# Debian/Python base patches by whatever interval falls between Superset
+# releases (typically 3–6 weeks). With it, the lag drops to at most one
+# week regardless of release cadence.
+#
+# This is a security-hygiene cron, not a release. It overwrites the
+# existing tags for the most recent release (e.g. `apache/superset:5.0.0`
+# and `apache/superset:latest`) with bit-for-bit-equivalent contents
+# layered on a refreshed base. Image digests change; everything users
+# actually pin against (image content, code, deps) does not.
+
+on:
+  schedule:
+    # Mondays at 06:00 UTC — gives the weekend for upstream patches to
+    # settle and surfaces failures at the start of the work week so a
+    # human can react.
+    - cron: "0 6 * * 1"
+
+  # Manual trigger so operators can force a refresh on demand (e.g.
+  # immediately after a high-severity base-image CVE drops).
+  workflow_dispatch: {}
+
+permissions:
+  contents: read
+
+# Serialize with itself and with the release publisher (tag-release.yml) —
+# both push to the same Docker Hub tags, so a race could end with stale
+# layers winning. Both workflows must declare this group for the lock to work.
+concurrency:
+  group: docker-publish-latest-release
+  cancel-in-progress: false
+
+jobs:
+  config:
+    runs-on: ubuntu-24.04
+    outputs:
+      has-secrets: ${{ steps.check.outputs.has-secrets }}
+      latest-release: ${{ steps.latest.outputs.tag }}
+    steps:
+      - name: Check for Docker Hub secrets
+        id: check
+        shell: bash
+        run: |
+          if [ -n "${DOCKERHUB_USER}" ]; then
+            echo "has-secrets=1" >> "$GITHUB_OUTPUT"
+          fi
+        env:
+          DOCKERHUB_USER: ${{ (secrets.DOCKERHUB_USER != '' && secrets.DOCKERHUB_TOKEN != '') || '' }}
+
+      - name: Look up latest published release
+        id: latest
+        shell: bash
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REPOSITORY: ${{ github.repository }}
+        run: |
+          # `releases/latest` returns the latest non-prerelease, non-draft
+          # release — which is exactly what `apache/superset:latest`
+          # should reflect.
+          TAG=$(gh api "repos/${REPOSITORY}/releases/latest" --jq .tag_name)
+          if [ -z "$TAG" ] || [ "$TAG" = "null" ]; then
+            echo "::error::Could not determine latest release tag"
+            exit 1
+          fi
+          echo "Latest release: $TAG"
+          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
+
+  docker-rebuild:
+    needs: config
+    if: needs.config.outputs.has-secrets
+    name: docker-rebuild
+    runs-on: ubuntu-24.04
+    strategy:
+      # Mirror the same matrix the release publisher uses so every variant
+      # operators consume from Docker Hub gets the refreshed base.
+      matrix:
+        build_preset: ["dev", "lean", "py310", "websocket", "dockerize", "py311", "py312"]
+      fail-fast: false

Review Comment:
   The whole point of this cron is catching base-image CVEs, so a silent 
failure is the expensive case. Right now the only signal on a failed run is a 
red X in the Actions tab, which nobody is watching on a Monday morning. Worth 
an `if: failure()` step that opens a tracked issue (`gh issue create` with a 
`security`/`bug` label) or pings whatever channel the security folks watch, so 
a missed rebuild doesn't sit unnoticed for weeks.
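
   A rough sketch of what that `if: failure()` step could run, assuming the 
job is granted `issues: write` (the workflow currently declares only 
`contents: read`) and that a `bug` label exists in the repository. The 
function name `open_failure_issue` is hypothetical; the `GITHUB_*` variables 
are the defaults Actions sets on every run, and `gh` needs `GH_TOKEN` in the 
step's environment.

```shell
#!/usr/bin/env bash

# Hypothetical helper: opens a tracking issue when the refresh fails.
# $1 is the release tag that was being rebuilt.
open_failure_issue() {
  local release="$1"
  # Link straight to the failed run so whoever triages has the logs.
  local run_url="${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}"
  gh issue create \
    --repo "$GITHUB_REPOSITORY" \
    --title "Scheduled Docker image refresh failed for ${release}" \
    --label "bug" \
    --body "The weekly image rebuild failed, so published images may be missing base-image patches. Run: ${run_url}"
}
```

   Because the rebuild is a matrix job, a step like this would fire once per 
failed preset; a small follow-up job with `needs: docker-rebuild` may be a 
better home for it if a single issue per run is preferred.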



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


