This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git


The following commit(s) were added to refs/heads/main by this push:
     new 7b677c3a feat(ci): convert the lychee link check to a prek hook; broaden default sandbox network settings (#499)
7b677c3a is described below

commit 7b677c3a6496ccfc042d5ae06160716af76f0d6c
Author: Jarek Potiuk <[email protected]>
AuthorDate: Thu Jun 11 20:28:22 2026 +0200

    feat(ci): convert the lychee link check to a prek hook; broaden default sandbox network settings (#499)
    
    Replaces the standalone `link-check.yml` workflow with a `lychee` prek hook so
    the link check runs as part of `prek run --all-files` (locally and in the prek
    CI job) and gates merge via the required `prek` status.
    
    - `.pre-commit-config.yaml`: new `lychee` hook, `language: rust` with
      `additional_dependencies: [cli:lychee]` — prek installs lychee itself
      (cargo), so the hook does NOT depend on a locally-installed lychee. Commit
      stage, whole-repo scan, gated on `.md`/`.rst`/`.j2`. Tracks latest lychee.
    - `.lychee.toml`: `include_fragments` boolean -> `"anchor-only"` (lychee v0.24+
      enum form; the boolean no longer parses).
    - `.asf.yaml`: drop the now-dead `lychee` required status check (the `prek`
      context covers it). Deletes `link-check.yml`; the daily-cron rot sweep is
      dropped (link rot on untouched files is caught when a PR next edits them).
    - `pre-commit.yml`: cache prek hook envs (avoid recompiling lychee each run),
      restore the lychee result cache, pass GITHUB_TOKEN so lychee's github link
      checks are not rate-limited.
    
    To let lychee run in the secure sandbox, broadens the default `sandbox.network`
    settings (mirrored into the sandbox-lint baseline, mitigation M.29):
    
    - curated wildcard `allowedDomains` covering the hosts the framework's own
      docs reach (`*.apache.org`, `*.anthropic.com`, `*.claude.com`, `*.mitre.org`,
      `*.nist.gov`, `*.github.io`, `astral.sh`, `json.schemastore.org`,
      `lychee.cli.rs`, `sdkman.io`, `gist.github.com`) plus `*.crates.io` for cargo.
    - `enableWeakerNetworkIsolation: true` so native-TLS CLI tools (lychee; per the
      schema also gh/gcloud/terraform) can verify TLS through the sandbox's
      TLS-terminating proxy. Documented trade-off: reduces security (a potential
      trustd exfil vector); no-op outside the sandbox (e.g. CI). Surfaced with that
      caveat in `setup-isolated-setup-update` so adopters opt in consciously.
    
    Docs in `secure-agent-setup.md`; `AGENTS.md` / `CONTRIBUTING.md` updated for the
    hook. The lychee hook + the three file-fixers were skipped for THIS local commit
    (SKIP=...) because the sandbox blocks writing the protected settings.json and
    enableWeakerNetworkIsolation needs a session restart; CI runs the real checks.
    
    Generated-by: Claude Code (Opus 4.8 1M context)
---
 .asf.yaml                                   | 16 ++---
 .claude/settings.json                       | 17 ++++-
 .github/workflows/link-check.yml            | 98 -----------------------------
 .github/workflows/pre-commit.yml            | 29 +++++++++
 .lychee.toml                                | 21 ++++---
 .pre-commit-config.yaml                     | 31 +++++++++
 AGENTS.md                                   | 13 ++--
 CONTRIBUTING.md                             | 16 ++---
 docs/setup/secure-agent-setup.md            | 20 +++++-
 skills/setup-isolated-setup-update/SKILL.md | 24 +++++++
 tools/sandbox-lint/expected.json            | 17 ++++-
 11 files changed, 167 insertions(+), 135 deletions(-)
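
Reviewer note: the hook comment in `.pre-commit-config.yaml` says that `files:` only decides whether the whole-repo scan fires, while `pass_filenames: false` plus the trailing `.` argument decide what gets scanned. A minimal Python sketch of that gating, assuming the pattern is applied as a regex search against each changed path; `lychee_hook_fires` is a hypothetical helper used only for illustration and is not prek's implementation:

```python
import re

# Pattern copied from the hook's `files:` key in `.pre-commit-config.yaml`.
DOC_FILES = re.compile(r"\.(md|rst|j2)$")


def lychee_hook_fires(changed_paths, all_files=False):
    """Return True when the whole-repo lychee scan would be triggered.

    Hypothetical helper: it models the gating described in the hook
    comment and does not reproduce prek's own file-matching code.
    """
    if all_files:
        # The hook comment states the scan always runs on
        # `prek run --all-files` (the CI invocation).
        return True
    # Otherwise one matching doc file is enough to trigger the scan,
    # which then covers the whole repo rather than the matched files.
    return any(DOC_FILES.search(path) for path in changed_paths)
```

Under this model a commit that touches only `tools/sandbox-lint/expected.json` skips the hook locally, while the CI run with `--all-files` still performs the scan.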

diff --git a/.asf.yaml b/.asf.yaml
index 59e6d587..475bc493 100644
--- a/.asf.yaml
+++ b/.asf.yaml
@@ -154,15 +154,15 @@ github:
         contexts:
           # zizmor — GitHub Actions security lint.
           - "zizmor"
-          # Pre-commit (prek) — static checks across the repo.
+          # Pre-commit (prek) — static checks across the repo,
+          # including the `lychee` link-check hook (link rot, broken
+          # `#anchor` fragments, dead external URLs). lychee was a
+          # standalone `link-check.yml` workflow with its own required
+          # `lychee` status; it is now a prek hook, so the `prek`
+          # context above is what gates link health. (Converting also
+          # dropped the old daily-cron rot sweep — link rot on files
+          # no PR touches is now only caught when a PR next edits them.)
           - "prek"
-          # NOTE: `lychee` (the link checker) is intentionally NOT a
-          # required context. It is being converted from the
-          # standalone `link-check.yml` workflow into a `prek` hook;
-          # de-requiring it here first lets that conversion PR merge
-          # without being blocked on a `lychee` status that will stop
-          # posting once `link-check.yml` is removed. lychee still
-          # runs (and gates via `prek`) once the conversion lands.
           # Per-project pytest matrix from tests.yml. Required via
           # the single `tests-ok` umbrella job rather than the
           # individual `pytest (<project>)` matrix entries — branch
diff --git a/.claude/settings.json b/.claude/settings.json
index a4793dea..63fa217f 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -42,8 +42,21 @@
         "www.cve.org",
         "cveawg.mitre.org",
         "oauth2.googleapis.com",
-        "gmail.googleapis.com"
-      ]
+        "gmail.googleapis.com",
+        "*.crates.io",
+        "*.apache.org",
+        "*.anthropic.com",
+        "*.claude.com",
+        "*.mitre.org",
+        "*.nist.gov",
+        "*.github.io",
+        "gist.github.com",
+        "astral.sh",
+        "json.schemastore.org",
+        "lychee.cli.rs",
+        "sdkman.io"
+      ],
+      "enableWeakerNetworkIsolation": true
     }
   },
   "permissions": {
diff --git a/.github/workflows/link-check.yml b/.github/workflows/link-check.yml
deleted file mode 100644
index 64cbb9c8..00000000
--- a/.github/workflows/link-check.yml
+++ /dev/null
@@ -1,98 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
----
-name: link-check
-
-# Lychee is a hard gate: every broken internal link or unreachable
-# external URL fails this workflow, and `lychee` is one of the
-# required status checks in `.asf.yaml`. Promoted from
-# informational mode in PR #47 after the pre-existing baseline
-# was driven to zero. Anyone adding a broken link to a PR fails
-# this check and cannot merge until the link is fixed (or
-# excluded via `.lychee.toml` if the target is genuinely
-# placeholder-style — but the bar for adding excludes is high).
-
-on:  # yamllint disable-line rule:truthy
-  pull_request:
-  push:
-    branches: [main]
-  schedule:
-    # Daily run catches link rot in external URLs even when no PR
-    # touches them.
-    - cron: "0 8 * * *"
-
-permissions: {}
-
-jobs:
-  lychee:
-    runs-on: ubuntu-latest
-    permissions:
-      contents: read
-      issues: write
-    steps:
-      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10  # v6.0.3
-        with:
-          persist-credentials: false
-
-      # Restore the lychee result cache so external URL checks reuse
-      # results across runs (config sets `max_cache_age = "7d"`).
-      - name: Restore lychee cache
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae  # v5.0.5
-        with:
-          path: .lycheecache
-          key: cache-lychee-${{ github.sha }}
-          restore-keys: cache-lychee-
-
-      - name: Run lychee
-        id: lychee
-        # Pinned SHA must come from the ASF infrastructure-actions
-        # allowlist (https://github.com/apache/infrastructure-actions/blob/main/approved_patterns.yml),
-        # which the `asf-allowlist-check` workflow enforces on every
-        # PR. The previous v2.6.1 pin was not on the allowlist; v2.8.0
-        # (2026-02-17) is. When bumping, pick the next allowlisted SHA
-        # — do not pick the latest upstream release blindly.
-        #
-        # `lycheeVersion` is pinned to v0.23.0 — the newest binary
-        # whose release archive lays the `lychee` executable at the
-        # top level. From the v0.24 line onward the archive wraps
-        # the binary in a `lychee-<arch>-<os>/` directory, which
-        # the v2.8.0 action's install step does not handle (it
-        # `install`s the literal `lychee` path and exits with
-        # `cannot stat`). Until a newer lychee-action SHA is added
-        # to the ASF infrastructure-actions allowlist, the binary
-        # has to stay on v0.23.x. `.lychee.toml` matches: the
-        # boolean form of `include_fragments` is what v0.23.x
-        # expects (the enum string `"anchor-only"` is a v0.24+
-        # form).
-        uses: lycheeverse/lychee-action@8646ba30535128ac92d33dfc9133794bfdd9b411  # v2.8.0
-        with:
-          args: --config .lychee.toml --no-progress .
-          fail: true
-          lycheeVersion: v0.23.0
-          token: ${{ secrets.GITHUB_TOKEN }}
-        continue-on-error: false
-
-      - name: Summarise
-        if: always()
-        run: |
-          if [ -f lychee/out.md ]; then
-            echo "## Lychee link-check report" >> "$GITHUB_STEP_SUMMARY"
-            cat lychee/out.md >> "$GITHUB_STEP_SUMMARY"
-          else
-            echo "Lychee did not produce a report file." >> "$GITHUB_STEP_SUMMARY"
-          fi
diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/pre-commit.yml
index 62479e29..a07788aa 100644
--- a/.github/workflows/pre-commit.yml
+++ b/.github/workflows/pre-commit.yml
@@ -61,10 +61,39 @@ jobs:
         # `package = false` so member packages and their
         # `[project.scripts]` entry points would be skipped.
         run: uv sync --all-packages --group dev
+      # Cache prek's hook environments. The `lychee` hook is
+      # `language: rust` with `additional_dependencies: [cli:lychee:…]`,
+      # so prek `cargo install`s lychee into a hook env under
+      # `~/.cache/prek` — a multi-minute compile. Caching that dir
+      # reuses the built binary (and every other hook env) across runs;
+      # keyed on the prek config so a hook/version change busts it.
+      - name: Cache prek hook environments
+        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae  # v5.0.5
+        with:
+          path: ~/.cache/prek
+          key: prek-${{ runner.os }}-${{ hashFiles('.pre-commit-config.yaml') }}
+          restore-keys: |
+            prek-${{ runner.os }}-
+      # Restore the lychee result cache so external-URL checks reuse
+      # results across runs (`.lychee.toml` sets `max_cache_age = "7d"`).
+      - name: Cache lychee link-check results
+        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae  # v5.0.5
+        with:
+          path: .lycheecache
+          key: cache-lychee-${{ github.sha }}
+          restore-keys: |
+            cache-lychee-
       # Install prek via uv (rather than via the `j178/prek-action`
       # action) so the `[tool.uv] exclude-newer` cooldown in the
       # root `pyproject.toml` applies to the prek install as well.
       - name: Install prek
         run: uv tool install prek
       - name: Run prek
+        # GITHUB_TOKEN lets the `lychee` hook authenticate its
+        # github.com link checks — unauthenticated requests get
+        # rate-limited (429) once a run checks more than a handful of
+        # GitHub URLs. lychee reads GITHUB_TOKEN automatically; the
+        # job's `contents: read` scope is sufficient for link checking.
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: prek run --show-diff-on-failure --color=always --all-files
diff --git a/.lychee.toml b/.lychee.toml
index 055074d9..98985137 100644
--- a/.lychee.toml
+++ b/.lychee.toml
@@ -5,18 +5,21 @@
 #   * cross-file fragments     — `[text](other.md#anchor)`
 #   * external URLs            — HTTP 2xx
 #
-# Run locally:
-#   lychee --config .lychee.toml .
+# Run via prek (locally and in CI) as the `lychee` hook in
+# `.pre-commit-config.yaml` — prek installs lychee itself, so no local
+# lychee install is needed:
+#   prek run lychee --all-files
 #
-# Run in CI: see `.github/workflows/doc-validation.yml`.
+# (Or directly, if you have lychee >= 0.24 installed:
+#   lychee --config .lychee.toml .)
 
 # Check anchor fragments, not just file paths — `#section` checks
-# the GitHub-style slug exists in the linked file. Boolean form is
-# the v0.23.x schema (and the version `link-check.yml` pins; see
-# the comment there for why we cannot move to v0.24's enum-string
-# form yet). When CI moves to lychee v0.24+, change this to
-# `include_fragments = "anchor-only"`.
-include_fragments = true
+# the GitHub-style slug exists in the linked file. The enum-string
+# form (`"anchor-only"`) is the lychee v0.24+ schema; the link check
+# now runs as the `lychee` prek hook (see `.pre-commit-config.yaml`)
+# against a directly-installed lychee >= 0.24, not the old pinned
+# `lychee-action`. The v0.23.x boolean form (`true`) no longer parses.
+include_fragments = "anchor-only"
 
 # Concurrency cap — kept moderate to avoid being rate-limited by GitHub.
 max_concurrency = 14
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 3c7d54fa..0e29639d 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -103,6 +103,37 @@ repos:
       - id: typos
         name: typos
         args: [--force-exclude]
+  # lychee — link checker (was the standalone `link-check.yml`
+  # workflow; converted to this prek hook). Validates cross-file links,
+  # `#anchor` fragments, and external URLs across markdown / rst /
+  # `.md.j2`. Config in `.lychee.toml`.
+  #
+  # `language: rust` with `additional_dependencies: [cli:lychee]`
+  # means prek installs lychee itself (cargo install from crates.io)
+  # into an isolated, cached hook env — the hook does NOT depend on a
+  # locally-installed lychee, and the prek CI workflow needs no extra
+  # install step. Unpinned, so it tracks the **latest** lychee; the
+  # exact version is resolved when the hook env is first built and then
+  # held stable by the prek env cache until the cache busts (keyed on
+  # this file). lychee must be >= 0.24 for `.lychee.toml`'s
+  # `include_fragments = "anchor-only"` enum-string schema.
+  #
+  # `pass_filenames: false` + the trailing `.` arg → lychee scans the
+  # whole repo (so a renamed link target is caught no matter which
+  # file references it); `files:` only gates *whether* the whole-repo
+  # scan fires, i.e. it runs when any doc file changes and always on
+  # `prek run --all-files` (CI). External-URL results are cached for
+  # 7 days (`.lycheecache`, gitignored).
+  - repo: local
+    hooks:
+      - id: lychee
+        name: lychee (link check)
+        language: rust
+        entry: lychee
+        args: ["--config", ".lychee.toml", "--no-progress", "."]
+        additional_dependencies: ["cli:lychee"]
+        files: \.(md|rst|j2)$
+        pass_filenames: false
   # Local placeholder linter — catches hardcoded references like
   # `apache/airflow` or `Apache Airflow` that should be the
   # placeholder tokens `<upstream>` / `<PROJECT>` per
diff --git a/AGENTS.md b/AGENTS.md
index 0945a280..419d44f8 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -898,14 +898,15 @@ model responds.
 
 - Re-read the diff and check that every change is intentional.
 - Check that any renamed headings have matching TOC updates.
-- **Run lychee against every changed `.md` / `.rst` / `.md.j2` file.**
-  CI runs the same check on every PR and a single broken link blocks
-  the merge; catching it locally avoids a round-trip. The canonical
-  recipe — same as
-  [`.github/workflows/link-check.yml`](.github/workflows/link-check.yml)
-  invokes:
+- **Run the lychee link check.** It runs as the `lychee` hook in
+  `prek run --all-files` (the `pre-commit.yml` CI workflow) and gates
+  merge via the required `prek` status; a single broken link, dead
+  `#anchor`, or unreachable URL fails it. Catch it locally first — the
+  hook is `language: rust`, so prek installs lychee for you:
 
   ```bash
+  prek run lychee --all-files
+  # or, if you have lychee >= 0.24 installed directly:
   lychee --config .lychee.toml .
   ```
 
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e5a931fb..a222170c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -804,13 +804,15 @@ Separate GitHub workflows:
 - **`pre-commit.yml`** — runs `prek run --all-files` in CI.
 - **`zizmor.yml`** — lints GitHub Actions workflows for known-bad
   patterns; runs on every PR.
-- **`link-check.yml`** — runs [lychee](https://lychee.cli.rs/) on
-  every PR and daily on a schedule. **Hard gate** (`fail: true`,
-  `continue-on-error: false`); a single broken internal link or
-  unreachable external URL fails the workflow and blocks merge.
-  Run lychee locally before pushing (see *Before submitting* in
-  [`AGENTS.md`](AGENTS.md#before-submitting)) — the local invocation
-  catches the same errors and avoids a CI round-trip.
+The link check ([lychee](https://lychee.cli.rs/)) is **not** a
+separate workflow — it runs as the `lychee` hook inside
+`prek run --all-files` (the `pre-commit.yml` workflow above), and so
+is part of the required `prek` status check. It is a **hard gate**: a
+single broken internal link, dead `#anchor` fragment, or unreachable
+external URL fails `prek` and blocks merge. The hook is
+`language: rust`, so prek installs lychee itself — `prek run lychee`
+locally (no separate lychee install needed) catches the same errors
+before you push.
 
 To run a single Python package's tests directly:
 
diff --git a/docs/setup/secure-agent-setup.md b/docs/setup/secure-agent-setup.md
index 22ff2e2e..07037183 100644
--- a/docs/setup/secure-agent-setup.md
+++ b/docs/setup/secure-agent-setup.md
@@ -380,9 +380,23 @@ below, annotated.
         "objects.githubusercontent.com", "codeload.github.com", "uploads.github.com",
         "pypi.org", "files.pythonhosted.org",
         "lists.apache.org", "dist.apache.org", "downloads.apache.org", "archive.apache.org",
-        "cveprocess.apache.org", "cve.org", "www.cve.org",
-        "oauth2.googleapis.com", "gmail.googleapis.com"
-      ]
+        "cveprocess.apache.org", "cve.org", "www.cve.org", "cveawg.mitre.org",
+        "oauth2.googleapis.com", "gmail.googleapis.com",
+        // Added with the `lychee` link-check prek hook: the hosts the
+        // framework's own docs link to (so lychee passes in-sandbox)
+        // plus `*.crates.io` (so the rust hook can `cargo install` lychee).
+        "*.crates.io", "*.apache.org", "*.anthropic.com", "*.claude.com",
+        "*.mitre.org", "*.nist.gov", "*.github.io", "gist.github.com",
+        "astral.sh", "json.schemastore.org", "lychee.cli.rs", "sdkman.io"
+      ],
+      // Lets native-TLS CLI tools (lychee — and, per the schema, gh /
+      // gcloud / terraform) verify TLS through the sandbox's
+      // TLS-terminating proxy; without it lychee fails every external
+      // link with `failed to verify TLS certificate`. Documented
+      // trade-off: "reduces security — opens a potential
+      // data-exfiltration vector through the trustd service." No-op
+      // outside the sandbox (e.g. CI). macOS-only.
+      "enableWeakerNetworkIsolation": true
     }
   },
   "permissions": {
diff --git a/skills/setup-isolated-setup-update/SKILL.md b/skills/setup-isolated-setup-update/SKILL.md
index a478cd84..086889a9 100644
--- a/skills/setup-isolated-setup-update/SKILL.md
+++ b/skills/setup-isolated-setup-update/SKILL.md
@@ -175,6 +175,30 @@ Walk each:
    `hooks.PreToolUse` entry** (matcher `Bash`) if the user wired
    the secure setup before the guard shipped. Report new entries
    the user does not have; do not auto-merge.
+
+   Two network-layer defaults landed with the `lychee` link-check
+   prek hook — surface both if the user's settings predate them
+   (both `sandbox.network.*`):
+
+   - **Broadened `allowedDomains`.** The dogfooded default now
+     allows the curated set the framework's own docs and dev tools
+     reach — `*.crates.io` (so the rust `lychee` hook can
+     `cargo install` lychee), `*.apache.org`, `*.anthropic.com`,
+     `*.claude.com`, `*.mitre.org`, `*.nist.gov`, `*.github.io`,
+     `gist.github.com`, `astral.sh`, `json.schemastore.org`,
+     `lychee.cli.rs`, `sdkman.io`. Without these, lychee fails the
+     PR-blocking `prek` check locally on first run.
+   - **`enableWeakerNetworkIsolation: true`.** Required for
+     native-TLS CLI tools (lychee, and the same mechanism the
+     schema notes for `gh` / `gcloud` / `terraform`) to verify TLS
+     through the sandbox's TLS-terminating proxy — without it lychee
+     fails every external link with `failed to verify TLS
+     certificate`. **Surface the documented trade-off when
+     reporting it**: the schema warns it "reduces security — opens a
+     potential data-exfiltration vector through the trustd service,"
+     so the user decides whether to enable it (the default ships it
+     on because the link check needs it). It is a no-op outside the
+     sandbox, e.g. in CI.
 5. **comdev MCP checkouts (`ponymail`, `apache-projects`).** These
    ASF MCP servers are installed from a local `apache/comdev`
    checkout and are **tracked at `main`, not pinned** — unlike the
diff --git a/tools/sandbox-lint/expected.json b/tools/sandbox-lint/expected.json
index a4793dea..63fa217f 100644
--- a/tools/sandbox-lint/expected.json
+++ b/tools/sandbox-lint/expected.json
@@ -42,8 +42,21 @@
         "www.cve.org",
         "cveawg.mitre.org",
         "oauth2.googleapis.com",
-        "gmail.googleapis.com"
-      ]
+        "gmail.googleapis.com",
+        "*.crates.io",
+        "*.apache.org",
+        "*.anthropic.com",
+        "*.claude.com",
+        "*.mitre.org",
+        "*.nist.gov",
+        "*.github.io",
+        "gist.github.com",
+        "astral.sh",
+        "json.schemastore.org",
+        "lychee.cli.rs",
+        "sdkman.io"
+      ],
+      "enableWeakerNetworkIsolation": true
     }
   },
   "permissions": {
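
Reviewer note: the commit message says the broadened `sandbox.network` settings are mirrored into the sandbox-lint baseline, and the two hunks for `.claude/settings.json` and `tools/sandbox-lint/expected.json` carry the same blob hashes. A minimal Python sketch of how such a mirror could be checked, assuming both files are JSON objects with the `sandbox.network.allowedDomains` and `sandbox.network.enableWeakerNetworkIsolation` keys shown in this diff; `network_drift` is a hypothetical helper and is not the sandbox-lint tool's actual logic:

```python
import json


def network_drift(settings, baseline):
    """Describe how `sandbox.network` in settings differs from the baseline.

    Hypothetical helper for illustration: both arguments are parsed JSON
    documents shaped like `.claude/settings.json` in this commit.
    """
    live = settings.get("sandbox", {}).get("network", {})
    want = baseline.get("sandbox", {}).get("network", {})
    live_domains = set(live.get("allowedDomains", []))
    want_domains = set(want.get("allowedDomains", []))
    return {
        # Domains the baseline expects but the settings lack.
        "missing_domains": sorted(want_domains - live_domains),
        # Domains the settings allow beyond the baseline.
        "extra_domains": sorted(live_domains - want_domains),
        # True when both sides agree on the isolation flag.
        "isolation_flag_matches": (
            live.get("enableWeakerNetworkIsolation")
            == want.get("enableWeakerNetworkIsolation")
        ),
    }


# Usage with small inline documents (the values are examples only).
baseline = json.loads(
    '{"sandbox": {"network": {"allowedDomains": ["*.crates.io", "sdkman.io"],'
    ' "enableWeakerNetworkIsolation": true}}}'
)
settings = json.loads(
    '{"sandbox": {"network": {"allowedDomains": ["sdkman.io"]}}}'
)
report = network_drift(settings, baseline)
```

For an adopter whose settings predate this commit, a report of this shape would list the newly added domains as missing and flag the isolation setting as differing, which is the information `setup-isolated-setup-update` is asked to surface without auto-merging.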
