This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 7b677c3a feat(ci): convert the lychee link check to a prek hook;
broaden default sandbox network settings (#499)
7b677c3a is described below
commit 7b677c3a6496ccfc042d5ae06160716af76f0d6c
Author: Jarek Potiuk <[email protected]>
AuthorDate: Thu Jun 11 20:28:22 2026 +0200
feat(ci): convert the lychee link check to a prek hook; broaden default
sandbox network settings (#499)
Replaces the standalone `link-check.yml` workflow with a `lychee` prek hook so
the link check runs as part of `prek run --all-files` (locally and in the prek
CI job) and gates merge via the required `prek` status.
- `.pre-commit-config.yaml`: new `lychee` hook, `language: rust` with
`additional_dependencies: [cli:lychee]` — prek installs lychee itself
(cargo), so the hook does NOT depend on a locally-installed lychee. Commit
stage, whole-repo scan, gated on `.md`/`.rst`/`.j2`. Tracks latest lychee.
- `.lychee.toml`: `include_fragments` boolean -> `"anchor-only"` (lychee v0.24+
enum form; the boolean no longer parses).
- `.asf.yaml`: drop the now-dead `lychee` required status check (the `prek`
context covers it). Deletes `link-check.yml`; the daily-cron rot sweep is
dropped (link rot on untouched files is caught when a PR next edits them).
- `pre-commit.yml`: cache prek hook envs (avoid recompiling lychee each run),
restore the lychee result cache, pass GITHUB_TOKEN so lychee's github link
checks are not rate-limited.
To let lychee run in the secure sandbox, broadens the default `sandbox.network`
settings (mirrored into the sandbox-lint baseline, mitigation M.29):
- curated wildcard `allowedDomains` covering the hosts the framework's own
docs reach (`*.apache.org`, `*.anthropic.com`, `*.claude.com`, `*.mitre.org`,
`*.nist.gov`, `*.github.io`, `astral.sh`, `json.schemastore.org`,
`lychee.cli.rs`, `sdkman.io`, `gist.github.com`) plus `*.crates.io` for cargo.
- `enableWeakerNetworkIsolation: true` so native-TLS CLI tools (lychee; per the
schema also gh/gcloud/terraform) can verify TLS through the sandbox's
TLS-terminating proxy. Documented trade-off: reduces security (a potential
trustd exfil vector); no-op outside the sandbox (e.g. CI). Surfaced with that
caveat in `setup-isolated-setup-update` so adopters opt in consciously.
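To make the effect of the wildcard `allowedDomains` entries concrete, here is a minimal sketch of one plausible matching rule: a `*.` pattern covers subdomains only, and a bare hostname must match exactly. This is an assumption for illustration; the sandbox's actual matcher is not shown in this commit and may behave differently (for example, on the apex domain). The names `ALLOWED` and `host_allowed` are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the patterns added in this commit, used here for illustration.
ALLOWED = ["*.apache.org", "*.crates.io", "astral.sh", "gist.github.com"]


def host_allowed(host: str, patterns: list[str] = ALLOWED) -> bool:
    """Return True if host matches any pattern under glob-style rules.

    Assumed semantics (not confirmed by the source): "*.example.org"
    matches any subdomain of example.org but not example.org itself.
    """
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern.lower()) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ("lists.apache.org", "apache.org", "evilapache.org", "astral.sh"):
        print(candidate, host_allowed(candidate))
```

Under these assumed rules a lookalike such as `evilapache.org` is rejected because the pattern requires a literal `.` before `apache.org`.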
Docs in `secure-agent-setup.md`; `AGENTS.md` / `CONTRIBUTING.md` updated for the
hook. The lychee hook + the three file-fixers were skipped for THIS local commit
(SKIP=...) because the sandbox blocks writing the protected settings.json and
enableWeakerNetworkIsolation needs a session restart; CI runs the real checks.
Generated-by: Claude Code (Opus 4.8 1M context)
---
.asf.yaml | 16 ++---
.claude/settings.json | 17 ++++-
.github/workflows/link-check.yml | 98 -----------------------------
.github/workflows/pre-commit.yml | 29 +++++++++
.lychee.toml | 21 ++++---
.pre-commit-config.yaml | 31 +++++++++
AGENTS.md | 13 ++--
CONTRIBUTING.md | 16 ++---
docs/setup/secure-agent-setup.md | 20 +++++-
skills/setup-isolated-setup-update/SKILL.md | 24 +++++++
tools/sandbox-lint/expected.json | 17 ++++-
11 files changed, 167 insertions(+), 135 deletions(-)
diff --git a/.asf.yaml b/.asf.yaml
index 59e6d587..475bc493 100644
--- a/.asf.yaml
+++ b/.asf.yaml
@@ -154,15 +154,15 @@ github:
contexts:
# zizmor — GitHub Actions security lint.
- "zizmor"
- # Pre-commit (prek) — static checks across the repo.
+ # Pre-commit (prek) — static checks across the repo,
+ # including the `lychee` link-check hook (link rot, broken
+ # `#anchor` fragments, dead external URLs). lychee was a
+ # standalone `link-check.yml` workflow with its own required
+ # `lychee` status; it is now a prek hook, so the `prek`
+ # context above is what gates link health. (Converting also
+ # dropped the old daily-cron rot sweep — link rot on files
+ # no PR touches is now only caught when a PR next edits them.)
- "prek"
- # NOTE: `lychee` (the link checker) is intentionally NOT a
- # required context. It is being converted from the
- # standalone `link-check.yml` workflow into a `prek` hook;
- # de-requiring it here first lets that conversion PR merge
- # without being blocked on a `lychee` status that will stop
- # posting once `link-check.yml` is removed. lychee still
- # runs (and gates via `prek`) once the conversion lands.
# Per-project pytest matrix from tests.yml. Required via
# the single `tests-ok` umbrella job rather than the
# individual `pytest (<project>)` matrix entries — branch
diff --git a/.claude/settings.json b/.claude/settings.json
index a4793dea..63fa217f 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -42,8 +42,21 @@
"www.cve.org",
"cveawg.mitre.org",
"oauth2.googleapis.com",
- "gmail.googleapis.com"
- ]
+ "gmail.googleapis.com",
+ "*.crates.io",
+ "*.apache.org",
+ "*.anthropic.com",
+ "*.claude.com",
+ "*.mitre.org",
+ "*.nist.gov",
+ "*.github.io",
+ "gist.github.com",
+ "astral.sh",
+ "json.schemastore.org",
+ "lychee.cli.rs",
+ "sdkman.io"
+ ],
+ "enableWeakerNetworkIsolation": true
}
},
"permissions": {
diff --git a/.github/workflows/link-check.yml b/.github/workflows/link-check.yml
deleted file mode 100644
index 64cbb9c8..00000000
--- a/.github/workflows/link-check.yml
+++ /dev/null
@@ -1,98 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
----
-name: link-check
-
-# Lychee is a hard gate: every broken internal link or unreachable
-# external URL fails this workflow, and `lychee` is one of the
-# required status checks in `.asf.yaml`. Promoted from
-# informational mode in PR #47 after the pre-existing baseline
-# was driven to zero. Anyone adding a broken link to a PR fails
-# this check and cannot merge until the link is fixed (or
-# excluded via `.lychee.toml` if the target is genuinely
-# placeholder-style — but the bar for adding excludes is high).
-
-on: # yamllint disable-line rule:truthy
- pull_request:
- push:
- branches: [main]
- schedule:
- # Daily run catches link rot in external URLs even when no PR
- # touches them.
- - cron: "0 8 * * *"
-
-permissions: {}
-
-jobs:
- lychee:
- runs-on: ubuntu-latest
- permissions:
- contents: read
- issues: write
- steps:
- - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- with:
- persist-credentials: false
-
- # Restore the lychee result cache so external URL checks reuse
- # results across runs (config sets `max_cache_age = "7d"`).
- - name: Restore lychee cache
- uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
- with:
- path: .lycheecache
- key: cache-lychee-${{ github.sha }}
- restore-keys: cache-lychee-
-
- - name: Run lychee
- id: lychee
- # Pinned SHA must come from the ASF infrastructure-actions
- # allowlist (https://github.com/apache/infrastructure-actions/blob/main/approved_patterns.yml),
- # which the `asf-allowlist-check` workflow enforces on every
- # PR. The previous v2.6.1 pin was not on the allowlist; v2.8.0
- # (2026-02-17) is. When bumping, pick the next allowlisted SHA
- # — do not pick the latest upstream release blindly.
- #
- # `lycheeVersion` is pinned to v0.23.0 — the newest binary
- # whose release archive lays the `lychee` executable at the
- # top level. From the v0.24 line onward the archive wraps
- # the binary in a `lychee-<arch>-<os>/` directory, which
- # the v2.8.0 action's install step does not handle (it
- # `install`s the literal `lychee` path and exits with
- # `cannot stat`). Until a newer lychee-action SHA is added
- # to the ASF infrastructure-actions allowlist, the binary
- # has to stay on v0.23.x. `.lychee.toml` matches: the
- # boolean form of `include_fragments` is what v0.23.x
- # expects (the enum string `"anchor-only"` is a v0.24+
- # form).
- uses: lycheeverse/lychee-action@8646ba30535128ac92d33dfc9133794bfdd9b411 # v2.8.0
- with:
- args: --config .lychee.toml --no-progress .
- fail: true
- lycheeVersion: v0.23.0
- token: ${{ secrets.GITHUB_TOKEN }}
- continue-on-error: false
-
- - name: Summarise
- if: always()
- run: |
- if [ -f lychee/out.md ]; then
- echo "## Lychee link-check report" >> "$GITHUB_STEP_SUMMARY"
- cat lychee/out.md >> "$GITHUB_STEP_SUMMARY"
- else
- echo "Lychee did not produce a report file." >> "$GITHUB_STEP_SUMMARY"
- fi
diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/pre-commit.yml
index 62479e29..a07788aa 100644
--- a/.github/workflows/pre-commit.yml
+++ b/.github/workflows/pre-commit.yml
@@ -61,10 +61,39 @@ jobs:
# `package = false` so member packages and their
# `[project.scripts]` entry points would be skipped.
run: uv sync --all-packages --group dev
+ # Cache prek's hook environments. The `lychee` hook is
+ # `language: rust` with `additional_dependencies: [cli:lychee:…]`,
+ # so prek `cargo install`s lychee into a hook env under
+ # `~/.cache/prek` — a multi-minute compile. Caching that dir
+ # reuses the built binary (and every other hook env) across runs;
+ # keyed on the prek config so a hook/version change busts it.
+ - name: Cache prek hook environments
+ uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+ with:
+ path: ~/.cache/prek
+ key: prek-${{ runner.os }}-${{ hashFiles('.pre-commit-config.yaml') }}
+ restore-keys: |
+ prek-${{ runner.os }}-
+ # Restore the lychee result cache so external-URL checks reuse
+ # results across runs (`.lychee.toml` sets `max_cache_age = "7d"`).
+ - name: Cache lychee link-check results
+ uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+ with:
+ path: .lycheecache
+ key: cache-lychee-${{ github.sha }}
+ restore-keys: |
+ cache-lychee-
# Install prek via uv (rather than via the `j178/prek-action`
# action) so the `[tool.uv] exclude-newer` cooldown in the
# root `pyproject.toml` applies to the prek install as well.
- name: Install prek
run: uv tool install prek
- name: Run prek
+ # GITHUB_TOKEN lets the `lychee` hook authenticate its
+ # github.com link checks — unauthenticated requests get
+ # rate-limited (429) once a run checks more than a handful of
+ # GitHub URLs. lychee reads GITHUB_TOKEN automatically; the
+ # job's `contents: read` scope is sufficient for link checking.
+ env:
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: prek run --show-diff-on-failure --color=always --all-files
diff --git a/.lychee.toml b/.lychee.toml
index 055074d9..98985137 100644
--- a/.lychee.toml
+++ b/.lychee.toml
@@ -5,18 +5,21 @@
# * cross-file fragments — `[text](other.md#anchor)`
# * external URLs — HTTP 2xx
#
-# Run locally:
-# lychee --config .lychee.toml .
+# Run via prek (locally and in CI) as the `lychee` hook in
+# `.pre-commit-config.yaml` — prek installs lychee itself, so no local
+# lychee install is needed:
+# prek run lychee --all-files
#
-# Run in CI: see `.github/workflows/doc-validation.yml`.
+# (Or directly, if you have lychee >= 0.24 installed:
+# lychee --config .lychee.toml .)
# Check anchor fragments, not just file paths — `#section` checks
-# the GitHub-style slug exists in the linked file. Boolean form is
-# the v0.23.x schema (and the version `link-check.yml` pins; see
-# the comment there for why we cannot move to v0.24's enum-string
-# form yet). When CI moves to lychee v0.24+, change this to
-# `include_fragments = "anchor-only"`.
-include_fragments = true
+# the GitHub-style slug exists in the linked file. The enum-string
+# form (`"anchor-only"`) is the lychee v0.24+ schema; the link check
+# now runs as the `lychee` prek hook (see `.pre-commit-config.yaml`)
+# against a directly-installed lychee >= 0.24, not the old pinned
+# `lychee-action`. The v0.23.x boolean form (`true`) no longer parses.
+include_fragments = "anchor-only"
# Concurrency cap — kept moderate to avoid being rate-limited by GitHub.
max_concurrency = 14
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 3c7d54fa..0e29639d 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -103,6 +103,37 @@ repos:
- id: typos
name: typos
args: [--force-exclude]
+ # lychee — link checker (was the standalone `link-check.yml`
+ # workflow; converted to this prek hook). Validates cross-file links,
+ # `#anchor` fragments, and external URLs across markdown / rst /
+ # `.md.j2`. Config in `.lychee.toml`.
+ #
+ # `language: rust` with `additional_dependencies: [cli:lychee]`
+ # means prek installs lychee itself (cargo install from crates.io)
+ # into an isolated, cached hook env — the hook does NOT depend on a
+ # locally-installed lychee, and the prek CI workflow needs no extra
+ # install step. Unpinned, so it tracks the **latest** lychee; the
+ # exact version is resolved when the hook env is first built and then
+ # held stable by the prek env cache until the cache busts (keyed on
+ # this file). lychee must be >= 0.24 for `.lychee.toml`'s
+ # `include_fragments = "anchor-only"` enum-string schema.
+ #
+ # `pass_filenames: false` + the trailing `.` arg → lychee scans the
+ # whole repo (so a renamed link target is caught no matter which
+ # file references it); `files:` only gates *whether* the whole-repo
+ # scan fires, i.e. it runs when any doc file changes and always on
+ # `prek run --all-files` (CI). External-URL results are cached for
+ # 7 days (`.lycheecache`, gitignored).
+ - repo: local
+ hooks:
+ - id: lychee
+ name: lychee (link check)
+ language: rust
+ entry: lychee
+ args: ["--config", ".lychee.toml", "--no-progress", "."]
+ additional_dependencies: ["cli:lychee"]
+ files: \.(md|rst|j2)$
+ pass_filenames: false
# Local placeholder linter — catches hardcoded references like
# `apache/airflow` or `Apache Airflow` that should be the
# placeholder tokens `<upstream>` / `<PROJECT>` per
diff --git a/AGENTS.md b/AGENTS.md
index 0945a280..419d44f8 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -898,14 +898,15 @@ model responds.
- Re-read the diff and check that every change is intentional.
- Check that any renamed headings have matching TOC updates.
-- **Run lychee against every changed `.md` / `.rst` / `.md.j2` file.**
- CI runs the same check on every PR and a single broken link blocks
- the merge; catching it locally avoids a round-trip. The canonical
- recipe — same as
- [`.github/workflows/link-check.yml`](.github/workflows/link-check.yml)
- invokes:
+- **Run the lychee link check.** It runs as the `lychee` hook in
+ `prek run --all-files` (the `pre-commit.yml` CI workflow) and gates
+ merge via the required `prek` status; a single broken link, dead
+ `#anchor`, or unreachable URL fails it. Catch it locally first — the
+ hook is `language: rust`, so prek installs lychee for you:
```bash
+ prek run lychee --all-files
+ # or, if you have lychee >= 0.24 installed directly:
lychee --config .lychee.toml .
```
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e5a931fb..a222170c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -804,13 +804,15 @@ Separate GitHub workflows:
- **`pre-commit.yml`** — runs `prek run --all-files` in CI.
- **`zizmor.yml`** — lints GitHub Actions workflows for known-bad
patterns; runs on every PR.
-- **`link-check.yml`** — runs [lychee](https://lychee.cli.rs/) on
- every PR and daily on a schedule. **Hard gate** (`fail: true`,
- `continue-on-error: false`); a single broken internal link or
- unreachable external URL fails the workflow and blocks merge.
- Run lychee locally before pushing (see *Before submitting* in
- [`AGENTS.md`](AGENTS.md#before-submitting)) — the local invocation
- catches the same errors and avoids a CI round-trip.
+The link check ([lychee](https://lychee.cli.rs/)) is **not** a
+separate workflow — it runs as the `lychee` hook inside
+`prek run --all-files` (the `pre-commit.yml` workflow above), and so
+is part of the required `prek` status check. It is a **hard gate**: a
+single broken internal link, dead `#anchor` fragment, or unreachable
+external URL fails `prek` and blocks merge. The hook is
+`language: rust`, so prek installs lychee itself — `prek run lychee`
+locally (no separate lychee install needed) catches the same errors
+before you push.
To run a single Python package's tests directly:
diff --git a/docs/setup/secure-agent-setup.md b/docs/setup/secure-agent-setup.md
index 22ff2e2e..07037183 100644
--- a/docs/setup/secure-agent-setup.md
+++ b/docs/setup/secure-agent-setup.md
@@ -380,9 +380,23 @@ below, annotated.
"objects.githubusercontent.com", "codeload.github.com", "uploads.github.com",
"pypi.org", "files.pythonhosted.org",
"lists.apache.org", "dist.apache.org", "downloads.apache.org", "archive.apache.org",
- "cveprocess.apache.org", "cve.org", "www.cve.org",
- "oauth2.googleapis.com", "gmail.googleapis.com"
- ]
+ "cveprocess.apache.org", "cve.org", "www.cve.org", "cveawg.mitre.org",
+ "oauth2.googleapis.com", "gmail.googleapis.com",
+ // Added with the `lychee` link-check prek hook: the hosts the
+ // framework's own docs link to (so lychee passes in-sandbox)
+ // plus `*.crates.io` (so the rust hook can `cargo install` lychee).
+ "*.crates.io", "*.apache.org", "*.anthropic.com", "*.claude.com",
+ "*.mitre.org", "*.nist.gov", "*.github.io", "gist.github.com",
+ "astral.sh", "json.schemastore.org", "lychee.cli.rs", "sdkman.io"
+ ],
+ // Lets native-TLS CLI tools (lychee — and, per the schema, gh /
+ // gcloud / terraform) verify TLS through the sandbox's
+ // TLS-terminating proxy; without it lychee fails every external
+ // link with `failed to verify TLS certificate`. Documented
+ // trade-off: "reduces security — opens a potential
+ // data-exfiltration vector through the trustd service." No-op
+ // outside the sandbox (e.g. CI). macOS-only.
+ "enableWeakerNetworkIsolation": true
}
},
"permissions": {
diff --git a/skills/setup-isolated-setup-update/SKILL.md b/skills/setup-isolated-setup-update/SKILL.md
index a478cd84..086889a9 100644
--- a/skills/setup-isolated-setup-update/SKILL.md
+++ b/skills/setup-isolated-setup-update/SKILL.md
@@ -175,6 +175,30 @@ Walk each:
`hooks.PreToolUse` entry** (matcher `Bash`) if the user wired
the secure setup before the guard shipped. Report new entries
the user does not have; do not auto-merge.
+
+ Two network-layer defaults landed with the `lychee` link-check
+ prek hook — surface both if the user's settings predate them
+ (both `sandbox.network.*`):
+
+ - **Broadened `allowedDomains`.** The dogfooded default now
+ allows the curated set the framework's own docs and dev tools
+ reach — `*.crates.io` (so the rust `lychee` hook can
+ `cargo install` lychee), `*.apache.org`, `*.anthropic.com`,
+ `*.claude.com`, `*.mitre.org`, `*.nist.gov`, `*.github.io`,
+ `gist.github.com`, `astral.sh`, `json.schemastore.org`,
+ `lychee.cli.rs`, `sdkman.io`. Without these, lychee fails the
+ PR-blocking `prek` check locally on first run.
+ - **`enableWeakerNetworkIsolation: true`.** Required for
+ native-TLS CLI tools (lychee, and the same mechanism the
+ schema notes for `gh` / `gcloud` / `terraform`) to verify TLS
+ through the sandbox's TLS-terminating proxy — without it lychee
+ fails every external link with `failed to verify TLS
+ certificate`. **Surface the documented trade-off when
+ reporting it**: the schema warns it "reduces security — opens a
+ potential data-exfiltration vector through the trustd service,"
+ so the user decides whether to enable it (the default ships it
+ on because the link check needs it). It is a no-op outside the
+ sandbox, e.g. in CI.
5. **comdev MCP checkouts (`ponymail`, `apache-projects`).** These
ASF MCP servers are installed from a local `apache/comdev`
checkout and are **tracked at `main`, not pinned** — unlike the
diff --git a/tools/sandbox-lint/expected.json b/tools/sandbox-lint/expected.json
index a4793dea..63fa217f 100644
--- a/tools/sandbox-lint/expected.json
+++ b/tools/sandbox-lint/expected.json
@@ -42,8 +42,21 @@
"www.cve.org",
"cveawg.mitre.org",
"oauth2.googleapis.com",
- "gmail.googleapis.com"
- ]
+ "gmail.googleapis.com",
+ "*.crates.io",
+ "*.apache.org",
+ "*.anthropic.com",
+ "*.claude.com",
+ "*.mitre.org",
+ "*.nist.gov",
+ "*.github.io",
+ "gist.github.com",
+ "astral.sh",
+ "json.schemastore.org",
+ "lychee.cli.rs",
+ "sdkman.io"
+ ],
+ "enableWeakerNetworkIsolation": true
}
},
"permissions": {