This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 1ced54f feat(spec-loop): freshness check + branch-name collision
check (#297)
1ced54f is described below
commit 1ced54f009a324db84c53269e5bb99daef0a4716
Author: Justin Mclean <[email protected]>
AuthorDate: Wed May 27 07:55:55 2026 +1000
feat(spec-loop): freshness check + branch-name collision check (#297)
* feat(spec-loop): add spec-driven build loop and product specs
Introduce a spec-driven development loop under tools/spec-loop/ that
develops and maintains the framework against a written description of
what it does.
- specs/ describes the actual functionality (the four live modes, the
security lifecycle, privacy gate, sandbox, CVE tooling, adoption,
adapters, meta tooling) as topic-named files, separate from the RFCs.
- IMPLEMENTATION_PLAN.md holds the prioritised gaps as work items.
- loop.sh runs plan / build / update / consolidate. Build implements one
work item per branch (one work item = one branch = one PR) and never
pushes or opens a PR; the human does that. update back-fills specs from
functionality contributed outside the loop.
- docs/spec-driven-development.md explains how it works.
Generated-by: Claude (Opus 4.7)
* docs(skills): require an eval suite for every skill
Add to AGENTS.md § Reusable skills the convention that every skill ships
a behavioural eval suite under tools/skill-evals/evals/<skill-name>/
covering each pipeline step with fixture cases — a skill PR without a
matching eval is incomplete. Mirror the rule into the spec-loop build
prompt and operational context so the loop enforces it, and record the
"back-fill missing eval suites" gap in the plan.
Generated-by: Claude (Opus 4.7)
* fix(spec-loop): default loop base to the current branch, not spec-driven
SPEC_LOOP_BASE defaulted to spec-driven — the throwaway scaffolding
branch — which is a dangling reference the moment this PR merges. Default
it instead to the branch the loop is started on (fall back to main if
detached), and point the work-item PR examples at main.
Generated-by: Claude (Opus 4.7)
* fix(spec-loop): satisfy markdownlint and doctoc
- Tag the push/PR command fences in PROMPT_build.md and PROMPT_update.md
with `text` so markdownlint MD040 (fenced-code-language) passes.
- Exclude tools/spec-loop/ from the doctoc hook: the specs carry YAML
frontmatter and the prompts are short single-purpose docs — same
rationale as the existing skill and skill-evals exclusions.
Generated-by: Claude (Opus 4.7)
* docs(spec-loop): explain --dangerously-skip-permissions in the security
model
The loop runs the agent headless with --dangerously-skip-permissions.
Document how that fits the layered sandbox: the flag bypasses the agent
permission layer (.claude/settings.json deny/ask) but NOT the OS sandbox
(clean-env + filesystem/network), which stays the real boundary — matching
the flag's own "sandboxes only" guidance.
- Add a "Security and the dangerously-skip-permissions flag" section to
docs/spec-driven-development.md (what each sandbox layer does, why the
loop stays safe: run-in-sandbox, no credentials, structural containment).
- Add a Security callout to tools/spec-loop/README.md and a SECURITY
header block to loop.sh.
- Harden the invocation: --disallowedTools "Bash(git push *)" "Bash(gh *)"
as defence in depth so a stray push/PR cannot reach the remote.
Generated-by: Claude (Opus 4.7)
* fix(spec-loop): make plan consolidation hysteretic and not livelock
The build loop switched to a consolidation round whenever the plan
exceeded 500 lines, but the consolidate beat must preserve every planned
work item. A plan that was long because of pending work (not stale
history) could therefore never drop below the threshold, so every
subsequent build iteration re-consolidated forever and never built.
- Consolidate at most once (CONSOLIDATE_TRIED latch); if still over after
one round, build anyway and note that the length is planned work. The
latch resets when the plan drops back under the limit.
- Give PROMPT_consolidate a real target (~300 lines, below the 500
trigger) for hysteresis, while preserving every planned work item.
- Make the threshold configurable via SPEC_LOOP_PLAN_MAX (default 500)
and document it.
Generated-by: Claude (Opus 4.7)
* loop improvements
* so we can run this before merging
* use git checkout not git swicth so it works with slightly older versions
of git
* read tooling from control branch, build work items on main
* remove "spec/" on branches
* fix(spec-loop): correct update-mode note, arg validation, spinner, deny
syntax
Address self-review findings on loop.sh:
- update mode no longer receives the build-only "do NOT edit specs" note
when launched from a branch != BASE; it is now told to author the updated
specs on the work branch (update is the one beat that writes specs).
- reject a non-numeric iteration count instead of erroring to stderr and
silently treating it as 0 (i.e. running unbounded).
- spinner indexes the braille frames as an array rather than by byte offset,
so it renders correctly under a C/POSIX locale (the clean-env sandbox
case).
- correct --disallowedTools to the colon wildcard form
(Bash(git push:*) / Bash(gh:*)) so the defense-in-depth deny actually
matches git push / gh invocations.
Generated-by: Claude Code (Opus 4.7)
* feat(spec-loop): freshness check + branch-name collision check
Two pre-flight checks that close the gap that let an iteration re-do
merged work under the same branch name as the already-shipped PR:
- **Freshness check** (before `git checkout $BASE`): fetch the base's
upstream tracking branch and refuse to fork if BASE is behind.
Forking off a stale local main is how an agent re-creates a file
that already exists upstream — the resulting branch then collides
with the merged PR's source branch on the remote.
- **Branch-name collision check** (after the agent forks its branch):
walk every configured remote and warn if any already has a branch
with the same name. Covers fork-based workflows where the merged
PR's source branch lingers on `origin` and the canonical history
lives on `upstream`.
Both checks degrade gracefully when offline or when BASE has no
configured upstream — they warn-and-skip rather than block.
Generated-by: Claude Code (Opus 4.7)
---
.pre-commit-config.yaml | 6 +-
AGENTS.md | 15 +-
docs/spec-driven-development.md | 249 +++++++++++++++
tools/spec-loop/AGENTS.md | 78 +++++
tools/spec-loop/IMPLEMENTATION_PLAN.md | 95 ++++++
tools/spec-loop/PROMPT_build.md | 76 +++++
tools/spec-loop/PROMPT_consolidate.md | 28 ++
tools/spec-loop/PROMPT_plan.md | 44 +++
tools/spec-loop/PROMPT_update.md | 59 ++++
tools/spec-loop/README.md | 87 +++++
tools/spec-loop/loop.sh | 373 ++++++++++++++++++++++
tools/spec-loop/specs/README.md | 74 +++++
tools/spec-loop/specs/adapters.md | 77 +++++
tools/spec-loop/specs/adoption-and-setup.md | 77 +++++
tools/spec-loop/specs/agent-isolation-sandbox.md | 84 +++++
tools/spec-loop/specs/cve-tooling.md | 74 +++++
tools/spec-loop/specs/drafting-mode.md | 73 +++++
tools/spec-loop/specs/mentoring-mode.md | 75 +++++
tools/spec-loop/specs/meta-and-quality-tooling.md | 81 +++++
tools/spec-loop/specs/overview.md | 68 ++++
tools/spec-loop/specs/pairing-mode.md | 72 +++++
tools/spec-loop/specs/privacy-llm-gate.md | 84 +++++
tools/spec-loop/specs/security-issue-lifecycle.md | 76 +++++
tools/spec-loop/specs/triage-mode.md | 74 +++++
24 files changed, 2096 insertions(+), 3 deletions(-)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 887e9af..589361b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -42,11 +42,13 @@ repos:
# Skip agent-facing skill definitions: their YAML frontmatter must be
# the first thing in the file, which is incompatible with doctoc
# inserting a TOC block at the top. Also skip skill-evals fixture and
- # README files — short, single-purpose docs that don't warrant a TOC.
+ # README files, and the spec-loop specs/prompts (YAML-frontmatter
+ # specs and short single-purpose prompts, same rationale as skills)
+ # — short docs that don't warrant a TOC.
# Skip the PR template — GitHub pre-populates a new PR description
# with the template verbatim, so a TOC block becomes per-PR noise the
# contributor has to delete by hand.
- exclude:
^(\.claude/skills/.*|tools/vulnogram/generate-cve-json/SKILL\.md|tools/skill-evals/.*|\.github/PULL_REQUEST_TEMPLATE\.md)$
+ exclude:
^(\.claude/skills/.*|tools/vulnogram/generate-cve-json/SKILL\.md|tools/skill-evals/.*|tools/spec-loop/.*|\.github/PULL_REQUEST_TEMPLATE\.md)$
args:
- "--maxlevel"
- "3"
diff --git a/AGENTS.md b/AGENTS.md
index 8fede15..af00e3f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1389,7 +1389,20 @@ When adding a new skill:
- start with YAML frontmatter containing `name`, `description`, and
`when_to_use`;
- make every state-changing action a *proposal* that requires explicit user
confirmation before it runs;
-- avoid agent-specific syntax so the skill remains portable across tools.
+- avoid agent-specific syntax so the skill remains portable across tools;
+- **write an eval suite for the skill.** Every skill ships with a
+ behavioural eval under
+ [`tools/skill-evals/evals/<skill-name>/`](tools/skill-evals/) that
+ exercises each pipeline step with fixture cases and pins the expected
+ structured output — the same shape as the existing suites
+ (`security-issue-import`, `issue-triage`, …). Evals are not optional
+ polish: an LLM-driven skill's behaviour is a distribution, and the eval
+ is how a reviewer (and CI) confirms a change did not regress it. Run a
+ suite with
+ `uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/<skill-name>/`;
+ see [`tools/skill-evals/README.md`](tools/skill-evals/README.md) for the
+ layout (`step-*/fixtures/case-*`). A skill PR without a matching eval
+ suite is incomplete.
## Keeping evals and mode-economics in sync
diff --git a/docs/spec-driven-development.md b/docs/spec-driven-development.md
new file mode 100644
index 0000000..b69f706
--- /dev/null
+++ b/docs/spec-driven-development.md
@@ -0,0 +1,249 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update
-->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents** *generated with
[DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Spec-driven development](#spec-driven-development)
+ - [The idea](#the-idea)
+ - [Three phases, four beats, one loop](#three-phases-four-beats-one-loop)
+ - [Specs are not RFCs](#specs-are-not-rfcs)
+ - [A branch per fix or feature](#a-branch-per-fix-or-feature)
+ - [Why it never pushes](#why-it-never-pushes)
+ - [Security and the dangerously-skip-permissions
flag](#security-and-the-dangerously-skip-permissions-flag)
+ - [Keeping specs honest: the update
beat](#keeping-specs-honest-the-update-beat)
+ - [Layout](#layout)
+ - [Quick start](#quick-start)
+ - [How this composes with the framework's
principles](#how-this-composes-with-the-frameworks-principles)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Spec-driven development
+
+This document explains the **spec-driven build loop** that lives in
+[`tools/spec-loop/`](../tools/spec-loop/). It is how this framework can be
+developed and maintained against a written description of what it does,
+rather than against whatever happens to be in someone's head on a given
+afternoon.
+
+## The idea
+
+The loop is a small instance of the general
+[Ralph](https://ghuntley.com/ralph/) technique: run a fresh agent context
+against a fixed prompt, let it do one well-scoped thing, then run it
+again with a clean context. The power is in the funnel that feeds it —
+not "a loop that codes," but a pipeline from *what the product should do*
+to *one reviewable change at a time*.
+
+Two artefacts carry the state between iterations, so nothing depends on a
+long-lived context window:
+
+- **Specs** ([`tools/spec-loop/specs/`](../tools/spec-loop/specs/)) — a
+ faithful, plain-Markdown description of each functional area of the
+ product: what it does, where it lives in the code, the contract it must
+ honour, and its known gaps. The specs are the *desired state*.
+- **The implementation plan**
+
([`tools/spec-loop/IMPLEMENTATION_PLAN.md`](../tools/spec-loop/IMPLEMENTATION_PLAN.md))
+ — the prioritised list of *work items*: the gaps between the specs and
+ the code. Each work item is sized to one branch and one PR.
+
+## Three phases, four beats, one loop
+
+Phase one is a human (or a planning conversation) writing the specs.
+Phases two and three are the loop, swapping prompts:
+
+| Beat | Command | What it does | Commits? |
+|---|---|---|---|
+| **plan** | `loop.sh plan` | Compares specs against the code; rewrites the
plan as prioritised work items. | No |
+| **build** | `loop.sh` | Implements the single highest-priority work item on
its own branch; validates; commits there. | Yes, on a work-item branch |
+| **update** | `loop.sh update` | Inverse of plan: finds functionality the
code has but the specs don't, and brings the specs back in sync. | Yes, on
`spec/sync-specs` |
+| **consolidate** | `loop.sh consolidate` | Shrinks the plan when it grows too
long, without dropping planned work. | Yes, the plan only |
+
+Every beat loads the same operational context —
+[`tools/spec-loop/AGENTS.md`](../tools/spec-loop/AGENTS.md) (repo map,
+validation commands, branch and hard-limit rules) layered on top of the
+repository-wide [`AGENTS.md`](../AGENTS.md). Each build iteration reads
+only the spec and source files relevant to its one work item, so a fresh
+context never drowns in the whole tree.
+
+## Specs are not RFCs
+
+The framework already has [`docs/rfcs/`](rfcs/) — normative principle
+documents that say *why* and define the trust posture. Those are the
+**constitution**. The specs are the **work orders**: discrete, concrete,
+grounded in the actual code. The loop **respects** an RFC principle as a
+constraint (it never pushes, it stays in the sandbox), but it never reads,
+edits, or restates an RFC, and no spec ever lives under `docs/rfcs/`. The
+two formats and lifecycles are deliberately separate; see
+[`tools/spec-loop/specs/README.md`](../tools/spec-loop/specs/README.md).
+
+Spec filenames are **topics, not numbers** — `triage-mode.md`,
+`pairing-mode.md`, `security-issue-lifecycle.md`. There is no numeric
+prefix, because numbering implies a priority the specs don't carry.
+Priority lives only in the implementation plan.
+
+## A branch per fix or feature
+
+The defining constraint of this loop: **one work item, one branch, one
+PR**. Before plan/build iterations, the runner snapshots open PRs so the
+plan beat does not add work already in flight and the build beat skips
+planned items that an open PR already covers. The build beat returns to
+the integration base, then carves out a
+`spec/<slug>` branch for the single work item it is about to implement. It
+never commits feature work to the base branch, and `loop.sh` stops if it
+detects that happening. The result of a run is a fan of independent
+branches, each carrying exactly one change — each independently
+reviewable and independently revertible.
+
+This is the same discipline the framework asks of every state change,
+applied to the framework's own development. It is also what makes the
+loop safe to run unattended: the blast radius of any one iteration is a
+single local branch.
+
+## Why it never pushes
+
+`git push` and `gh pr create` are in the `ask` list of
+[`.claude/settings.json`](../.claude/settings.json) — they require a human
+confirmation. The loop honours that: it ends every iteration at a **local
+commit** and prints the exact commands for the human to run:
+
+```bash
+git push -u origin spec/<slug>
+gh pr create --web --base main --head spec/<slug> \
+ --title "<subject>" --body-file <prepared-body>
+```
+
+Opening the PR with `--web` is the framework's convention so the reviewer
+sees the title, body, and generative-AI disclosure in the browser before
+submitting. The agent drafts; the human presses the button.
+
+## Security and the dangerously-skip-permissions flag
+
+The loop runs the agent headless with `--dangerously-skip-permissions`.
+That deserves a direct explanation, because it looks, at a glance, like
+it throws away the framework's permission gates.
+
+**Why the flag is there.** Headless iterations have no human to answer a
+per-tool-call prompt. Without the flag, the agent would stall (or, in
+non-interactive mode, deny) the moment it tried to edit a file or run a
+validation command. The flag lets the loop do its job — edit, validate,
+commit — unattended.
+
+**What it bypasses, and what it does not.** The framework's sandbox is
+layered (see [`docs/rfcs/RFC-AI-0004.md`](rfcs/RFC-AI-0004.md) for the
+normative statement and `docs/setup/secure-agent-internals.md` for the
+mechanism). `--dangerously-skip-permissions` only reaches the top two:
+
+| Layer | Mechanism | Bypassed by the flag? |
+|---|---|---|
+| 0. Clean environment | wrapper strips credential-shaped env vars before exec
| **No** — it is the launching wrapper, not the agent |
+| 1. Filesystem + network sandbox | `bubblewrap` + SNI proxy (Linux) /
`sandbox-exec` (macOS); default-deny egress | **No** — enforced by the OS, not
the agent |
+| 2. Tool permissions | `.claude/settings.json` `permissions.deny` | **Yes** |
+| 3. Forced confirmation | `.claude/settings.json` `permissions.ask` on `git
push`, `gh …` | **Yes** |
+
+So the flag removes the *agent-level* gate (Layers 2–3), but the
+*OS-level* boundary (Layers 0–1) is untouched — it is enforced beneath
+the agent and cannot be turned off from inside it. This is exactly the
+posture the flag's own guidance assumes: it is *"recommended only for
+sandboxes with no internet access."*
+
+**How the loop stays safe anyway.** Three things, in order of
+importance:
+
+1. **Run it only inside the sandbox harness.** The OS layers the flag
+ cannot bypass are the real boundary. Never run the loop on a bare
+ machine — launch it through the project's `claude-iso`/sandbox wrapper
+ so the filesystem and network allow-lists are in force.
+2. **Run it with no push/write credentials in the environment.** The
+ clean-env wrapper already strips them; keep it that way. `github.com`
+ is on the network allow-list, but a `git push` or `gh pr create` with
+ no token cannot authenticate, so it fails closed. As defence in depth
+ the loop also passes
+ `--disallowedTools "Bash(git push *)" "Bash(gh *)"`.
+3. **Structural containment.** Every iteration works on its own
+ `spec/<slug>` branch, the loop guards against commits landing on the
+ base branch, and the prompts forbid push/PR. The human-in-the-loop
+ gate is not removed — it is *relocated* from per-tool-call to the
+ push / PR / merge boundary, where the human reviews a finished branch.
+
+**Net effect.** During a run the per-call confirmation gate is traded for
+autonomy, but credentials are absent, egress is fenced, and the blast
+radius of any iteration is a single local branch the human has not yet
+pushed. That is the same reason the loop is the project's *manual-loop
+evidence* and must never be promoted to auto-merge: the autonomy is
+bounded to producing local branches, nothing more. An operator who wants
+the per-call gate back can drop the flag and pre-authorise the loop's
+tools with `--allowedTools` instead — at the cost of the loop pausing on
+anything it was not pre-authorised to do.
+
+## Keeping specs honest: the update beat
+
+Not every contribution comes through the loop — people land new skills and
+tools the normal way. When that happens the specs fall behind the code.
+The **update** beat is the fix: it inventories `.claude/skills/`,
+`tools/`, and `docs/modes.md`, diffs that against the specs, and back-fills
+or corrects the specs (a `proposed` area that now has a shipped skill
+becomes `experimental`; a drifted *Where it lives* is corrected; genuinely
+new functionality gets a new topic-named spec). It edits **only** the spec
+directory — it documents reality, it doesn't change it — and lands as one
+reviewable `spec/sync-specs` PR. Run it after a batch of normal PRs merges,
+or on a schedule.
+
+## Layout
+
+```text
+tools/spec-loop/
+├── README.md operator quickstart
+├── AGENTS.md loop-scoped operational context
+├── loop.sh the runner (plan / build / update / consolidate)
+├── PROMPT_plan.md gap analysis → plan
+├── PROMPT_build.md implement one work item on its own branch
+├── PROMPT_update.md back-fill specs from contributed code
+├── PROMPT_consolidate.md shrink the plan
+├── IMPLEMENTATION_PLAN.md prioritised work items (the gaps)
+└── specs/ functional description of the product
+ ├── overview.md
+ ├── triage-mode.md mentoring-mode.md drafting-mode.md
pairing-mode.md
+ ├── security-issue-lifecycle.md privacy-llm-gate.md
+ ├── agent-isolation-sandbox.md cve-tooling.md
+ ├── adoption-and-setup.md adapters.md
+ └── meta-and-quality-tooling.md
+```
+
+## Quick start
+
+```bash
+# 1. See what's out of sync, then read the plan it writes.
+./tools/spec-loop/loop.sh plan 1
+$EDITOR tools/spec-loop/IMPLEMENTATION_PLAN.md
+
+# 2. Build the top work item (one branch, one commit) and stop.
+./tools/spec-loop/loop.sh 1
+
+# 3. Review the branch it produced, then push + open the PR yourself.
+git log --oneline -1
+git push -u origin spec/<slug>
+gh pr create --web --base main --head spec/<slug> --title "…" --body-file …
+
+# Later: someone merged skills outside the loop — resync the specs.
+./tools/spec-loop/loop.sh update 1
+```
+
+Stop any run with `Ctrl+C` or `touch STOP`. By default the loop forks
+work items from the branch you start it on (typically `main`); set
+`SPEC_LOOP_BASE` to build on top of a different branch. Set
+`SPEC_LOOP_AGENT` when the Claude-compatible agent CLI is installed
+under a command name other than `claude`. Set `SPEC_LOOP_PR_LIMIT` to
+change how many open PRs are included in the duplicate-work check.
+
+## How this composes with the framework's principles
+
+A loop that runs an agent unattended sounds, at first, like the opposite
+of human-in-the-loop. The branch-per-feature constraint is the
+reconciliation: the loop's autonomy is bounded to *producing local
+branches*, and the human gate sits exactly where the framework always
+puts it — at push, at PR, at merge. Nothing the loop does is visible
+outside the maintainer's machine until a human chooses to push it. The
+loop is the *manual* development cycle the framework can later point to as
+evidence; it is not, and must not become, an auto-merge.
diff --git a/tools/spec-loop/AGENTS.md b/tools/spec-loop/AGENTS.md
new file mode 100644
index 0000000..7a839b8
--- /dev/null
+++ b/tools/spec-loop/AGENTS.md
@@ -0,0 +1,78 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# AGENTS — spec-loop operational context
+
+This file is the **operational** context for the spec-loop only:
+build/validate commands, the repository map, and the branch rules. It is
+loaded by the loop's prompts in addition to the repository-wide
+[`/AGENTS.md`](../../AGENTS.md), which still governs everything (commit
+trailers, placeholder convention, privacy/security posture). Where the
+two overlap, the repo-wide `AGENTS.md` wins.
+
+## Repository map (what the loop edits)
+
+This repo has no `src/` tree. Work lands in one of:
+
+- `.claude/skills/<name>/SKILL.md` — agent-readable skills (Markdown +
+ YAML frontmatter; required keys `name`, `description`, `license`).
+- `tools/<tool>/` — deterministic Python tools (`uv`, hatchling,
+ `src/` + `tests/`, `dependencies = []` where possible).
+- `docs/` — human-facing documentation. `docs/rfcs/` is the **separate**
+ governance layer — the loop never edits it.
+- `tools/spec-loop/specs/` — the specs this loop consumes.
+
+## Validation commands (the build "backpressure" step)
+
+Run the spec's own **Validation** block first. General checks:
+
+```bash
+# Validate skill definitions (frontmatter, links, placeholders)
+uv run --project tools/skill-validator --group dev skill-validate
+
+# A skill's behavioural eval suite (every skill must have one)
+uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/<skill-name>/
+
+# A tool's own tests (substitute the tool path)
+uv run --project tools/<tool> --group dev pytest
+
+# Shell scripts
+bash -n <script>.sh && shellcheck <script>.sh
+```
+
+There is no repo-wide test runner; validate the specific surface the
+spec touches. If a work item adds or changes a **skill**, it must also
+add/extend that skill's eval suite under
+`tools/skill-evals/evals/<skill-name>/` (per `/AGENTS.md` § Reusable
+skills — a skill without an eval suite is incomplete). If a work item
+adds a **tool**, that tool ships its own tests. Both must pass before
+commit.
+
+## Branch rules (the user's constraint: one branch per fix/feature)
+
+- **Never commit feature work to the integration branch.** Build mode
+ branches `<slug>` off the integration branch (`$SPEC_LOOP_BASE`,
+ default: `main`) first.
+- **One spec per branch, one branch per PR.** Do not bundle specs.
+- A feature branch edits only **its own** spec's `status:` (→ `done`) —
+ not sibling specs and not `IMPLEMENTATION_PLAN.md` (avoids cross-branch
+ conflicts; the plan is reconciled by a later `plan` pass).
+- The **`update`** beat (specs fell behind code others contributed)
+ branches `sync-specs` and edits `specs/` **only** — it documents
+ reality, it never changes a skill, tool, or doc outside the spec dir.
+
+## Hard limits (governance — do not cross)
+
+- **No push, no PR.** `git push` and `gh pr create` are in the `ask`
+ list of `.claude/settings.json`. The loop stops at a local commit and
+ prints the human-run commands. Opening the PR is the human's click.
+- **No `.claude/settings.json` edits** (it is in the `deny` list).
+- **No new network/filesystem allowances.** Run inside the existing
+ sandbox.
+
+## Commits
+
+- Imperative subject describing the user-visible change.
+- Trailer `Generated-by: Claude (Opus 4.7)` — **never** `Co-Authored-By`
+ with an agent (repo-wide `AGENTS.md` § Commit and PR conventions).
+- One commit per build iteration (the change + its spec `status` flip).
diff --git a/tools/spec-loop/IMPLEMENTATION_PLAN.md
b/tools/spec-loop/IMPLEMENTATION_PLAN.md
new file mode 100644
index 0000000..db69e0a
--- /dev/null
+++ b/tools/spec-loop/IMPLEMENTATION_PLAN.md
@@ -0,0 +1,95 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Implementation Plan — spec-loop
+
+Maintained by the loop's **plan** mode. It is the prioritised list of
+*gaps* found by comparing [`specs/`](specs/) against the actual code
+(`.claude/skills/`, `tools/`, `docs/`). The **build** mode takes the
+single highest-priority work item, isolates it on its own branch,
+implements it, validates it, and commits — **one work item, one branch,
+one PR** (the branch-per-feature constraint).
+
+> Priority lives here, not in the specs. The specs describe functional
+> areas (unordered); this plan orders the work.
+
+---
+
+## What's been built
+
+- **Spec set** — [`specs/`](specs/): an `overview` plus a functional
+ spec per area (the four live modes, the security lifecycle, the
+ privacy-LLM gate, the sandbox, CVE tooling, adoption/setup, adapters,
+ and meta/quality tooling).
+- **Loop scaffolding** — `loop.sh` (plan / build / consolidate; a branch
+ per work item; never pushes), `PROMPT_plan.md`, `PROMPT_build.md`,
+ `PROMPT_consolidate.md`, `AGENTS.md` (loop-scoped operational context),
+ and this plan.
+
+---
+
+## Work items (planned)
+
+Priority order. Each maps to one branch and one PR. Branch names are
+slugs, not numbers (numbering implies an order the specs don't carry).
+
+1. **Pairing — pre-flight self-review skill.** Highest priority: closes
+ the empty-Pairing-family gap MISSION makes a v1 goal. New
+ `.claude/skills/pairing-self-review/SKILL.md` (read-only, hands a
+ report back, never opens a PR); update `docs/modes.md` Pairing row
+ 0 → 1, `proposed` → `experimental`. Validate with `skill-validate`.
+ Spec: [`specs/pairing-mode.md`](specs/pairing-mode.md). Branch
+ `pairing-self-review`.
+
+2. **Mentoring — first prototype skill.** `pr-management-mentor` (working
+ name), `mode: Mentoring` + `experimental`, drafting replies in a
+ teaching register with an explicit hand-off to a human. The Mentoring
+ spec/tone-guide already exists under `docs/mentoring/`. Spec:
+ [`specs/mentoring-mode.md`](specs/mentoring-mode.md). Branch
+ `mentoring-prototype`.
+
+3. **Docs — mode economics page.** New `docs/mode-economics.md` (per-mode
+ token-cost shape, vendor-neutral, indicative-not-a-quote), linked from
+ `docs/modes.md`. From MISSION § Affordability. Branch
+ `mode-economics-doc`.
+
+4. **Meta — spec-status index.** A deterministic `uv` tool (mirrors
+ `list-steward-skills`) that prints specs by status and a `--ready`
+ filter, so later build iterations choose the next work item
+ mechanically. Spec:
[`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+ Branch `spec-status-index`.
+
+5. **Pairing — multi-agent review pipeline.** Fans a local diff through
+ independent review passes (correctness / security / conventions) and
+ merges the findings. Reuses the self-review report format, so it
+ follows work item 1. Branch `pairing-multi-agent-review`.
+
+6. **Drafting — generic (non-security) drafting.** Extend Drafting beyond
+ the security + general-issue cases to lint fixes, audit-tool findings,
+ and documentation holes (MISSION names these in scope). Larger; split
+ into per-source work items as it is picked up. Spec:
+ [`specs/drafting-mode.md`](specs/drafting-mode.md). Branch
+ `generic-drafting`.
+
+7. **Meta — back-fill missing skill eval suites.** Per `/AGENTS.md`
+ § Reusable skills, every skill ships an eval suite under
+ `tools/skill-evals/evals/<skill-name>/`. Several skills predate that
+ convention and have none. Add one suite per uncovered skill — one
+ branch per skill (or per family). Spec:
+ [`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+ Branch `eval-<skill-name>`.
+
+ Also: when a build iteration creates a new skill, its eval suite is
+ part of that same work item — not a separate one.
+
+---
+
+## Notes & discoveries
+
+- The general Ralph-loop technique pushes after every iteration. That
+ step is intentionally **removed** here: `git push` and `gh pr create`
+ are in the repo's `ask` permission list and are the human's step.
+- Validation per work item lives in the relevant spec's **Validation**
+ section; the build prompt runs it as backpressure before committing.
+- Auto-merge is deliberately off and has no work items — building toward
+ it would skip the proof MISSION requires.
diff --git a/tools/spec-loop/PROMPT_build.md b/tools/spec-loop/PROMPT_build.md
new file mode 100644
index 0000000..e612e37
--- /dev/null
+++ b/tools/spec-loop/PROMPT_build.md
@@ -0,0 +1,76 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+You are running the **build** beat of the spec-driven loop for this
+repository. Implement exactly ONE work item, on its OWN branch.
+
+Context to load first:
+
+- `tools/spec-loop/AGENTS.md` — operational rules (repo map, validation
+ commands, branch + hard-limit rules). The repo-wide `/AGENTS.md` also
+ applies (commit trailers, placeholder convention, confidentiality).
+- `tools/spec-loop/IMPLEMENTATION_PLAN.md` — the prioritised work items.
+- The appended **Open pull-request context** block from the runner.
+- Only the spec(s) and source files relevant to the chosen work item —
+ do not read the whole tree.
+
+Steps:
+
+1. Read the appended **Open pull-request context**. Treat open PRs as
+ in-flight work. Pick the single highest-priority work item from
+ `IMPLEMENTATION_PLAN.md`. If a **Tooling source** block is appended
+ below, read the plan from the control branch as it shows
+ (`git show <ref>:tools/spec-loop/IMPLEMENTATION_PLAN.md`), not from the
+ working tree — the tree is on the integration base, which need not carry
+ the plan. Pick an item not already substantially covered by an open PR.
+ One only.
+2. **Create its branch off the integration base**, then switch to it:
+ `git checkout -b <slug>` where `<slug>` is the work item's branch — the
+ bare slug, **no `spec/` or other prefix** (e.g.
+ `pairing-self-review`). NEVER commit the work to the integration
+ branch. One branch per work item.
+3. Read only the relevant spec file(s) — from the control branch if a
+ **Tooling source** block is appended, otherwise from the working tree —
+ plus the relevant `.claude/skills/` / `tools/` / `docs/` files from the
+ working tree. Confirm what already exists before writing — do not assume.
+4. Implement the work item **completely** — no placeholders, no stubs.
+ Skills: follow the skill format (frontmatter `name` / `description` /
+ `license`, SPDX header, placeholder convention, every state change a
+ confirmed proposal) **and ship an eval suite** under
+ `tools/skill-evals/evals/<skill-name>/` exercising each step with
+ fixture cases (per `/AGENTS.md` § Reusable skills — a skill without a
+ matching eval suite is incomplete). Tools: ship tests.
+5. Run the work item's **Validation** command(s) from its spec (the
+ backpressure). Fix until they pass.
+6. Specs and `IMPLEMENTATION_PLAN.md` live on the control branch. If a
+ **Tooling source** block is appended, they are **not** on this work
+ branch — do not create or edit them here; instead note any `status` or
+ `Known gap` change in the PR body for a later plan/update beat to
+ reconcile. (If no such block is present, the tooling is on this branch:
+ update **only that spec's** frontmatter/Known-gaps, and never
+ `IMPLEMENTATION_PLAN.md`.)
+7. `git add -A` then `git commit` with an imperative subject and a
+ `Generated-by: Claude (Opus 4.7)` trailer. **Never** add a
+ `Co-Authored-By:` trailer for an agent.
+
+Then STOP. Do NOT push and do NOT open a PR — `git push` and
+`gh pr create` are the human's step (they are in `.claude/settings.json`
+`ask`). Print the exact commands the human can run:
+
+```text
+git push -u origin <slug>
+gh pr create --web --base <integration-base> --head <slug> \
+ --title "<subject>" --body-file <prepared-body>
+```
+
+Rules:
+
+- One work item per iteration. Do not bundle.
+- Do not duplicate in-flight work from open PRs. If the highest-priority
+ plan item is covered by an open PR, skip it and choose the next
+ uncovered item.
+- If a work item is blocked, note why in its spec's `Known gaps` and pick
+ the next item instead.
+- Stay inside the sandbox; never edit `.claude/settings.json`; never add a
+ new network/filesystem allowance.
+- Single sources of truth — no duplicate logic; extend `tools/`.
diff --git a/tools/spec-loop/PROMPT_consolidate.md
b/tools/spec-loop/PROMPT_consolidate.md
new file mode 100644
index 0000000..3eebe4d
--- /dev/null
+++ b/tools/spec-loop/PROMPT_consolidate.md
@@ -0,0 +1,28 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+`tools/spec-loop/IMPLEMENTATION_PLAN.md` has grown too long. Consolidate
+it without losing planned work.
+
+1. Read `tools/spec-loop/IMPLEMENTATION_PLAN.md` in full.
+2. In **What's been built**: collapse each completed item to a single
+ concise line. The detail lives in the code and specs.
+3. In **Work items (planned)**: keep every work item intact — these still
+ guide future build beats. Do not remove or shorten planned work.
+4. Remove redundant notes, stale caveats, and duplicates.
+5. Rewrite the file. Aim for **under 300 lines** — comfortably below the
+ consolidation trigger so the loop does not immediately re-consolidate.
+ Shrink by collapsing the *What's been built* section only; **every
+ planned work item is preserved**. If planned work alone still exceeds
+ 300 lines, that is fine — do not pad, and never drop a work item to hit
+ the number.
+6. `git add -A` then
+ `git commit -m "chore(spec-loop): consolidate implementation plan"`
+ with a `Generated-by: Claude (Opus 4.7)` trailer.
+
+Rules:
+
+- Do not mark any planned work item as done.
+- Do not remove any planned work item.
+- Do not touch `tools/spec-loop/specs/` or any skill/tool/doc.
+- Commit only the plan file. Do not push or open a PR.
diff --git a/tools/spec-loop/PROMPT_plan.md b/tools/spec-loop/PROMPT_plan.md
new file mode 100644
index 0000000..afc4c57
--- /dev/null
+++ b/tools/spec-loop/PROMPT_plan.md
@@ -0,0 +1,44 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+You are running the **plan** beat of the spec-driven loop for this
+repository. Plan only — do NOT implement anything and do NOT commit code.
+
+Context to load first:
+
+- `tools/spec-loop/AGENTS.md` — operational rules (repo map, validation
+ commands, branch + hard-limit rules). The repo-wide `/AGENTS.md` also
+ applies.
+- `tools/spec-loop/specs/*` — the functional description of the product.
+- `tools/spec-loop/IMPLEMENTATION_PLAN.md` (if present; may be stale).
+- The appended **Open pull-request context** block from the runner.
+
+Steps:
+
+1. Study each spec in `tools/spec-loop/specs/` and compare it against the
+ actual code it names in **Where it lives** (`.claude/skills/`,
+ `tools/`, `docs/`). You may use parallel subagents for reading. Do NOT
+ assume something is missing — confirm with a code search first.
+2. Read the appended **Open pull-request context**. Treat open PRs as
+ in-flight work. If an apparent gap is already substantially covered by
+ an open PR (including draft PRs), do not add it as a planned work item.
+3. For each spec, identify the **gaps**: a `proposed` area with no skill,
+ a documented step that drifted from the code, a missing test, a
+ `Known gaps` item. Each gap is a candidate work item.
+4. Rewrite `tools/spec-loop/IMPLEMENTATION_PLAN.md` as a prioritised list
+ of work items. Each work item names: the change, the spec it serves,
+ its **Validation** command, and a branch slug (`<slug>`, the bare
+ slug — **no `spec/` or other prefix, no numbers**).
+5. Do NOT create work items against an `off` spec (e.g. Auto-merge) —
+ that would skip the proof MISSION requires.
+
+Rules:
+
+- Plan only. No edits to skills, tools, or docs. No commits in this beat.
+- Keep the plan prioritised and concise; one work item = one branch = one
+ PR.
+- Do not duplicate in-flight work from open PRs. If a stale existing plan
+ item is now covered by an open PR, remove it or mark it as in-flight
+ rather than leaving it available for the build beat.
+- Treat `tools/` as the standard library — prefer extending an existing
+ tool over a new ad-hoc one.
diff --git a/tools/spec-loop/PROMPT_update.md b/tools/spec-loop/PROMPT_update.md
new file mode 100644
index 0000000..be5c217
--- /dev/null
+++ b/tools/spec-loop/PROMPT_update.md
@@ -0,0 +1,59 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+You are running the **update** beat of the spec-driven loop. Specs can
+fall behind the code when contributors land functionality the normal
+way (a regular PR, not through this loop). This beat brings the specs
+back in sync with reality. It is the inverse of `plan`: `plan` finds code
+missing against specs; `update` finds **functionality missing against
+specs** and back-fills the specs.
+
+Context to load first:
+
+- `tools/spec-loop/AGENTS.md` and the repo-wide `/AGENTS.md`.
+- `tools/spec-loop/specs/*` — the current functional description.
+- The actual code: `.claude/skills/`, `tools/`, `docs/`, `docs/modes.md`.
+
+Steps:
+
+1. **Create the sync branch off the integration base**, then switch to
+ it: `git checkout -b sync-specs`. (One reviewable PR for the
+ sync.) Never commit the sync to the integration branch.
+2. Inventory the code with parallel subagents:
+ - every `.claude/skills/*/SKILL.md` (name, mode, what it does);
+ - every `tools/*` project (what it does, its tests);
+ - the mode/status table in `docs/modes.md`.
+3. Diff that inventory against `tools/spec-loop/specs/`:
+ - **New functionality with no spec** → author a new topic-named spec
+ (no number prefix) following the format in
+ [`specs/README.md`](specs/README.md), grounded in the real code it
+ describes.
+ - **Drifted spec** → a spec whose *Where it lives*, *Behaviour &
+ contract*, or `status` no longer matches the code → update it to
+ match reality (e.g. a `proposed` area that now has a shipped skill
+ becomes `experimental`/`stable`; skill counts in `docs/modes.md`
+ are reflected).
+ - **Removed functionality** → mark the spec or move it to a `Known
+ gaps`/retired note; do not silently delete history.
+4. Update `specs/overview.md` and `specs/README.md` indexes if areas were
+ added or renamed.
+5. `git add -A` then `git commit` with subject
+ `docs(spec-loop): sync specs with contributed functionality` and a
+ `Generated-by: Claude (Opus 4.7)` trailer.
+
+Then STOP. Do NOT push, do NOT open a PR. Print the human-run commands:
+
+```text
+git push -u origin sync-specs
+gh pr create --web --base <integration-base> --head sync-specs \
+ --title "Sync specs with contributed functionality" --body-file <body>
+```
+
+Rules:
+
+- **Edit specs only.** This beat changes `tools/spec-loop/specs/` (and
+ the indexes). It must NOT change any skill, tool, or doc outside the
+ spec directory — it documents reality, it does not alter it.
+- Confirm with a code search before recording something as present or
+ absent. Do not invent behaviour the code does not have.
+- Keep the RFCs untouched — they are a separate governance layer.
diff --git a/tools/spec-loop/README.md b/tools/spec-loop/README.md
new file mode 100644
index 0000000..8255be0
--- /dev/null
+++ b/tools/spec-loop/README.md
@@ -0,0 +1,87 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# spec-loop
+
+A spec-driven build loop for this framework, in the general
+[Ralph](https://ghuntley.com/ralph/) style (run a fresh agent context
+against a fixed prompt, repeat), adapted to the framework's
+human-in-the-loop posture. The full write-up is in
+[`docs/spec-driven-development.md`](../../docs/spec-driven-development.md);
+this is the operator quickstart.
+
+## The pieces
+
+| File | Role |
+|---|---|
+| [`specs/`](specs/) | The functional description of the product — one spec
per area. The desired state the loop reconciles code against. |
+| [`IMPLEMENTATION_PLAN.md`](IMPLEMENTATION_PLAN.md) | Prioritised **work
items** (the gaps). One work item = one branch = one PR. |
+| [`AGENTS.md`](AGENTS.md) | Loop-scoped operational rules (repo map,
validation commands, branch + hard-limit rules). |
+| `PROMPT_plan.md` / `PROMPT_build.md` / `PROMPT_update.md` /
`PROMPT_consolidate.md` | The per-beat prompts. |
+| `loop.sh` | The runner. |
+
+## Modes
+
+```bash
+./tools/spec-loop/loop.sh # build, unlimited iterations
+./tools/spec-loop/loop.sh 10 # build, max 10 iterations
+./tools/spec-loop/loop.sh plan # gap-analysis → rewrite the plan (no
code changes)
+./tools/spec-loop/loop.sh update # back-fill specs from functionality
others contributed
+./tools/spec-loop/loop.sh consolidate # shrink the plan when it grows too long
+```
+
+- **plan** — compares `specs/` against the code and rewrites
+ `IMPLEMENTATION_PLAN.md`. Plans only; no commits. It also checks open
+ PRs and does not add work items that are already in flight.
+- **build** — implements the single highest-priority work item on its own
+ `<slug>` branch, validates, and commits there. If the top plan
+ item is already covered by an open PR, it skips to the next uncovered
+ item.
+- **update** — the inverse of plan: scans the code for functionality not
+ yet described by a spec (someone contributed it the normal way) and
+ brings the specs back in sync, on a `sync-specs` branch.
+- **consolidate** — shrinks the plan without losing planned work (build
+ auto-switches to this when the plan grows past ~500 lines).
+
+## The two non-negotiables
+
+- **A branch per work item.** Build/update never commit to the
+ integration base; each carves out its own branch, so every change is
+ one reviewable, revertible PR.
+- **Never pushes, never opens a PR.** `git push` and `gh pr create` are in
+ `.claude/settings.json` `ask` — the human's step. Each beat ends at a
+ local commit and prints the exact push + `gh pr create --web` commands.
+
+## Security
+
+The loop runs the agent with `--dangerously-skip-permissions`, so it
+**must** be launched inside the project's sandbox harness, with no
+push/write credentials in the environment. The flag bypasses the agent
+permission layer (`.claude/settings.json` deny/ask) but **not** the OS
+sandbox (clean-env + filesystem/network), which stays the real boundary;
+as defence in depth the loop also hard-denies `git push` and `gh` via
+`--disallowedTools`. Full rationale:
+[`docs/spec-driven-development.md` § Security and the
dangerously-skip-permissions
flag](../../docs/spec-driven-development.md#security-and-the-dangerously-skip-permissions-flag).
+
+## Stop / configure
+
+- Stop: `Ctrl+C`, or `touch STOP` (exits after the current iteration).
+- `SPEC_LOOP_BASE` — branch to fork work items from. Defaults to `main`;
+ set it explicitly to build on top of a different branch.
+- `SPEC_LOOP_AGENT` — Claude-compatible agent CLI or wrapper to run
+ (default `claude`).
+- `SPEC_LOOP_MODEL` — model passed to the agent CLI (default `sonnet`).
+- `SPEC_LOOP_PR_LIMIT` — number of open PRs to include in duplicate-work
+ checks (default `100`).
+- `SPEC_LOOP_PLAN_MAX` — plan line count that triggers one consolidation
+ round before building (default `500`). The consolidate beat targets
+ ~300 lines (hysteresis) and runs at most once until the plan drops back
+ under the limit, so a plan that is long because of *pending work* never
+ re-consolidates in a loop.
+
+## Not the RFCs
+
+The specs are the *functional description of the code*. The
+[`docs/rfcs/`](../../docs/rfcs/) are the separate normative governance
+layer — the loop respects them as constraints and never reads or edits
+them.
diff --git a/tools/spec-loop/loop.sh b/tools/spec-loop/loop.sh
new file mode 100755
index 0000000..8e97a8c
--- /dev/null
+++ b/tools/spec-loop/loop.sh
@@ -0,0 +1,373 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: Apache-2.0
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Spec-driven build loop for this repository.
+#
+# A small loop in the general "Ralph" style (run a fresh agent context
+# against a prompt, repeat), adapted to this framework's posture:
+#
+# * THREE modes, ONE mechanism:
+# ./loop.sh build, unlimited iterations
+# ./loop.sh 20 build, max 20 iterations
+# ./loop.sh plan [N] gap-analysis only (updates the plan)
+# ./loop.sh update [N] back-fill specs from code others contributed
+# ./loop.sh consolidate [N] shrink the plan
+# * BRANCH PER WORK ITEM: before each build iteration the loop returns
+# to the integration base; the build prompt then carves out
+# <slug> for the one work item it implements. One work item =
+# one branch = one PR.
+# * NEVER pushes, NEVER opens a PR. `git push` / `gh pr create` are in
+# .claude/settings.json `ask` — they are the human's step. The loop
+# ends at a local commit and the build prompt prints the human-run
+# push + `gh pr create --web` commands.
+#
+# SECURITY — read before running:
+# This loop runs the agent with `--dangerously-skip-permissions`, which
+# bypasses the AGENT permission layer (.claude/settings.json deny/ask)
+# but NOT the OS sandbox (clean-env + filesystem/network). Per the
+# project's security model it MUST be launched inside the sandbox
+# harness, with no push/write credentials in the environment. Full
+# rationale: docs/spec-driven-development.md § Security and the
+# dangerously-skip-permissions flag.
+#
+# Stop gracefully: press Ctrl+C, or `touch STOP` (exits after the current
+# iteration finishes).
+#
+# Env overrides:
+# SPEC_LOOP_BASE branch to fork work items from (default: main)
+# SPEC_LOOP_AGENT Claude-compatible agent CLI to run (default: claude)
+# SPEC_LOOP_MODEL model passed to the agent CLI (default: sonnet)
+# SPEC_LOOP_PR_LIMIT open PRs to list for duplicate-work checks (default:
100)
+# SPEC_LOOP_PLAN_MAX plan line count that triggers ONE consolidation
+# round before building (default: 500)
+
+set -uo pipefail
+
+trap 'echo -e "\nStopping loop..."; exit 0' INT TERM
+
+# Operate from the repo root so the agent sees the whole tree.
+ROOT="$(git rev-parse --show-toplevel 2>/dev/null)" || {
+ echo "Error: not inside a git repository." >&2; exit 1; }
+cd "$ROOT" || exit 1
+
+LOOP_DIR="tools/spec-loop"
+PLAN="$LOOP_DIR/IMPLEMENTATION_PLAN.md"
+
+current_branch() {
+ local branch
+ branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null)" || return 1
+ if [ "$branch" = "HEAD" ]; then
+ return 0
+ fi
+ printf '%s\n' "$branch"
+}
+
+# Default the integration base to main — the repo's integration target —
+# regardless of which branch the loop is launched from. Work items fork
+# from here. Override with SPEC_LOOP_BASE to build on top of a different
+# branch.
+BASE="${SPEC_LOOP_BASE:-main}"
+# The control branch: where loop.sh, the prompts, the plan and the specs
+# live. Captured now, before the loop checks anything out, because a
+# build/update iteration checks out BASE (e.g. main) — which need not carry
+# the spec-loop tooling. Every tooling read below goes through this ref via
+# `git show`, so the loop works when launched from a feature branch while
+# building on main.
+TOOLING_REF="$(current_branch)"
+TOOLING_REF="${TOOLING_REF:-HEAD}"
+AGENT="${SPEC_LOOP_AGENT:-claude}"
+MODEL="${SPEC_LOOP_MODEL:-sonnet}"
+PR_LIMIT="${SPEC_LOOP_PR_LIMIT:-100}"
+# Plan length that triggers ONE consolidation round before building. The
+# consolidate beat preserves every planned work item, so a plan that is long
+# because of *pending work* (not stale history) cannot shrink below this —
+# hence the one-shot latch below, which avoids re-consolidating forever.
+PLAN_CONSOLIDATE_THRESHOLD="${SPEC_LOOP_PLAN_MAX:-500}"
+
+# ---- parse arguments -------------------------------------------------
+if [ "${1:-}" = "plan" ]; then
+ MODE="plan"; PROMPT_FILE="$LOOP_DIR/PROMPT_plan.md";
MAX_ITERATIONS="${2:-0}"
+elif [ "${1:-}" = "update" ]; then
+ MODE="update"; PROMPT_FILE="$LOOP_DIR/PROMPT_update.md";
MAX_ITERATIONS="${2:-0}"
+elif [ "${1:-}" = "consolidate" ]; then
+ MODE="consolidate"; PROMPT_FILE="$LOOP_DIR/PROMPT_consolidate.md";
MAX_ITERATIONS="${2:-0}"
+elif [[ "${1:-}" =~ ^[0-9]+$ ]]; then
+ MODE="build"; PROMPT_FILE="$LOOP_DIR/PROMPT_build.md";
MAX_ITERATIONS="$1"
+else
+ MODE="build"; PROMPT_FILE="$LOOP_DIR/PROMPT_build.md";
MAX_ITERATIONS=0
+fi
+
+# Reject a non-numeric iteration count. The plan/update/consolidate second
+# argument flows straight into the integer comparisons below, where a typo'd
+# value would otherwise error to stderr and be silently treated as 0 — i.e.
+# run unbounded instead of failing.
+if ! [[ "$MAX_ITERATIONS" =~ ^[0-9]+$ ]]; then
+ echo "Error: iteration count must be a non-negative integer, got
'${MAX_ITERATIONS}'." >&2
+ exit 1
+fi
+
+[ -f "$PROMPT_FILE" ] || { echo "Error: $PROMPT_FILE not found" >&2; exit 1; }
+
+if ! command -v "$AGENT" >/dev/null 2>&1; then
+ echo "Error: agent CLI '$AGENT' not found on PATH." >&2
+ echo "Set SPEC_LOOP_AGENT to a Claude-compatible CLI or wrapper." >&2
+ exit 1
+fi
+
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Mode: $MODE"
+echo "Prompt: $PROMPT_FILE"
+echo "Base: $BASE (work items fork from here)"
+echo "Agent: $AGENT"
+echo "Model: $MODEL"
+[ "$MAX_ITERATIONS" -gt 0 ] && echo "Max: $MAX_ITERATIONS iterations"
+echo "Stop: Ctrl+C or touch STOP"
+echo "Note: this loop never pushes and never opens a PR."
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+
+rm -f STOP
+ITERATION=0
+CONSOLIDATE_TRIED=false # one-shot latch; resets when the plan drops back
under the limit
+
+# spinner during silent agent calls
+spinner() {
+ local pid=$1 i=0
+ # Index frames as array elements, not string offsets: ${frames:$i:1} is
+ # byte-based under a C/POSIX locale (common in the clean-env sandbox) and
+ # would slice a multibyte braille glyph into garbage.
+ local frames=(⠋ ⠙ ⠹ ⠸ ⠼ ⠴ ⠦ ⠧ ⠇ ⠏)
+ while kill -0 "$pid" 2>/dev/null; do
+ printf "\r %s working... (Ctrl+C to stop)" "${frames[i]}"
+ i=$(( (i+1) % ${#frames[@]} )); sleep 0.15
+ done
+ printf "\r \r"
+}
+
+open_pr_context() {
+ echo ""
+ echo "## Open pull-request context"
+ echo ""
+ echo "The runner collected this immediately before the iteration. Treat
open"
+ echo "pull requests as in-flight work: do not add plan items that are
already"
+ echo "substantially covered by an open PR, and do not pick a build item
that"
+ echo "duplicates one."
+ echo ""
+
+ if ! command -v gh >/dev/null 2>&1; then
+ echo "- unavailable: gh CLI not found on PATH."
+ return 0
+ fi
+
+ local prs
+ prs="$(gh pr list \
+ --state open \
+ --limit "$PR_LIMIT" \
+ --json number,title,headRefName,baseRefName,url,isDraft \
+ --template '{{range .}}- #{{.number}} {{.title}} ({{.headRefName}} ->
{{.baseRefName}}){{if .isDraft}} [draft]{{end}} {{.url}}{{"\n"}}{{end}}' \
+ 2>/dev/null)"
+ if [ $? -ne 0 ]; then
+ echo "- unavailable: gh pr list failed. Check GitHub authentication or
network access."
+ elif [ -z "$prs" ]; then
+ echo "- No open pull requests found."
+ else
+ printf '%s\n' "$prs"
+ fi
+}
+
+while true; do
+ if [ "$MAX_ITERATIONS" -gt 0 ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ];
then
+ echo "Reached max iterations: $MAX_ITERATIONS"; break
+ fi
+ if [ -f STOP ]; then echo "STOP file detected — exiting."; rm -f STOP;
break; fi
+
+ ITERATION=$((ITERATION + 1))
+ echo ""
+ echo "━━━━━━━━━━━━━━━━━━━━ LOOP $ITERATION [ $(date '+%H:%M:%S') ]
━━━━━━━━━━━━━━━━━━━━"
+
+ ACTIVE_PROMPT="$PROMPT_FILE"
+
+ if [ "$MODE" = "build" ]; then
+ # Consolidate at most ONCE when the plan grows too long, then build
+ # even if it is still over: the remaining length is planned work
+ # items, which the consolidate beat preserves by design. The latch
+ # resets once the plan drops back under the limit (e.g. after items
+ # merge and a plan pass prunes them), so we never re-consolidate in a
+ # loop without making progress. Prefer the working-tree plan (so
+ # local edits count); fall back to the control branch ($TOOLING_REF)
+ # if the tree is on a base (e.g. main) that lacks the spec-loop
tooling.
+ PLAN_LINES=$( { [ -f "$PLAN" ] && cat "$PLAN" || git show
"$TOOLING_REF:$PLAN" 2>/dev/null; } | wc -l )
+ if [ "$PLAN_LINES" -le "$PLAN_CONSOLIDATE_THRESHOLD" ]; then
+ CONSOLIDATE_TRIED=false
+ elif [ "$CONSOLIDATE_TRIED" = false ]; then
+ echo " [plan] $PLAN is ${PLAN_LINES} lines (>
${PLAN_CONSOLIDATE_THRESHOLD}) — one consolidation round"
+ ACTIVE_PROMPT="$LOOP_DIR/PROMPT_consolidate.md"
+ CONSOLIDATE_TRIED=true
+ else
+ echo " [plan] $PLAN still ${PLAN_LINES} lines after consolidation
— length is planned work; building"
+ fi
+ fi
+
+ # A genuine build/update iteration (not a consolidate swap-in) carves a
+ # work-item branch off BASE. Plan/consolidate beats — and the consolidate
+ # swap-in — instead stay on the control branch, where the tooling lives.
+ BUILD_ITERATION=false
+ if { [ "$MODE" = "build" ] || [ "$MODE" = "update" ]; } && [
"$ACTIVE_PROMPT" = "$PROMPT_FILE" ]; then
+ BUILD_ITERATION=true
+ fi
+
+ # Assemble the prompt BEFORE any checkout, while the working tree is still
+ # on the control branch. Prefer the working-tree copy (local edits count);
+ # fall back to the control branch ($TOOLING_REF) if the tree is on a base
+ # that lacks the tooling. Either way the read never breaks after checkout.
+ PROMPT_WITH_CONTEXT="$(mktemp "${TMPDIR:-/tmp}/spec-loop-prompt.XXXXXX")"
|| exit 1
+ if [ -f "$ACTIVE_PROMPT" ]; then
+ cat "$ACTIVE_PROMPT" > "$PROMPT_WITH_CONTEXT"
+ elif ! git show "$TOOLING_REF:$ACTIVE_PROMPT" > "$PROMPT_WITH_CONTEXT"
2>/dev/null; then
+ echo "Error: could not read '$ACTIVE_PROMPT' from the working tree or
control branch '$TOOLING_REF'." >&2
+ rm -f "$PROMPT_WITH_CONTEXT"; break
+ fi
+ open_pr_context >> "$PROMPT_WITH_CONTEXT"
+
+ if [ "$BUILD_ITERATION" = true ]; then
+ # The work-item branch forks off BASE (e.g. main), which need not
+ # carry the spec-loop tooling. Tell the agent to read the plan and
+ # specs from the control branch, not the working tree.
+ if [ "$TOOLING_REF" != "$BASE" ]; then
+ {
+ echo ""
+ echo "## Tooling source — read the plan and specs from here"
+ echo ""
+ echo "This iteration builds on the integration base \`$BASE\`,
which does"
+ echo "NOT carry the spec-loop tooling. The plan and specs live
on the"
+ echo "control branch \`$TOOLING_REF\`. Read them from there
with \`git show\`,"
+ echo "never from the working tree:"
+ echo ""
+ echo '```'
+ echo "git show
$TOOLING_REF:tools/spec-loop/IMPLEMENTATION_PLAN.md"
+ echo "git ls-tree -r --name-only $TOOLING_REF
tools/spec-loop/specs/"
+ echo "git show $TOOLING_REF:tools/spec-loop/specs/<file>"
+ echo '```'
+ echo ""
+ if [ "$MODE" = "update" ]; then
+ echo "Read the current specs from \`$TOOLING_REF\`
(commands above) as the"
+ echo "baseline, then author the updated spec files on this
work branch —"
+ echo "the sync PR adds them to \`$BASE\`. Update is the
one beat that"
+ echo "writes specs; do that here, not on the control
branch."
+ else
+ echo "Implement the product change on the work branch; do
NOT edit specs"
+ echo "there — they are not on \`$BASE\`. The control
branch owns the specs."
+ fi
+ } >> "$PROMPT_WITH_CONTEXT"
+ fi
+
+ # Freshness check: refuse to fork off a stale base. Fetch the base's
+ # tracking branch and verify BASE is not behind it. Forking off a stale
+ # local base is how a loop re-does work that's already merged upstream:
+ # the agent sees "no such file" locally, dutifully re-creates it, and
+ # the resulting branch collides with the merged PR's source branch on
+ # the remote.
+ BASE_UPSTREAM="$(git rev-parse --abbrev-ref "${BASE}@{upstream}"
2>/dev/null || true)"
+ if [ -n "$BASE_UPSTREAM" ]; then
+ BASE_REMOTE="${BASE_UPSTREAM%%/*}"
+ if ! git fetch --quiet "$BASE_REMOTE" "$BASE" 2>/dev/null; then
+ echo "⚠ Could not fetch '$BASE_REMOTE' — freshness check
skipped (network or auth issue)." >&2
+ else
+ BEHIND_BY="$(git rev-list --count "${BASE}..${BASE_UPSTREAM}"
2>/dev/null || echo 0)"
+ if [ "$BEHIND_BY" -gt 0 ]; then
+ echo "✗ Base '$BASE' is $BEHIND_BY commit(s) behind
'$BASE_UPSTREAM'." >&2
+ echo " Fast-forward before re-running:" >&2
+ echo " git checkout $BASE && git merge --ff-only
$BASE_UPSTREAM" >&2
+ echo " Forking off a stale base re-does merged work — the
new branch" >&2
+ echo " may collide with one already on the remote for the
same change." >&2
+ rm -f "$PROMPT_WITH_CONTEXT"; break
+ fi
+ fi
+ else
+ echo "⚠ Base '$BASE' has no upstream tracking branch — freshness
check skipped." >&2
+ fi
+
+ # Check out the base now — right before the agent runs, not earlier —
+ # so the reads above came from the control branch. The agent then
+ # forks its own <slug> branch off this base.
+ if [ "$(current_branch)" != "$BASE" ]; then
+ if ! checkout_out="$(git checkout "$BASE" 2>&1)"; then
+ echo "Error: could not check out base '$BASE'. git reported:"
>&2
+ printf ' %s\n' "$checkout_out" >&2
+ echo "Resolve the working tree (commit or stash changes), then
re-run." >&2
+ rm -f "$PROMPT_WITH_CONTEXT"; break
+ fi
+ fi
+ BASE_HEAD="$(git rev-parse HEAD)"
+ fi
+
+ # Run one iteration with a fresh context.
+ # -p headless / non-interactive
+ # --dangerously-skip-permissions let the agent edit + validate
+ # unattended. Bypasses the AGENT
+ # permission layer, NOT the OS sandbox
+ # (see the SECURITY header above).
+ # --disallowedTools … defense-in-depth: hard-deny push and
+ # gh so a stray call cannot reach the
+ # remote even with permissions skipped.
+ "$AGENT" -p \
+ --dangerously-skip-permissions \
+ --disallowedTools "Bash(git push:*)" "Bash(gh:*)" \
+ --output-format=text \
+ --model "$MODEL" < "$PROMPT_WITH_CONTEXT" &
+ AGENT_PID=$!
+ spinner "$AGENT_PID" & SPINNER_PID=$!
+ wait "$AGENT_PID"
+ wait "$SPINNER_PID" 2>/dev/null
+ rm -f "$PROMPT_WITH_CONTEXT"
+
+ CUR_BRANCH="$(current_branch)"
+ LAST_COMMIT="$(git log --oneline -1 2>/dev/null)"
+ echo "[ branch ] $CUR_BRANCH"
+ [ -n "$LAST_COMMIT" ] && echo "[ commit ] $LAST_COMMIT"
+
+ if [ "$BUILD_ITERATION" = true ]; then
+ # Report the work-item branch the agent produced, by name, so you know
+ # exactly what to push.
+ if [ "$CUR_BRANCH" != "$BASE" ] && [ "$CUR_BRANCH" != "$TOOLING_REF"
]; then
+ # Branch-name collision check: a remote branch with the same name
+ # often means the agent just re-did work that already shipped under
+ # this slug (a merged PR's source branch typically lingers on the
+ # remote). Warn loudly — pushing would either be rejected or,
worse,
+ # overwrite the merged history. Check every configured remote, not
+ # just origin: a fork-based workflow has the lineage on `upstream`.
+ COLLISION_FOUND=false
+ for remote in $(git remote); do
+ if git ls-remote --heads --exit-code "$remote" "$CUR_BRANCH"
>/dev/null 2>&1; then
+ echo "⚠ Remote '$remote' already has a branch named
'$CUR_BRANCH'." >&2
+ COLLISION_FOUND=true
+ fi
+ done
+ if [ "$COLLISION_FOUND" = true ]; then
+ echo " Likely the source branch of a PR that already shipped
under this slug." >&2
+ echo " Inspect before pushing — do not push blind:" >&2
+ echo " git fetch --all && git log --oneline --all --
<changed-file>" >&2
+ fi
+ echo "[ new branch ] $CUR_BRANCH (forked off $BASE)"
+ echo " push it with: git push -u origin $CUR_BRANCH"
+ else
+ echo "⚠ No work-item branch was created (still on '$CUR_BRANCH').
Check the agent output above." >&2
+ fi
+
+ # Safety guard: a build/update iteration must never commit to the base.
+ if [ "$CUR_BRANCH" = "$BASE" ] && [ "$(git rev-parse HEAD)" !=
"$BASE_HEAD" ]; then
+ echo "✗ This iteration committed to '$BASE' instead of a work-item
branch." >&2
+ echo " Stopping so you can review (expected a new <slug>
branch)." >&2
+ break
+ fi
+
+ # Return to the control branch so the tooling (plan, prompts, specs) is
+ # present again and you are never stranded on the base. The work-item
+ # branch the agent created persists; this only moves HEAD.
+ if [ "$(current_branch)" != "$TOOLING_REF" ]; then
+ git checkout "$TOOLING_REF" >/dev/null 2>&1 || \
+ echo "⚠ Could not return to control branch '$TOOLING_REF' (now
on '$(current_branch)')." >&2
+ fi
+ fi
+ echo ""
+done
diff --git a/tools/spec-loop/specs/README.md b/tools/spec-loop/specs/README.md
new file mode 100644
index 0000000..ab811a7
--- /dev/null
+++ b/tools/spec-loop/specs/README.md
@@ -0,0 +1,74 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Specs
+
+Each file here describes **one functional area of the product** — what it
+does, where it lives in the code, the contract it must honour, and its
+known gaps. They are modelled on the way a game's specs describe its
+battle, magic, and inventory systems: a faithful description of the
+intended behaviour that the build loop reconciles the code against.
+
+Filenames are **topics, not ordered identifiers** — there are no number
+prefixes, because numbering implies a priority the specs don't have.
+Priority lives in [`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md).
+
+## Separate from the RFCs
+
+These specs are **not** RFCs. The RFCs in
+[`../../../docs/rfcs/`](../../../docs/rfcs/) are the normative governance
+layer ("the constitution"); the loop **respects** them as constraints but
+never reads or edits them. Specs are the *functional description of the
+actual codebase*. A spec may cite a principle as a constraint; it never
+restates one and never lives in `docs/rfcs/`.
+
+## The specs
+
+Start with [`overview.md`](overview.md), then:
+
+- Modes: [`triage-mode.md`](triage-mode.md),
+ [`mentoring-mode.md`](mentoring-mode.md),
+ [`drafting-mode.md`](drafting-mode.md),
+ [`pairing-mode.md`](pairing-mode.md).
+- Cross-cutting: [`security-issue-lifecycle.md`](security-issue-lifecycle.md),
+ [`privacy-llm-gate.md`](privacy-llm-gate.md),
+ [`agent-isolation-sandbox.md`](agent-isolation-sandbox.md),
+ [`cve-tooling.md`](cve-tooling.md),
+ [`adoption-and-setup.md`](adoption-and-setup.md),
+ [`adapters.md`](adapters.md),
+ [`meta-and-quality-tooling.md`](meta-and-quality-tooling.md).
+
+(Auto-merge, the fifth MISSION mode, is deliberately off and has no
+spec — see the note in [`overview.md`](overview.md).)
+
+## Spec format
+
+Frontmatter:
+
+```yaml
+---
+title: <functional area>
+status: experimental # stable | experimental | proposed | off
+kind: feature # feature | fix | docs | chore
+mode: Triage # Triage | Mentoring | Drafting | Pairing | infra
+source: <MISSION.md clause / code paths this area is grounded in>
+acceptance:
+ - <verifiable criterion>
+---
+```
+
+Body sections: **What it does**, **Where it lives**, **Behaviour &
+contract**, **Out of scope**, **Acceptance criteria**, **Validation**,
+and (optional) **Known gaps** — the gaps are what the loop's plan pass
+turns into work items.
+
+Status mirrors [`docs/modes.md`](../../../docs/modes.md): `stable`,
+`experimental`, `proposed` (designed in MISSION, no code yet), `off`
+(deliberately not built).
+
+## Adding a spec
+
+1. Name the file for its topic (no number prefix).
+2. Fill in the frontmatter and the body sections above.
+3. Keep **Acceptance criteria** and **Validation** objective — they are
+ the loop's backpressure.
diff --git a/tools/spec-loop/specs/adapters.md
b/tools/spec-loop/specs/adapters.md
new file mode 100644
index 0000000..37d2b7e
--- /dev/null
+++ b/tools/spec-loop/specs/adapters.md
@@ -0,0 +1,77 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Adapters (Gmail / PonyMail / Jira / GitHub)
+status: experimental
+kind: feature
+mode: infra
+source: >
+ MISSION.md § Rationale ("ASF integrations live behind clean
+ configuration boundaries; non-ASF adopters swap them") and § Technical
+ scope (extensible adapter layer). Implemented in tools/gmail/,
+ tools/ponymail/, tools/jira/, tools/github/.
+acceptance:
+ - Project-specific integrations live behind adapter modules, not
+ hardcoded into skills.
+ - A non-ASF adopter can swap an adapter (e.g. private GitHub repo for
+ private mailing list) with config substitution, not skill edits.
+ - Mail-reading adapters route fetched content through the privacy
+ redactor before any LLM read.
+---
+
+# Adapters (Gmail / PonyMail / Jira / GitHub)
+
+## What it does
+
+Isolates the systems a project already uses behind adapter modules so the
+skills stay project-agnostic. The same skill executes against an ASF
+project's private mailing list or a non-ASF project's private GitHub repo
+by swapping the adapter, not the skill.
+
+## Where it lives
+
+- `tools/gmail/` — private-mail read/draft (drafts to the outbox; never
+ sends).
+- `tools/ponymail/` — public mailing-list archive search.
+- `tools/jira/` — issue-tracker adapter for projects on Jira.
+- `tools/github/` — issues/PRs/labels read + write-back helpers.
+
+## Behaviour & contract
+
+- **Pluggable, config-driven.** Skills reference placeholders
+ (`<tracker>`, `<upstream>`, `<security-list>`, …); the adapter resolves
+ them from `<project-config>/`. No `apache/<project>` strings hardcoded
+ into a skill.
+- **Mail adapters draft, never send** — outbound goes to the maintainer's
+ drafts folder; the human presses Send.
+- **Mail adapters redact-after-fetch** — fetched private content passes
+ through the privacy redactor
+ ([privacy-llm-gate.md](privacy-llm-gate.md)) before any LLM read.
+- **Write-back is confirm-before-apply** and routed through the sandbox's
+ `ask` gate ([agent-isolation-sandbox.md](agent-isolation-sandbox.md)).
+
+## Out of scope
+
+- The privacy *policy* and gate (separate area, referenced above).
+- Sending outbound mail (a human action).
+
+## Acceptance criteria
+
+1. No skill hardcodes a project-specific repo/list; all go through an
+ adapter + placeholder.
+2. Mail adapters draft only and redact before LLM read.
+3. Each adapter ships with its own tests.
+
+## Validation
+
+```bash
+for t in gmail ponymail jira github; do
+ uv run --project tools/$t --group dev pytest || echo "check tools/$t test
setup"
+done
+```
+
+## Known gaps
+
+- `experimental` overall — adapter coverage varies; a new adopter system
+ (e.g. GitLab, a different mail backend) is a gap the plan pass records.
diff --git a/tools/spec-loop/specs/adoption-and-setup.md
b/tools/spec-loop/specs/adoption-and-setup.md
new file mode 100644
index 0000000..7bb0510
--- /dev/null
+++ b/tools/spec-loop/specs/adoption-and-setup.md
@@ -0,0 +1,77 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Adoption & setup
+status: stable
+kind: feature
+mode: infra
+source: >
+ README.md § How adoption works / Adopting the framework / Maintenance.
+ Implemented by the setup family (setup-steward and siblings) and the
+ snapshot + agentic-override model.
+acceptance:
+ - An adopter commits exactly one skill (setup-steward); everything else
+ is a gitignored snapshot plus committed override + lock files.
+ - The committed lock pins install method + URL + ref so a fresh clone
+ re-installs the same framework version.
+ - Drift between the committed pin and the local install is detected and
+ surfaced with an upgrade proposal.
+---
+
+# Adoption & setup
+
+## What it does
+
+Gets the framework into an adopter repo and keeps it current using a
+**snapshot + agentic-override** model: one committed bootstrap skill, a
+gitignored framework snapshot (a build artefact, never committed),
+gitignored skill symlinks, and committed agent-readable override files.
+
+## Where it lives
+
+- Skill: `setup-steward` (adopt, verify, upgrade, override).
+- Skills: `setup-isolated-setup-install` / `-update` / `-verify`
+ (the sandbox harness), `setup-override-upstream` (promote a stabilised
+ override into a framework PR), `setup-shared-config-sync`.
+- Docs: `docs/setup/` (install recipes, agentic-overrides contract,
+ prerequisites).
+- Lock files: `.apache-steward.lock` (committed pin) and
+ `.apache-steward.local.lock` (gitignored, what this machine fetched).
+
+## Behaviour & contract
+
+- **One committed skill, no submodules, no vendored framework copies.**
+ The snapshot lives in a gitignored `.apache-steward/`.
+- **Committed lock is the source of truth.** A fresh contributor runs
+ `/setup-steward` and re-installs to the project's pinned version.
+- **Drift detection** at the top of every framework skill: if the
+ gitignored local lock has drifted from the committed pin, the skill
+ proposes `/setup-steward upgrade`.
+- **Overrides are agent-readable Markdown** under
+ `.apache-steward-overrides/`, consulted at runtime and merged before
+ default behaviour ([pairing/correctability is the model]).
+
+## Out of scope
+
+- The runtime behaviour of the modes themselves.
+- Editing the adopter's `.claude/settings.json` beyond what the install
+ recipe declares.
+
+## Acceptance criteria
+
+1. Adoption commits only the bootstrap skill + lock/override scaffold.
+2. The committed lock re-installs the same version on a fresh clone.
+3. Drift between local and committed locks is surfaced with an upgrade.
+
+## Validation
+
+```bash
+test -f docs/setup/README.md
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- `stable`; gaps appear as new adopter skill-directory layouts to support
+ or new override surfaces — recorded by the plan pass.
diff --git a/tools/spec-loop/specs/agent-isolation-sandbox.md
b/tools/spec-loop/specs/agent-isolation-sandbox.md
new file mode 100644
index 0000000..c668053
--- /dev/null
+++ b/tools/spec-loop/specs/agent-isolation-sandbox.md
@@ -0,0 +1,84 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Agent isolation / layered sandbox
+status: stable
+kind: feature
+mode: infra
+source: >
+ MISSION.md § Privacy, security and supply-chain integrity ("Clean-
+ environment wrapper", "Layered sandbox by default", "Pinned, reviewed,
+ signed dependencies"). Implemented in tools/agent-isolation/, the
+ setup-isolated-setup-* skills, and .claude/settings.json.
+acceptance:
+ - Every agent subprocess runs inside an OS-level sandbox with default-
+ deny filesystem reads and network egress.
+ - Credential-shaped env vars are stripped before the agent execs.
+ - State-mutating shell calls (git push, gh pr create, …) require a
+ confirmation prompt; secrets/cred files are deny-read.
+---
+
+# Agent isolation / layered sandbox
+
+## What it does
+
+Runs every agent invocation inside a layered sandbox so that even a
+successful prompt injection cannot read credentials or reach a
+non-allowed host. The fallback when prompt engineering fails is the OS
+saying "no".
+
+## Where it lives
+
+- `tools/agent-isolation/` — the harness (clean-env wrapper +
+ sandbox profiles).
+- `.claude/settings.json` — the `sandbox` block (filesystem
+ allow/deny, network `allowedDomains`) and `permissions` (`deny` /
+ `ask`).
+- Skills: `setup-isolated-setup-install`, `-update`, `-verify`.
+- `docs/setup/secure-agent-internals.md` — the three-layer model.
+
+## Behaviour & contract
+
+The reference model is four layers, layered:
+
+1. **Clean environment** — a wrapper strips the process env to a
+ project-declared whitelist before exec (no `$GH_TOKEN`, `$AWS_*`,
+ `$ANTHROPIC_API_KEY` leakage).
+2. **Filesystem + network sandbox** — Linux `bubblewrap` + `socat` SNI
+ proxy; macOS `sandbox-exec`. Default-deny reads outside the tree and
+ egress to non-allowed hosts.
+3. **Tool permissions** — the host's `permissions.deny` blocks denied
+ paths/binaries (`Read(~/.ssh/**)`, `Bash(curl *)`, …).
+4. **Forced confirmation** — `permissions.ask` on every state-mutating
+ shell call (`git push`, `gh pr create`, `gh issue edit`, …).
+
+Pinned system tools (`bubblewrap`, `socat`, agent CLI) are aged through a
+cooldown window; bumps are PRs, not silent updates.
+
+## Out of scope
+
+- The human-in-the-loop *confirmation* itself (that is the modes' job);
+ this area provides the OS-level enforcement underneath it.
+- Editing `.claude/settings.json` (it is in the `deny` list).
+
+## Acceptance criteria
+
+1. Filesystem and network default-deny with explicit allow-lists.
+2. The clean-env wrapper strips credential-shaped vars before exec.
+3. `git push` / `gh pr create` are in `permissions.ask`; secret/cred
+ files are in `permissions.deny`.
+
+## Validation
+
+```bash
+uv run --project tools/agent-isolation --group dev pytest
+python3 -c "import json,sys; s=json.load(open('.claude/settings.json')); \
+ asks=' '.join(s['permissions']['ask']); \
+ sys.exit(0 if 'git push' in asks and 'gh pr create *' in asks else 1)"
+```
+
+## Known gaps
+
+- `stable`; drift shows up when a new state-mutating command is added to a
+ skill without a matching `ask` rule — the plan pass flags it.
diff --git a/tools/spec-loop/specs/cve-tooling.md
b/tools/spec-loop/specs/cve-tooling.md
new file mode 100644
index 0000000..0f0af33
--- /dev/null
+++ b/tools/spec-loop/specs/cve-tooling.md
@@ -0,0 +1,74 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: CVE tooling
+status: stable
+kind: feature
+mode: infra
+source: >
+ README.md § Skill families (security) and AGENTS.md § Linking CVEs.
+ Implemented in tools/vulnogram/generate-cve-json/, tools/cve-org/, and
+ the security-cve-allocate skill.
+acceptance:
+ - A deterministic command produces a paste-ready CVE 5.x JSON record
+ from a tracking issue's template fields.
+ - Tracker-repo URLs and the CVE-tool OAuth URL are filtered out of the
+ record's references before serialisation.
+ - The skill regenerates the attached JSON so it stays in lock-step with
+ the issue body.
+---
+
+# CVE tooling
+
+## What it does
+
+Turns a tracking issue into a CVE allocation and a paste-ready CVE 5.x
+JSON record — the machine half of the security lifecycle's allocation
+step. Deterministic (no model calls) so the output is reproducible and
+reviewable.
+
+## Where it lives
+
+- `tools/vulnogram/generate-cve-json/` — a `uv run` script that parses an
+ issue's template fields (multiple credits, multiple reference URLs,
+ `>= X, < Y` version ranges) and emits `containers.cna` JSON matching
+ Vulnogram's export shape, plus the Vulnogram `#json` paste URL.
+- `tools/cve-org/` — CVE.org / CVE-services helpers.
+- Skill: `security-cve-allocate` — walks the (PMC-gated) allocation form,
+ then updates the tracker and regenerates the attached JSON via
+ `generate-cve-json --attach`.
+
+## Behaviour & contract
+
+- **Deterministic and reviewable.** The JSON is generated by a Python
+ script, not an LLM; the same issue yields the same record.
+- **Reference hygiene.** The project's CVE-tool URL
+ (`cveprocess.apache.org/...`) and any tracker-repo URLs are filtered
+ out of `references[]` before serialising — they are OAuth-gated / private.
+- **Allocation is PMC-gated.** The skill submits nothing itself when the
+ user is not on the PMC; it reshapes the recipe into a relay message.
+- **Lock-step.** The attached JSON is regenerated whenever the tracker
+ body changes, so the two never drift.
+
+## Out of scope
+
+- Deciding whether a report warrants a CVE (that is triage/assessment).
+- Publishing the CVE (a PMC action outside the framework).
+
+## Acceptance criteria
+
+1. `generate-cve-json` emits valid CVE 5.x JSON from a tracking issue.
+2. Tracker + CVE-tool URLs are absent from `references[]`.
+3. `--attach` refreshes the JSON on the tracking issue.
+
+## Validation
+
+```bash
+uv run --project tools/vulnogram/generate-cve-json --group dev pytest
+```
+
+## Known gaps
+
+- `stable`; drift appears if the CVE 5.x schema or Vulnogram export shape
+ changes upstream — caught by the tool's own tests.
diff --git a/tools/spec-loop/specs/drafting-mode.md
b/tools/spec-loop/specs/drafting-mode.md
new file mode 100644
index 0000000..8c18389
--- /dev/null
+++ b/tools/spec-loop/specs/drafting-mode.md
@@ -0,0 +1,73 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Drafting mode
+status: experimental
+kind: feature
+mode: Drafting
+source: >
+ MISSION.md § Technical scope (Drafting). docs/modes.md § Drafting.
+ Implemented by security-issue-fix (stable, security-only) and
+ issue-fix-workflow (experimental).
+acceptance:
+ - A drafting skill produces the failing test, the smallest production
+ change, targeted test runs, and a commit — but never merges.
+ - The PR is opened via `gh pr create --web` (human reviews in browser)
+ or handed back for the human to push; no autopilot push/merge.
+ - Security-class drafts scrub CVE / tracker-slug / "security fix" /
+ "vulnerability" from every public surface until the embargo lifts.
+---
+
+# Drafting mode
+
+## What it does
+
+The agent drafts a fix for a well-scoped problem — a triaged issue, a
+CVE-allocated security report with team consensus on scope, a failing
+test with an obvious cause, a documentation hole — and prepares a PR.
+Every PR is reviewed and merged by a human committer; the agent never
+merges its own work.
+
+## Where it lives
+
+- `security-issue-fix` (stable, security-only) — drafts the fix in the
+ user's local `<upstream>` clone, runs local checks, opens the public
+ PR via `gh pr create --web`, scrubs confidential framing.
+- `issue-fix-workflow` (experimental) — drafts a fix for a triaged
+ general-issue; **does not** open the PR on autopilot, hands back a
+ branch + commits + test results for the human to push.
+- `tools/dev` — shared local-check helpers.
+
+## Behaviour & contract
+
+- **Draft, never merge.** No skill in this mode merges. Opening the PR is
+ `gh pr create --web` (human confirms in the browser) or a hand-back.
+- **Confidentiality scrub** (security): commit message, branch name, PR
+ title/body, newsfragment are scrubbed for CVE IDs, the tracker repo
+ slug, and the words "security fix" / "vulnerability" before any write
+ or push (see `AGENTS.md` § Confidentiality).
+- **Commit trailer** `Generated-by:` — never `Co-Authored-By:` an agent.
+
+## Out of scope
+
+- Generic Drafting beyond security + general-issue (lint fixes, audit-
+ tool findings, doc holes at scale) — `proposed`, not yet built.
+- Merging, releasing, or pushing without a human.
+
+## Acceptance criteria
+
+1. No drafting skill merges or force-pushes.
+2. Security drafts pass the confidentiality scrub before any public write.
+3. `skill-validate` passes on the drafting-family skills.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- Generic (non-security, non-issue) Drafting from audit-tool findings is
+ `proposed`. Only `security-issue-fix` is stable today.
diff --git a/tools/spec-loop/specs/mentoring-mode.md
b/tools/spec-loop/specs/mentoring-mode.md
new file mode 100644
index 0000000..abc6ada
--- /dev/null
+++ b/tools/spec-loop/specs/mentoring-mode.md
@@ -0,0 +1,75 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Mentoring mode
+status: proposed
+kind: feature
+mode: Mentoring
+source: >
+ MISSION.md § Technical scope (Mentoring) — "the highest-value
+ project-side mode and the one off-the-shelf agent tooling skips".
+ docs/modes.md § Mentoring (proposed, 0 skills). Spec exists at
+ docs/mentoring/spec.md ahead of any skill code.
+acceptance:
+ - The Mentoring spec (tone guide, hand-off protocol, adopter knobs) is
+ reviewable independently of any runtime skill (it already is).
+ - The first skill ships flagged mode Mentoring + experimental and joins
+ threads in a teaching register, never gatekeeps.
+ - Hand-off to a human is explicit when scope exceeds the agent.
+---
+
+# Mentoring mode
+
+## What it does
+
+Joins issue and PR threads in a deliberately teaching register:
+clarifying questions, pointers to project conventions and docs, an
+explanation of *why* a change is being asked for, paired examples from
+similar prior PRs, and a clean hand-off to a human reviewer when the
+question exceeds what an agent should answer. MISSION names this the
+contributor-empowerment lever the wider ecosystem most needs.
+
+## Where it lives
+
+- Spec ahead of code: `docs/mentoring/README.md`,
+ `docs/mentoring/spec.md`.
+- Adopter config scaffold: `projects/_template/mentoring-config.md`.
+- First skill (planned): `pr-management-mentor` (working name), ships
+ `mode: Mentoring` + `experimental`.
+
+## Behaviour & contract
+
+- **Teaching register, never gatekeeping.** The most sensitive surface
+ in the project (MISSION § Particular care): a condescending agent that
+ drives a contributor away is not patchable. Tone is the project's to
+ set (`mentoring-config.md`).
+- Read-only / drafts replies for human review; never closes or rejects a
+ contributor's work on its own.
+- Explicit hand-off protocol when the question is out of the agent's
+ depth.
+
+## Out of scope
+
+- Implementation-detail review that belongs to Pairing
+ ([Pairing](pairing-mode.md)).
+- Any contributor-facing message sent without human review.
+
+## Acceptance criteria
+
+1. The Mentoring spec is reviewable without any skill code (it is).
+2. The first Mentoring skill validates and carries `mode: Mentoring`.
+3. Hand-off-to-human is documented and enforced.
+
+## Validation
+
+```bash
+test -f docs/mentoring/spec.md
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- **No skill yet** — this is a pure `proposed` gap. The spec and tone
+ guide exist; the prototype skill is the first work item the loop would
+ pick up under this area.
diff --git a/tools/spec-loop/specs/meta-and-quality-tooling.md
b/tools/spec-loop/specs/meta-and-quality-tooling.md
new file mode 100644
index 0000000..f12f9ce
--- /dev/null
+++ b/tools/spec-loop/specs/meta-and-quality-tooling.md
@@ -0,0 +1,81 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Meta & quality tooling
+status: stable
+kind: feature
+mode: infra
+source: >
+ README.md § Skill families (utilities) and AGENTS.md § Reusable skills.
+ Implemented by tools/skill-validator/, tools/skill-evals/,
+ tools/sandbox-lint/, tools/dashboard-generator/, tools/probe-templates/,
+ and the write-skill / list-steward-skills utility skills.
+acceptance:
+ - Skill definitions are validated (frontmatter keys name/description/
+ license, internal link integrity, placeholder conventions).
+ - There is a live, generated index of available skills (no cached copy).
+ - Skill behaviour can be measured by an eval harness.
+ - Every skill ships a behavioural eval suite under
+ tools/skill-evals/evals/<skill-name>/ (per /AGENTS.md § Reusable
+ skills).
+---
+
+# Meta & quality tooling
+
+## What it does
+
+The framework's tooling for authoring, validating, indexing, and
+evaluating its own skills — the quality gate that keeps the catalogue
+trustworthy as it grows.
+
+## Where it lives
+
+- `tools/skill-validator/` — validates `SKILL.md` frontmatter (required
+ `name`, `description`, `license`), internal link integrity, and
+ placeholder conventions. CLI: `skill-validate`.
+- `tools/skill-evals/` — harness for measuring skill behaviour.
+- `tools/sandbox-lint/` — lints the sandbox/permissions configuration.
+- `tools/dashboard-generator/` — read-only HTML dashboards over campaign
+ artefacts.
+- `tools/probe-templates/` — reusable probes.
+- Skills: `write-skill` (author/update a skill), `list-steward-skills`
+ (live, generated index of every skill, grouped by family).
+
+## Behaviour & contract
+
+- **Generated, never cached.** `list-steward-skills` reads the live
+ `.claude/skills/*/SKILL.md` frontmatter on every run, so the index never
+ goes stale.
+- **Deterministic checks.** `skill-validator` and `sandbox-lint` are
+ heuristic/text tools with no model calls — reproducible in CI.
+- **Hard vs soft rules.** The validator fails on missing frontmatter or
+ broken links; advisories are warnings unless `--strict`.
+
+## Out of scope
+
+- The maintainership modes themselves.
+- The spec-loop tooling, which lives in `tools/spec-loop/` and is
+ documented in [`../README.md`](../README.md), not here.
+
+## Acceptance criteria
+
+1. `skill-validate` enforces required frontmatter + link integrity.
+2. `list-steward-skills` generates its index from live frontmatter.
+3. Each meta tool ships with its own tests.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev pytest
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- **Eval coverage is incomplete** — the harness has ~15 suites but the
+ repo has more skills than that; skills added before the per-skill-eval
+ convention have no suite. Back-filling one suite per uncovered skill is
+ a tracked work item.
+- Other gaps appear as new quality checks worth adding (e.g. a spec
+ validator analogous to the skill validator) — recorded by the plan pass.
diff --git a/tools/spec-loop/specs/overview.md
b/tools/spec-loop/specs/overview.md
new file mode 100644
index 0000000..0022df8
--- /dev/null
+++ b/tools/spec-loop/specs/overview.md
@@ -0,0 +1,68 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Product overview
+
+Apache Steward (working name; rename in flight) is a **project-agnostic
+framework for agent-assisted repository maintainership and development**.
+It is not an application with a `src/` tree — it is a substrate plus a
+catalogue of agent-readable **skills** (Markdown + YAML) and deterministic
+**tools** (`uv`/Python). The specs in this directory describe that
+functionality so the build loop can keep the code in sync with the
+intended behaviour.
+
+These specs are **unordered** — the filename is a topic, not a priority.
+What to build next comes from
[`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md),
+not from any numbering.
+
+## Substrate
+
+Issue/PR ingestion, GitHub write-back, mail threading, audit logging, and
+pluggable adapters for the systems a project already uses (Gmail,
+PonyMail, Jira, GitHub, Vulnogram). Per-project configuration declares
+which modes are on, eligible change classes, reviewers, and where reports
+come from.
+
+## The modes (MISSION taxonomy)
+
+Each mode is an independently toggleable set of skills. Maturity mirrors
+[`docs/modes.md`](../../../docs/modes.md):
+
+| Mode | Spec | Maturity |
+|---|---|---|
+| Triage | [triage-mode.md](triage-mode.md) | stable (security) / experimental
(PR, issue) |
+| Mentoring | [mentoring-mode.md](mentoring-mode.md) | proposed (0 skills) |
+| Drafting | [drafting-mode.md](drafting-mode.md) | stable (security) /
experimental (issue) |
+| Pairing | [pairing-mode.md](pairing-mode.md) | proposed (0 skills) |
+
+> **Auto-merge** is the fifth MISSION mode but is deliberately **off** by
+> sequencing policy (`.asf.yaml` `allow_auto_merge: false`) — it has no
+> implementation and nothing to build, so it is documented as a boundary
+> in [`docs/modes.md`](../../../docs/modes.md), not as a spec here.
+
+## Cross-cutting functionality
+
+| Area | Spec |
+|---|---|
+| Security-issue lifecycle (the load-bearing use case) |
[security-issue-lifecycle.md](security-issue-lifecycle.md) |
+| Privacy-LLM gate + PII redaction |
[privacy-llm-gate.md](privacy-llm-gate.md) |
+| Agent isolation / layered sandbox |
[agent-isolation-sandbox.md](agent-isolation-sandbox.md) |
+| CVE tooling | [cve-tooling.md](cve-tooling.md) |
+| Adoption & setup | [adoption-and-setup.md](adoption-and-setup.md) |
+| Adapters (Gmail / PonyMail / Jira / GitHub) | [adapters.md](adapters.md) |
+| Meta & quality tooling |
[meta-and-quality-tooling.md](meta-and-quality-tooling.md) |
+
+## The non-negotiables every area inherits
+
+- **Human-in-the-loop on every state change.** Outputs are proposals;
+ the human confirms. (`docs/rfcs/` holds the normative statement; this
+ spec system respects it as a constraint and never edits it.)
+- **Drafts, never sends; the human presses the button** — no skill calls
+ `git push` / `gh pr create` on autopilot.
+- **Secure sandbox by default**, vendor neutrality, privacy by design.
+
+## How these specs are built
+
+The build loop (see [`../README.md`](../README.md)) compares each spec
+against the code and turns the gaps into work items in the plan — one
+work item, one branch, one PR.
diff --git a/tools/spec-loop/specs/pairing-mode.md
b/tools/spec-loop/specs/pairing-mode.md
new file mode 100644
index 0000000..2594d23
--- /dev/null
+++ b/tools/spec-loop/specs/pairing-mode.md
@@ -0,0 +1,72 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Pairing mode
+status: proposed
+kind: feature
+mode: Pairing
+source: >
+ MISSION.md § Technical scope (Pairing) and § Initial Goals ("Ship at
+ least one Pairing skill family in v1"). docs/modes.md § Pairing
+ (proposed, 0 skills).
+acceptance:
+ - At least one Pairing skill exists and validates (v1 goal).
+ - Pairing skills run in the developer's OWN dev loop and make no state
+ change on behalf of the project (read-only / hand-back).
+ - Mentorship is intrinsic: the agent handles implementation-detail
+ review so the human conversation stays on design and reasoning.
+---
+
+# Pairing mode
+
+## What it does
+
+The developer-side counterpart to the project-side modes. Pairing skills
+run in a maintainer's or contributor's *own* dev loop: multi-agent review
+pipelines, self-review and pre-flight patterns, and scoped fix drafting
+under the developer's driver's seat. Mentorship is intrinsic — the agent
+absorbs mechanical implementation-detail review so the human-to-human
+conversation stays on design and the trade-offs the project cares about,
+protecting the ASF contribution path (contributor → committer → PMC).
+
+## Where it lives
+
+- Currently nowhere — the family is empty (`docs/modes.md`: 0 skills).
+- Planned skills: a pre-flight **self-review** skill (highest priority)
+ and a **multi-agent review** pipeline — both tracked as work items in
+ [`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md).
+
+## Behaviour & contract
+
+- **No state change on the project's behalf.** Pairing skills are the
+ developer's toolkit; they end at a report or a local branch.
+- Same skill format and sandbox/privacy posture as the project-side modes.
+- **Ships before Auto-merge** in the roadmap (MISSION sequencing): Pairing
+ must establish that human reasoning, not implementation chatter, is the
+ load-bearing part of the workflow before any auto-merge is considered.
+
+## Out of scope
+
+- Acting on issues/PRs/threads on the project's behalf (that is
+ Triage/Mentoring/Drafting).
+- Auto-merge — deliberately off by MISSION sequencing; not built.
+
+## Acceptance criteria
+
+1. ≥1 Pairing skill exists, validates, and is read-only/hand-back.
+2. `docs/modes.md` Pairing row reflects the shipped count and status.
+
+## Validation
+
+```bash
+ls .claude/skills/ | grep -q '^pairing-' && echo "pairing skill present" ||
echo "GAP: no pairing skill"
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- **Empty family** — the largest functional gap against MISSION's v1
+ goals. Work items (in
[`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md)):
+ a pre-flight self-review skill (first, highest priority), then a
+ multi-agent review pipeline.
diff --git a/tools/spec-loop/specs/privacy-llm-gate.md
b/tools/spec-loop/specs/privacy-llm-gate.md
new file mode 100644
index 0000000..35a9436
--- /dev/null
+++ b/tools/spec-loop/specs/privacy-llm-gate.md
@@ -0,0 +1,84 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Privacy-LLM gate + PII redaction
+status: stable
+kind: feature
+mode: infra
+source: >
+ MISSION.md § Privacy, security and supply-chain integrity
+ ("Privacy-aware LLM routing", "PII redaction at the boundary").
+ Implemented in tools/privacy-llm/ (checker, redactor, wiring.md, pii.md)
+ and documented in docs/setup/privacy-llm.md.
+acceptance:
+ - Private content reaches only LLMs the project's PMC has approved; the
+ gate refuses to route private bytes through a non-approved model.
+ - PII is redacted to stable hashed identifiers before any LLM read; the
+ reverse map stays on the maintainer's local disk (mode 0600).
+ - Every public artefact is scrubbed for private-content leakage before
+ it leaves the framework's control.
+---
+
+# Privacy-LLM gate + PII redaction
+
+## What it does
+
+Treats private content — security reports, embargoed CVE detail,
+PMC-private mail, contributor PII — as chain-of-custody material. Three
+mechanisms: an **approved-LLM gate** that verifies the model-of-the-moment
+is on the PMC's allow-list before any private read; a **redactor** that
+maps reporter names/emails/IPs to stable hashed identifiers before the
+LLM sees them; and a **confidentiality scrub** that checks every public
+artefact for leakage before emission.
+
+## Where it lives
+
+- `tools/privacy-llm/checker/` — the approved-LLM gate.
+- `tools/privacy-llm/redactor/` — the PII redactor (name→`N-<hash>`,
+ email→`E-<hash>`, IP→`IP-<hash>`).
+- `tools/privacy-llm/wiring.md` — the redact-after-fetch protocol every
+ Gmail/PonyMail-reading skill follows.
+- `tools/privacy-llm/pii.md` — the PII pattern catalogue.
+- `docs/setup/privacy-llm.md` — adopter-facing setup.
+
+## Behaviour & contract
+
+- **Policy is the PMC's; the gate is the framework's.** The approved-LLM
+ list is per-PMC. The default: agent host trusted, `*.apache.org`
+ auto-approved, `localhost` for local inference, everything else opt-in.
+- **Redact before read; reveal locally.** Skills operate on hashed
+ identifiers; the reverse map never goes to an LLM and is never committed.
+- **Reporter credit is preserved** (CVE `credits[]`) only after the
+ reporter confirms on the inbound thread — credit is a deliberate output,
+ not in-context PII.
+- **Confidentiality scrub before public emission** — regex for CVE IDs in
+ pre-disclosure PRs, reporter names from the local map, list addresses,
+ and any project-declared private string; failures stop the flow.
+- **Audit log is privacy-aware** — references hashed identifiers, never
+ raw PII.
+
+## Out of scope
+
+- Setting the approved-LLM *policy* (that is the PMC's, per adopter).
+- Enforcing OS-level isolation — that is the sandbox
+ ([agent-isolation-sandbox.md](agent-isolation-sandbox.md)).
+
+## Acceptance criteria
+
+1. The gate blocks a private read when the active model is not approved.
+2. The redactor produces stable hashed identifiers and keeps the reverse
+ map local (0600, gitignored).
+3. The scrub catches CVE IDs / reporter names / list addresses before any
+ public write.
+
+## Validation
+
+```bash
+uv run --project tools/privacy-llm --group dev pytest
+```
+
+## Known gaps
+
+- `stable`; gaps surface as new PII patterns or new public-emission
+ surfaces not yet covered by the scrub — caught as drift by the plan pass.
diff --git a/tools/spec-loop/specs/security-issue-lifecycle.md
b/tools/spec-loop/specs/security-issue-lifecycle.md
new file mode 100644
index 0000000..ee13186
--- /dev/null
+++ b/tools/spec-loop/specs/security-issue-lifecycle.md
@@ -0,0 +1,76 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Security-issue lifecycle (end-to-end)
+status: stable
+kind: feature
+mode: Triage
+source: >
+ MISSION.md § Rationale ("Security-issue handling is a load-bearing use
+ case"). README.md § Skill families (security). The security skill
+ family + tools/vulnogram + tools/gmail + tools/ponymail +
+ tools/privacy-llm.
+acceptance:
+ - The flow runs import → triage → dedupe → CVE allocate → fix → sync →
+ invalidate, each step a confirm-before-apply skill.
+ - Tracker contents stay private; only stable identifiers (URLs, #NNN)
+ are public-safe. Public PRs are scrubbed pre-disclosure.
+ - Every applied state change is audit-logged.
+---
+
+# Security-issue lifecycle
+
+## What it does
+
+The framework's flagship, highest-procedure flow: handle an inbound ASF
+security report end-to-end, from `security@` import through CVE
+publication, with a human gate and an audit-log entry at every step.
+
+## Where it lives
+
+- Skills: `security-issue-import` (+ `-from-pr`, `-from-md`),
+ `security-issue-triage`, `security-issue-deduplicate`,
+ `security-cve-allocate`, `security-issue-fix`, `security-issue-sync`,
+ `security-issue-invalidate`.
+- Tools: `tools/vulnogram/generate-cve-json` (CVE 5.x JSON),
+ `tools/cve-org`, `tools/gmail` + `tools/ponymail` (mail), and the
+ `tools/privacy-llm` gate/redactor ([the privacy gate](privacy-llm-gate.md)).
+
+## Behaviour & contract
+
+- **Confirm-before-apply at every step.** Imports create trackers only
+ after confirmation; triage posts proposal comments and never decides;
+ allocation is PMC-gated; fixes draft PRs the human opens.
+- **Confidentiality (three layers, see `AGENTS.md`):** tracker URLs and
+ `#NNN` are public-safe identifiers; tracker *contents* are private;
+ the security framing of a public PR is embargoed until the advisory
+ ships.
+- **Reporter PII redacted in-context; reporter *credit* preserved** in
+ the CVE `credits[]` only after the reporter confirms on the thread.
+- **Audit log** of every applied change (redacted identifiers only).
+
+## Out of scope
+
+- Drafting beyond the security case (see [Drafting](drafting-mode.md)).
+- Sending mail — replies are drafted to the maintainer's outbox.
+
+## Acceptance criteria
+
+1. Each lifecycle skill is confirm-before-apply and audit-logs applied
+ changes.
+2. No public surface produced by the flow contains tracker contents or
+ pre-disclosure security framing.
+3. CVE JSON is regenerated to stay in lock-step with the tracker body.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev skill-validate
+uv run --project tools/vulnogram/generate-cve-json --group dev pytest
+```
+
+## Known gaps
+
+- The flow is `stable`; gaps surface as drift between a skill's documented
+ steps and the adapters it calls — the loop's plan pass catches those.
diff --git a/tools/spec-loop/specs/triage-mode.md
b/tools/spec-loop/specs/triage-mode.md
new file mode 100644
index 0000000..a264e0a
--- /dev/null
+++ b/tools/spec-loop/specs/triage-mode.md
@@ -0,0 +1,74 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Triage mode
+status: experimental
+kind: feature
+mode: Triage
+source: >
+ MISSION.md § Technical scope (Triage). docs/modes.md § Triage.
+ Implemented by the pr-management, issue, and security skill families.
+acceptance:
+ - Every triage skill is read-only on tracker state or proposes-then-
+ confirms; none transitions, closes, or labels without confirmation.
+ - Classifications are grounded in prior triaged cases / the project's
+ Security Model, not invented categories.
+ - Security-side import/dedupe/sync/invalidate/allocate skills are
+ stable; PR and general-issue triage are experimental.
+---
+
+# Triage mode
+
+## What it does
+
+The lowest-risk, foundational mode: spot inbound issues / security
+reports / PRs, classify them, surface likely duplicates, link related
+discussions, and propose routing to the right human. Every output is a
+suggestion the human signs off on.
+
+## Where it lives
+
+- PR queue: `pr-management-triage`, `pr-management-stats`,
+ `pr-management-code-review` (deep review is a triage variant).
+- General issues: `issue-triage`, `issue-reassess`, `issue-reproducer`.
+- Security inbound: `security-issue-import`, `-import-from-pr`,
+ `-import-from-md`, `security-issue-deduplicate`,
+ `security-issue-invalidate`, `security-issue-sync`,
+ `security-cve-allocate`.
+- Adapters it reads through: `tools/github`, `tools/jira`,
+ `tools/ponymail`, `tools/gmail`.
+
+## Behaviour & contract
+
+- **Read-only or propose-then-confirm.** `issue-triage` and
+ `security-issue-triage` post a *proposal comment* on confirmation and
+ never flip labels, close, or allocate. Reproducers produce evidence
+ (`verdict.json`), never post.
+- Six-class disposition vocabulary on the security side
+ (`VALID` / `DEFENSE-IN-DEPTH` / `INFO-ONLY` / `INVALID` /
+ `PROBABLE-DUP` / `FIX-ALREADY-PUBLIC`).
+- Duplicate detection keys on stable identifiers (Gmail `threadId`,
+ GHSA-ID), not on fuzzy body text alone.
+
+## Out of scope
+
+- Authoring fixes (that is Drafting, [Drafting](drafting-mode.md)).
+- Any state change a human has not confirmed in-session.
+
+## Acceptance criteria
+
+1. No triage skill performs an unconfirmed state change.
+2. `skill-validate` passes on all triage-family skills.
+3. docs/modes.md Triage table matches the shipped skill set.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- PR and general-issue triage are `experimental` — no adopter-pilot eval
+ has run; behaviour may change.