(airflow-steward) branch main updated: feat(spec-loop): freshness check + branch-name collision check (#297)

potiuk Tue, 26 May 2026 14:57:33 -0700

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git



The following commit(s) were added to refs/heads/main by this push:
     new 1ced54f   feat(spec-loop): freshness check + branch-name collision 
check (#297)
1ced54f is described below

commit 1ced54f009a324db84c53269e5bb99daef0a4716
Author: Justin Mclean <[email protected]>
AuthorDate: Wed May 27 07:55:55 2026 +1000

     feat(spec-loop): freshness check + branch-name collision check (#297)
    
    * feat(spec-loop): add spec-driven build loop and product specs
    
    Introduce a spec-driven development loop under tools/spec-loop/ that
    develops and maintains the framework against a written description of
    what it does.
    
    - specs/ describes the actual functionality (the four live modes, the
      security lifecycle, privacy gate, sandbox, CVE tooling, adoption,
      adapters, meta tooling) as topic-named files, separate from the RFCs.
    - IMPLEMENTATION_PLAN.md holds the prioritised gaps as work items.
    - loop.sh runs plan / build / update / consolidate. Build implements one
      work item per branch (one work item = one branch = one PR) and never
      pushes or opens a PR; the human does that. update back-fills specs from
      functionality contributed outside the loop.
    - docs/spec-driven-development.md explains how it works.
    
    Generated-by: Claude (Opus 4.7)
    
    * docs(skills): require an eval suite for every skill
    
    Add to AGENTS.md § Reusable skills the convention that every skill ships
    a behavioural eval suite under tools/skill-evals/evals/<skill-name>/
    covering each pipeline step with fixture cases — a skill PR without a
    matching eval is incomplete. Mirror the rule into the spec-loop build
    prompt and operational context so the loop enforces it, and record the
    "back-fill missing eval suites" gap in the plan.
    
    Generated-by: Claude (Opus 4.7)
    
    * fix(spec-loop): default loop base to the current branch, not spec-driven
    
    SPEC_LOOP_BASE defaulted to spec-driven — the throwaway scaffolding
    branch — which is a dangling reference the moment this PR merges. Default
    it instead to the branch the loop is started on (fall back to main if
    detached), and point the work-item PR examples at main.
    
    Generated-by: Claude (Opus 4.7)
    
    * fix(spec-loop): satisfy markdownlint and doctoc
    
    - Tag the push/PR command fences in PROMPT_build.md and PROMPT_update.md
      with `text` so markdownlint MD040 (fenced-code-language) passes.
    - Exclude tools/spec-loop/ from the doctoc hook: the specs carry YAML
      frontmatter and the prompts are short single-purpose docs — same
      rationale as the existing skill and skill-evals exclusions.
    
    Generated-by: Claude (Opus 4.7)
    
    * docs(spec-loop): explain --dangerously-skip-permissions in the security 
model
    
    The loop runs the agent headless with --dangerously-skip-permissions.
    Document how that fits the layered sandbox: the flag bypasses the agent
    permission layer (.claude/settings.json deny/ask) but NOT the OS sandbox
    (clean-env + filesystem/network), which stays the real boundary — matching
    the flag's own "sandboxes only" guidance.
    
    - Add a "Security and the dangerously-skip-permissions flag" section to
      docs/spec-driven-development.md (what each sandbox layer does, why the
      loop stays safe: run-in-sandbox, no credentials, structural containment).
    - Add a Security callout to tools/spec-loop/README.md and a SECURITY
      header block to loop.sh.
    - Harden the invocation: --disallowedTools "Bash(git push *)" "Bash(gh *)"
      as defence in depth so a stray push/PR cannot reach the remote.
    
    Generated-by: Claude (Opus 4.7)
    
    * fix(spec-loop): make plan consolidation hysteretic and not livelock
    
    The build loop switched to a consolidation round whenever the plan
    exceeded 500 lines, but the consolidate beat must preserve every planned
    work item. A plan that was long because of pending work (not stale
    history) could therefore never drop below the threshold, so every
    subsequent build iteration re-consolidated forever and never built.
    
    - Consolidate at most once (CONSOLIDATE_TRIED latch); if still over after
      one round, build anyway and note that the length is planned work. The
      latch resets when the plan drops back under the limit.
    - Give PROMPT_consolidate a real target (~300 lines, below the 500
      trigger) for hysteresis, while preserving every planned work item.
    - Make the threshold configurable via SPEC_LOOP_PLAN_MAX (default 500)
      and document it.
    
    Generated-by: Claude (Opus 4.7)
    
    * loop improvements
    
    * so we can run this before merging
    
    * use git checkout not git swicth so it works with slightly older versions 
of git
    
    * read tooling from control branch, build work items on main
    
    * remove "spec/" on branches
    
    * fix(spec-loop): correct update-mode note, arg validation, spinner, deny 
syntax
    
    Address self-review findings on loop.sh:
    
    - update mode no longer receives the build-only "do NOT edit specs" note
      when launched from a branch != BASE; it is now told to author the updated
      specs on the work branch (update is the one beat that writes specs).
    - reject a non-numeric iteration count instead of erroring to stderr and
      silently treating it as 0 (i.e. running unbounded).
    - spinner indexes the braille frames as an array rather than by byte offset,
      so it renders correctly under a C/POSIX locale (the clean-env sandbox 
case).
    - correct --disallowedTools to the colon wildcard form
      (Bash(git push:*) / Bash(gh:*)) so the defense-in-depth deny actually
      matches git push / gh invocations.
    
    Generated-by: Claude Code (Opus 4.7)
    
    * feat(spec-loop): freshness check + branch-name collision check
    
    Two pre-flight checks that close the gap that let an iteration re-do
    merged work under the same branch name as the already-shipped PR:
    
    - **Freshness check** (before `git checkout $BASE`): fetch the base's
      upstream tracking branch and refuse to fork if BASE is behind.
      Forking off a stale local main is how an agent re-creates a file
      that already exists upstream — the resulting branch then collides
      with the merged PR's source branch on the remote.
    
    - **Branch-name collision check** (after the agent forks its branch):
      walk every configured remote and warn if any already has a branch
      with the same name. Covers fork-based workflows where the merged
      PR's source branch lingers on `origin` and the canonical history
      lives on `upstream`.
    
    Both checks degrade gracefully when offline or when BASE has no
    configured upstream — they warn-and-skip rather than block.
    
    Generated-by: Claude Code (Opus 4.7)
---
 .pre-commit-config.yaml                           |   6 +-
 AGENTS.md                                         |  15 +-
 docs/spec-driven-development.md                   | 249 +++++++++++++++
 tools/spec-loop/AGENTS.md                         |  78 +++++
 tools/spec-loop/IMPLEMENTATION_PLAN.md            |  95 ++++++
 tools/spec-loop/PROMPT_build.md                   |  76 +++++
 tools/spec-loop/PROMPT_consolidate.md             |  28 ++
 tools/spec-loop/PROMPT_plan.md                    |  44 +++
 tools/spec-loop/PROMPT_update.md                  |  59 ++++
 tools/spec-loop/README.md                         |  87 +++++
 tools/spec-loop/loop.sh                           | 373 ++++++++++++++++++++++
 tools/spec-loop/specs/README.md                   |  74 +++++
 tools/spec-loop/specs/adapters.md                 |  77 +++++
 tools/spec-loop/specs/adoption-and-setup.md       |  77 +++++
 tools/spec-loop/specs/agent-isolation-sandbox.md  |  84 +++++
 tools/spec-loop/specs/cve-tooling.md              |  74 +++++
 tools/spec-loop/specs/drafting-mode.md            |  73 +++++
 tools/spec-loop/specs/mentoring-mode.md           |  75 +++++
 tools/spec-loop/specs/meta-and-quality-tooling.md |  81 +++++
 tools/spec-loop/specs/overview.md                 |  68 ++++
 tools/spec-loop/specs/pairing-mode.md             |  72 +++++
 tools/spec-loop/specs/privacy-llm-gate.md         |  84 +++++
 tools/spec-loop/specs/security-issue-lifecycle.md |  76 +++++
 tools/spec-loop/specs/triage-mode.md              |  74 +++++
 24 files changed, 2096 insertions(+), 3 deletions(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 887e9af..589361b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -42,11 +42,13 @@ repos:
         # Skip agent-facing skill definitions: their YAML frontmatter must be
         # the first thing in the file, which is incompatible with doctoc
         # inserting a TOC block at the top. Also skip skill-evals fixture and
-        # README files — short, single-purpose docs that don't warrant a TOC.
+        # README files, and the spec-loop specs/prompts (YAML-frontmatter
+        # specs and short single-purpose prompts, same rationale as skills)
+        # — short docs that don't warrant a TOC.
         # Skip the PR template — GitHub pre-populates a new PR description
         # with the template verbatim, so a TOC block becomes per-PR noise the
         # contributor has to delete by hand.
-        exclude: 
^(\.claude/skills/.*|tools/vulnogram/generate-cve-json/SKILL\.md|tools/skill-evals/.*|\.github/PULL_REQUEST_TEMPLATE\.md)$
+        exclude: 
^(\.claude/skills/.*|tools/vulnogram/generate-cve-json/SKILL\.md|tools/skill-evals/.*|tools/spec-loop/.*|\.github/PULL_REQUEST_TEMPLATE\.md)$
         args:
           - "--maxlevel"
           - "3"
diff --git a/AGENTS.md b/AGENTS.md
index 8fede15..af00e3f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1389,7 +1389,20 @@ When adding a new skill:
 - start with YAML frontmatter containing `name`, `description`, and 
`when_to_use`;
 - make every state-changing action a *proposal* that requires explicit user
   confirmation before it runs;
-- avoid agent-specific syntax so the skill remains portable across tools.
+- avoid agent-specific syntax so the skill remains portable across tools;
+- **write an eval suite for the skill.** Every skill ships with a
+  behavioural eval under
+  [`tools/skill-evals/evals/<skill-name>/`](tools/skill-evals/) that
+  exercises each pipeline step with fixture cases and pins the expected
+  structured output — the same shape as the existing suites
+  (`security-issue-import`, `issue-triage`, …). Evals are not optional
+  polish: an LLM-driven skill's behaviour is a distribution, and the eval
+  is how a reviewer (and CI) confirms a change did not regress it. Run a
+  suite with
+  `uv run --project tools/skill-evals skill-eval 
tools/skill-evals/evals/<skill-name>/`;
+  see [`tools/skill-evals/README.md`](tools/skill-evals/README.md) for the
+  layout (`step-*/fixtures/case-*`). A skill PR without a matching eval
+  suite is incomplete.
 
 ## Keeping evals and mode-economics in sync
 
diff --git a/docs/spec-driven-development.md b/docs/spec-driven-development.md
new file mode 100644
index 0000000..b69f706
--- /dev/null
+++ b/docs/spec-driven-development.md
@@ -0,0 +1,249 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update 
-->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with 
[DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Spec-driven development](#spec-driven-development)
+  - [The idea](#the-idea)
+  - [Three phases, four beats, one loop](#three-phases-four-beats-one-loop)
+  - [Specs are not RFCs](#specs-are-not-rfcs)
+  - [A branch per fix or feature](#a-branch-per-fix-or-feature)
+  - [Why it never pushes](#why-it-never-pushes)
+  - [Security and the dangerously-skip-permissions 
flag](#security-and-the-dangerously-skip-permissions-flag)
+  - [Keeping specs honest: the update 
beat](#keeping-specs-honest-the-update-beat)
+  - [Layout](#layout)
+  - [Quick start](#quick-start)
+  - [How this composes with the framework's 
principles](#how-this-composes-with-the-frameworks-principles)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Spec-driven development
+
+This document explains the **spec-driven build loop** that lives in
+[`tools/spec-loop/`](../tools/spec-loop/). It is how this framework can be
+developed and maintained against a written description of what it does,
+rather than against whatever happens to be in someone's head on a given
+afternoon.
+
+## The idea
+
+The loop is a small instance of the general
+[Ralph](https://ghuntley.com/ralph/) technique: run a fresh agent context
+against a fixed prompt, let it do one well-scoped thing, then run it
+again with a clean context. The power is in the funnel that feeds it —
+not "a loop that codes," but a pipeline from *what the product should do*
+to *one reviewable change at a time*.
+
+Two artefacts carry the state between iterations, so nothing depends on a
+long-lived context window:
+
+- **Specs** ([`tools/spec-loop/specs/`](../tools/spec-loop/specs/)) — a
+  faithful, plain-Markdown description of each functional area of the
+  product: what it does, where it lives in the code, the contract it must
+  honour, and its known gaps. The specs are the *desired state*.
+- **The implementation plan**
+  
([`tools/spec-loop/IMPLEMENTATION_PLAN.md`](../tools/spec-loop/IMPLEMENTATION_PLAN.md))
+  — the prioritised list of *work items*: the gaps between the specs and
+  the code. Each work item is sized to one branch and one PR.
+
+## Three phases, four beats, one loop
+
+Phase one is a human (or a planning conversation) writing the specs.
+Phases two and three are the loop, swapping prompts:
+
+| Beat | Command | What it does | Commits? |
+|---|---|---|---|
+| **plan** | `loop.sh plan` | Compares specs against the code; rewrites the 
plan as prioritised work items. | No |
+| **build** | `loop.sh` | Implements the single highest-priority work item on 
its own branch; validates; commits there. | Yes, on a work-item branch |
+| **update** | `loop.sh update` | Inverse of plan: finds functionality the 
code has but the specs don't, and brings the specs back in sync. | Yes, on 
`spec/sync-specs` |
+| **consolidate** | `loop.sh consolidate` | Shrinks the plan when it grows too 
long, without dropping planned work. | Yes, the plan only |
+
+Every beat loads the same operational context —
+[`tools/spec-loop/AGENTS.md`](../tools/spec-loop/AGENTS.md) (repo map,
+validation commands, branch and hard-limit rules) layered on top of the
+repository-wide [`AGENTS.md`](../AGENTS.md). Each build iteration reads
+only the spec and source files relevant to its one work item, so a fresh
+context never drowns in the whole tree.
+
+## Specs are not RFCs
+
+The framework already has [`docs/rfcs/`](rfcs/) — normative principle
+documents that say *why* and define the trust posture. Those are the
+**constitution**. The specs are the **work orders**: discrete, concrete,
+grounded in the actual code. The loop **respects** an RFC principle as a
+constraint (it never pushes, it stays in the sandbox), but it never reads,
+edits, or restates an RFC, and no spec ever lives under `docs/rfcs/`. The
+two formats and lifecycles are deliberately separate; see
+[`tools/spec-loop/specs/README.md`](../tools/spec-loop/specs/README.md).
+
+Spec filenames are **topics, not numbers** — `triage-mode.md`,
+`pairing-mode.md`, `security-issue-lifecycle.md`. There is no numeric
+prefix, because numbering implies a priority the specs don't carry.
+Priority lives only in the implementation plan.
+
+## A branch per fix or feature
+
+The defining constraint of this loop: **one work item, one branch, one
+PR**. Before plan/build iterations, the runner snapshots open PRs so the
+plan beat does not add work already in flight and the build beat skips
+planned items that an open PR already covers. The build beat returns to
+the integration base, then carves out a
+`spec/<slug>` branch for the single work item it is about to implement. It
+never commits feature work to the base branch, and `loop.sh` stops if it
+detects that happening. The result of a run is a fan of independent
+branches, each carrying exactly one change — each independently
+reviewable and independently revertible.
+
+This is the same discipline the framework asks of every state change,
+applied to the framework's own development. It is also what makes the
+loop safe to run unattended: the blast radius of any one iteration is a
+single local branch.
+
+## Why it never pushes
+
+`git push` and `gh pr create` are in the `ask` list of
+[`.claude/settings.json`](../.claude/settings.json) — they require a human
+confirmation. The loop honours that: it ends every iteration at a **local
+commit** and prints the exact commands for the human to run:
+
+```bash
+git push -u origin spec/<slug>
+gh pr create --web --base main --head spec/<slug> \
+  --title "<subject>" --body-file <prepared-body>
+```
+
+Opening the PR with `--web` is the framework's convention so the reviewer
+sees the title, body, and generative-AI disclosure in the browser before
+submitting. The agent drafts; the human presses the button.
+
+## Security and the dangerously-skip-permissions flag
+
+The loop runs the agent headless with `--dangerously-skip-permissions`.
+That deserves a direct explanation, because it looks, at a glance, like
+it throws away the framework's permission gates.
+
+**Why the flag is there.** Headless iterations have no human to answer a
+per-tool-call prompt. Without the flag, the agent would stall (or, in
+non-interactive mode, deny) the moment it tried to edit a file or run a
+validation command. The flag lets the loop do its job — edit, validate,
+commit — unattended.
+
+**What it bypasses, and what it does not.** The framework's sandbox is
+layered (see [`docs/rfcs/RFC-AI-0004.md`](rfcs/RFC-AI-0004.md) for the
+normative statement and `docs/setup/secure-agent-internals.md` for the
+mechanism). `--dangerously-skip-permissions` only reaches the top two:
+
+| Layer | Mechanism | Bypassed by the flag? |
+|---|---|---|
+| 0. Clean environment | wrapper strips credential-shaped env vars before exec 
| **No** — it is the launching wrapper, not the agent |
+| 1. Filesystem + network sandbox | `bubblewrap` + SNI proxy (Linux) / 
`sandbox-exec` (macOS); default-deny egress | **No** — enforced by the OS, not 
the agent |
+| 2. Tool permissions | `.claude/settings.json` `permissions.deny` | **Yes** |
+| 3. Forced confirmation | `.claude/settings.json` `permissions.ask` on `git 
push`, `gh …` | **Yes** |
+
+So the flag removes the *agent-level* gate (Layers 2–3), but the
+*OS-level* boundary (Layers 0–1) is untouched — it is enforced beneath
+the agent and cannot be turned off from inside it. This is exactly the
+posture the flag's own guidance assumes: it is *"recommended only for
+sandboxes with no internet access."*
+
+**How the loop stays safe anyway.** Three things, in order of
+importance:
+
+1. **Run it only inside the sandbox harness.** The OS layers the flag
+   cannot bypass are the real boundary. Never run the loop on a bare
+   machine — launch it through the project's `claude-iso`/sandbox wrapper
+   so the filesystem and network allow-lists are in force.
+2. **Run it with no push/write credentials in the environment.** The
+   clean-env wrapper already strips them; keep it that way. `github.com`
+   is on the network allow-list, but a `git push` or `gh pr create` with
+   no token cannot authenticate, so it fails closed. As defence in depth
+   the loop also passes
+   `--disallowedTools "Bash(git push *)" "Bash(gh *)"`.
+3. **Structural containment.** Every iteration works on its own
+   `spec/<slug>` branch, the loop guards against commits landing on the
+   base branch, and the prompts forbid push/PR. The human-in-the-loop
+   gate is not removed — it is *relocated* from per-tool-call to the
+   push / PR / merge boundary, where the human reviews a finished branch.
+
+**Net effect.** During a run the per-call confirmation gate is traded for
+autonomy, but credentials are absent, egress is fenced, and the blast
+radius of any iteration is a single local branch the human has not yet
+pushed. That is the same reason the loop is the project's *manual-loop
+evidence* and must never be promoted to auto-merge: the autonomy is
+bounded to producing local branches, nothing more. An operator who wants
+the per-call gate back can drop the flag and pre-authorise the loop's
+tools with `--allowedTools` instead — at the cost of the loop pausing on
+anything it was not pre-authorised to do.
+
+## Keeping specs honest: the update beat
+
+Not every contribution comes through the loop — people land new skills and
+tools the normal way. When that happens the specs fall behind the code.
+The **update** beat is the fix: it inventories `.claude/skills/`,
+`tools/`, and `docs/modes.md`, diffs that against the specs, and back-fills
+or corrects the specs (a `proposed` area that now has a shipped skill
+becomes `experimental`; a drifted *Where it lives* is corrected; genuinely
+new functionality gets a new topic-named spec). It edits **only** the spec
+directory — it documents reality, it doesn't change it — and lands as one
+reviewable `spec/sync-specs` PR. Run it after a batch of normal PRs merges,
+or on a schedule.
+
+## Layout
+
+```text
+tools/spec-loop/
+├── README.md              operator quickstart
+├── AGENTS.md              loop-scoped operational context
+├── loop.sh                the runner (plan / build / update / consolidate)
+├── PROMPT_plan.md         gap analysis → plan
+├── PROMPT_build.md        implement one work item on its own branch
+├── PROMPT_update.md       back-fill specs from contributed code
+├── PROMPT_consolidate.md  shrink the plan
+├── IMPLEMENTATION_PLAN.md prioritised work items (the gaps)
+└── specs/                 functional description of the product
+    ├── overview.md
+    ├── triage-mode.md     mentoring-mode.md   drafting-mode.md   
pairing-mode.md
+    ├── security-issue-lifecycle.md            privacy-llm-gate.md
+    ├── agent-isolation-sandbox.md             cve-tooling.md
+    ├── adoption-and-setup.md                  adapters.md
+    └── meta-and-quality-tooling.md
+```
+
+## Quick start
+
+```bash
+# 1. See what's out of sync, then read the plan it writes.
+./tools/spec-loop/loop.sh plan 1
+$EDITOR tools/spec-loop/IMPLEMENTATION_PLAN.md
+
+# 2. Build the top work item (one branch, one commit) and stop.
+./tools/spec-loop/loop.sh 1
+
+# 3. Review the branch it produced, then push + open the PR yourself.
+git log --oneline -1
+git push -u origin spec/<slug>
+gh pr create --web --base main --head spec/<slug> --title "…" --body-file …
+
+# Later: someone merged skills outside the loop — resync the specs.
+./tools/spec-loop/loop.sh update 1
+```
+
+Stop any run with `Ctrl+C` or `touch STOP`. By default the loop forks
+work items from the branch you start it on (typically `main`); set
+`SPEC_LOOP_BASE` to build on top of a different branch. Set
+`SPEC_LOOP_AGENT` when the Claude-compatible agent CLI is installed
+under a command name other than `claude`. Set `SPEC_LOOP_PR_LIMIT` to
+change how many open PRs are included in the duplicate-work check.
+
+## How this composes with the framework's principles
+
+A loop that runs an agent unattended sounds, at first, like the opposite
+of human-in-the-loop. The branch-per-feature constraint is the
+reconciliation: the loop's autonomy is bounded to *producing local
+branches*, and the human gate sits exactly where the framework always
+puts it — at push, at PR, at merge. Nothing the loop does is visible
+outside the maintainer's machine until a human chooses to push it. The
+loop is the *manual* development cycle the framework can later point to as
+evidence; it is not, and must not become, an auto-merge.
diff --git a/tools/spec-loop/AGENTS.md b/tools/spec-loop/AGENTS.md
new file mode 100644
index 0000000..7a839b8
--- /dev/null
+++ b/tools/spec-loop/AGENTS.md
@@ -0,0 +1,78 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# AGENTS — spec-loop operational context
+
+This file is the **operational** context for the spec-loop only:
+build/validate commands, the repository map, and the branch rules. It is
+loaded by the loop's prompts in addition to the repository-wide
+[`/AGENTS.md`](../../AGENTS.md), which still governs everything (commit
+trailers, placeholder convention, privacy/security posture). Where the
+two overlap, the repo-wide `AGENTS.md` wins.
+
+## Repository map (what the loop edits)
+
+This repo has no `src/` tree. Work lands in one of:
+
+- `.claude/skills/<name>/SKILL.md` — agent-readable skills (Markdown +
+  YAML frontmatter; required keys `name`, `description`, `license`).
+- `tools/<tool>/` — deterministic Python tools (`uv`, hatchling,
+  `src/` + `tests/`, `dependencies = []` where possible).
+- `docs/` — human-facing documentation. `docs/rfcs/` is the **separate**
+  governance layer — the loop never edits it.
+- `tools/spec-loop/specs/` — the specs this loop consumes.
+
+## Validation commands (the build "backpressure" step)
+
+Run the spec's own **Validation** block first. General checks:
+
+```bash
+# Validate skill definitions (frontmatter, links, placeholders)
+uv run --project tools/skill-validator --group dev skill-validate
+
+# A skill's behavioural eval suite (every skill must have one)
+uv run --project tools/skill-evals skill-eval 
tools/skill-evals/evals/<skill-name>/
+
+# A tool's own tests (substitute the tool path)
+uv run --project tools/<tool> --group dev pytest
+
+# Shell scripts
+bash -n <script>.sh && shellcheck <script>.sh
+```
+
+There is no repo-wide test runner; validate the specific surface the
+spec touches. If a work item adds or changes a **skill**, it must also
+add/extend that skill's eval suite under
+`tools/skill-evals/evals/<skill-name>/` (per `/AGENTS.md` § Reusable
+skills — a skill without an eval suite is incomplete). If a work item
+adds a **tool**, that tool ships its own tests. Both must pass before
+commit.
+
+## Branch rules (the user's constraint: one branch per fix/feature)
+
+- **Never commit feature work to the integration branch.** Build mode
+  branches `<slug>` off the integration branch (`$SPEC_LOOP_BASE`,
+  default: `main`) first.
+- **One spec per branch, one branch per PR.** Do not bundle specs.
+- A feature branch edits only **its own** spec's `status:` (→ `done`) —
+  not sibling specs and not `IMPLEMENTATION_PLAN.md` (avoids cross-branch
+  conflicts; the plan is reconciled by a later `plan` pass).
+- The **`update`** beat (specs fell behind code others contributed)
+  branches `sync-specs` and edits `specs/` **only** — it documents
+  reality, it never changes a skill, tool, or doc outside the spec dir.
+
+## Hard limits (governance — do not cross)
+
+- **No push, no PR.** `git push` and `gh pr create` are in the `ask`
+  list of `.claude/settings.json`. The loop stops at a local commit and
+  prints the human-run commands. Opening the PR is the human's click.
+- **No `.claude/settings.json` edits** (it is in the `deny` list).
+- **No new network/filesystem allowances.** Run inside the existing
+  sandbox.
+
+## Commits
+
+- Imperative subject describing the user-visible change.
+- Trailer `Generated-by: Claude (Opus 4.7)` — **never** `Co-Authored-By`
+  with an agent (repo-wide `AGENTS.md` § Commit and PR conventions).
+- One commit per build iteration (the change + its spec `status` flip).
diff --git a/tools/spec-loop/IMPLEMENTATION_PLAN.md 
b/tools/spec-loop/IMPLEMENTATION_PLAN.md
new file mode 100644
index 0000000..db69e0a
--- /dev/null
+++ b/tools/spec-loop/IMPLEMENTATION_PLAN.md
@@ -0,0 +1,95 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Implementation Plan — spec-loop
+
+Maintained by the loop's **plan** mode. It is the prioritised list of
+*gaps* found by comparing [`specs/`](specs/) against the actual code
+(`.claude/skills/`, `tools/`, `docs/`). The **build** mode takes the
+single highest-priority work item, isolates it on its own branch,
+implements it, validates it, and commits — **one work item, one branch,
+one PR** (the branch-per-feature constraint).
+
+> Priority lives here, not in the specs. The specs describe functional
+> areas (unordered); this plan orders the work.
+
+---
+
+## What's been built
+
+- **Spec set** — [`specs/`](specs/): an `overview` plus a functional
+  spec per area (the four live modes, the security lifecycle, the
+  privacy-LLM gate, the sandbox, CVE tooling, adoption/setup, adapters,
+  and meta/quality tooling).
+- **Loop scaffolding** — `loop.sh` (plan / build / consolidate; a branch
+  per work item; never pushes), `PROMPT_plan.md`, `PROMPT_build.md`,
+  `PROMPT_consolidate.md`, `AGENTS.md` (loop-scoped operational context),
+  and this plan.
+
+---
+
+## Work items (planned)
+
+Priority order. Each maps to one branch and one PR. Branch names are
+slugs, not numbers (numbering implies an order the specs don't carry).
+
+1. **Pairing — pre-flight self-review skill.** Highest priority: closes
+   the empty-Pairing-family gap MISSION makes a v1 goal. New
+   `.claude/skills/pairing-self-review/SKILL.md` (read-only, hands a
+   report back, never opens a PR); update `docs/modes.md` Pairing row
+   0 → 1, `proposed` → `experimental`. Validate with `skill-validate`.
+   Spec: [`specs/pairing-mode.md`](specs/pairing-mode.md). Branch
+   `pairing-self-review`.
+
+2. **Mentoring — first prototype skill.** `pr-management-mentor` (working
+   name), `mode: Mentoring` + `experimental`, drafting replies in a
+   teaching register with an explicit hand-off to a human. The Mentoring
+   spec/tone-guide already exists under `docs/mentoring/`. Spec:
+   [`specs/mentoring-mode.md`](specs/mentoring-mode.md). Branch
+   `mentoring-prototype`.
+
+3. **Docs — mode economics page.** New `docs/mode-economics.md` (per-mode
+   token-cost shape, vendor-neutral, indicative-not-a-quote), linked from
+   `docs/modes.md`. From MISSION § Affordability. Branch
+   `mode-economics-doc`.
+
+4. **Meta — spec-status index.** A deterministic `uv` tool (mirrors
+   `list-steward-skills`) that prints specs by status and a `--ready`
+   filter, so later build iterations choose the next work item
+   mechanically. Spec: 
[`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+   Branch `spec-status-index`.
+
+5. **Pairing — multi-agent review pipeline.** Fans a local diff through
+   independent review passes (correctness / security / conventions) and
+   merges the findings. Reuses the self-review report format, so it
+   follows work item 1. Branch `pairing-multi-agent-review`.
+
+6. **Drafting — generic (non-security) drafting.** Extend Drafting beyond
+   the security + general-issue cases to lint fixes, audit-tool findings,
+   and documentation holes (MISSION names these in scope). Larger; split
+   into per-source work items as it is picked up. Spec:
+   [`specs/drafting-mode.md`](specs/drafting-mode.md). Branch
+   `generic-drafting`.
+
+7. **Meta — back-fill missing skill eval suites.** Per `/AGENTS.md`
+   § Reusable skills, every skill ships an eval suite under
+   `tools/skill-evals/evals/<skill-name>/`. Several skills predate that
+   convention and have none. Add one suite per uncovered skill — one
+   branch per skill (or per family). Spec:
+   [`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+   Branch `eval-<skill-name>`.
+
+   Also: when a build iteration creates a new skill, its eval suite is
+   part of that same work item — not a separate one.
+
+---
+
+## Notes & discoveries
+
+- The general Ralph-loop technique pushes after every iteration. That
+  step is intentionally **removed** here: `git push` and `gh pr create`
+  are in the repo's `ask` permission list and are the human's step.
+- Validation per work item lives in the relevant spec's **Validation**
+  section; the build prompt runs it as backpressure before committing.
+- Auto-merge is deliberately off and has no work items — building toward
+  it would skip the proof MISSION requires.
diff --git a/tools/spec-loop/PROMPT_build.md b/tools/spec-loop/PROMPT_build.md
new file mode 100644
index 0000000..e612e37
--- /dev/null
+++ b/tools/spec-loop/PROMPT_build.md
@@ -0,0 +1,76 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+You are running the **build** beat of the spec-driven loop for this
+repository. Implement exactly ONE work item, on its OWN branch.
+
+Context to load first:
+
+- `tools/spec-loop/AGENTS.md` — operational rules (repo map, validation
+  commands, branch + hard-limit rules). The repo-wide `/AGENTS.md` also
+  applies (commit trailers, placeholder convention, confidentiality).
+- `tools/spec-loop/IMPLEMENTATION_PLAN.md` — the prioritised work items.
+- The appended **Open pull-request context** block from the runner.
+- Only the spec(s) and source files relevant to the chosen work item —
+  do not read the whole tree.
+
+Steps:
+
+1. Read the appended **Open pull-request context**. Treat open PRs as
+   in-flight work. Pick the single highest-priority work item from
+   `IMPLEMENTATION_PLAN.md`. If a **Tooling source** block is appended
+   below, read the plan from the control branch as it shows
+   (`git show <ref>:tools/spec-loop/IMPLEMENTATION_PLAN.md`), not from the
+   working tree — the tree is on the integration base, which need not carry
+   the plan. Pick an item not already substantially covered by an open PR.
+   One only.
+2. **Create its branch off the integration base**, then switch to it:
+   `git checkout -b <slug>` where `<slug>` is the work item's branch — the
+   bare slug, **no `spec/` or other prefix** (e.g.
+   `pairing-self-review`). NEVER commit the work to the integration
+   branch. One branch per work item.
+3. Read only the relevant spec file(s) — from the control branch if a
+   **Tooling source** block is appended, otherwise from the working tree —
+   plus the relevant `.claude/skills/` / `tools/` / `docs/` files from the
+   working tree. Confirm what already exists before writing — do not assume.
+4. Implement the work item **completely** — no placeholders, no stubs.
+   Skills: follow the skill format (frontmatter `name` / `description` /
+   `license`, SPDX header, placeholder convention, every state change a
+   confirmed proposal) **and ship an eval suite** under
+   `tools/skill-evals/evals/<skill-name>/` exercising each step with
+   fixture cases (per `/AGENTS.md` § Reusable skills — a skill without a
+   matching eval suite is incomplete). Tools: ship tests.
+5. Run the work item's **Validation** command(s) from its spec (the
+   backpressure). Fix until they pass.
+6. Specs and `IMPLEMENTATION_PLAN.md` live on the control branch. If a
+   **Tooling source** block is appended, they are **not** on this work
+   branch — do not create or edit them here; instead note any `status` or
+   `Known gap` change in the PR body for a later plan/update beat to
+   reconcile. (If no such block is present, the tooling is on this branch:
+   update **only that spec's** frontmatter/Known-gaps, and never
+   `IMPLEMENTATION_PLAN.md`.)
+7. `git add -A` then `git commit` with an imperative subject and a
+   `Generated-by: Claude (Opus 4.7)` trailer. **Never** add a
+   `Co-Authored-By:` trailer for an agent.
+
+Then STOP. Do NOT push and do NOT open a PR — `git push` and
+`gh pr create` are the human's step (they are in `.claude/settings.json`
+`ask`). Print the exact commands the human can run:
+
+```text
+git push -u origin <slug>
+gh pr create --web --base <integration-base> --head <slug> \
+  --title "<subject>" --body-file <prepared-body>
+```
+
+Rules:
+
+- One work item per iteration. Do not bundle.
+- Do not duplicate in-flight work from open PRs. If the highest-priority
+  plan item is covered by an open PR, skip it and choose the next
+  uncovered item.
+- If a work item is blocked, note why in its spec's `Known gaps` and pick
+  the next item instead.
+- Stay inside the sandbox; never edit `.claude/settings.json`; never add a
+  new network/filesystem allowance.
+- Single sources of truth — no duplicate logic; extend `tools/`.
diff --git a/tools/spec-loop/PROMPT_consolidate.md 
b/tools/spec-loop/PROMPT_consolidate.md
new file mode 100644
index 0000000..3eebe4d
--- /dev/null
+++ b/tools/spec-loop/PROMPT_consolidate.md
@@ -0,0 +1,28 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+`tools/spec-loop/IMPLEMENTATION_PLAN.md` has grown too long. Consolidate
+it without losing planned work.
+
+1. Read `tools/spec-loop/IMPLEMENTATION_PLAN.md` in full.
+2. In **What's been built**: collapse each completed item to a single
+   concise line. The detail lives in the code and specs.
+3. In **Work items (planned)**: keep every work item intact — these still
+   guide future build beats. Do not remove or shorten planned work.
+4. Remove redundant notes, stale caveats, and duplicates.
+5. Rewrite the file. Aim for **under 300 lines** — comfortably below the
+   consolidation trigger so the loop does not immediately re-consolidate.
+   Shrink by collapsing the *What's been built* section only; **every
+   planned work item is preserved**. If planned work alone still exceeds
+   300 lines, that is fine — do not pad, and never drop a work item to hit
+   the number.
+6. `git add -A` then
+   `git commit -m "chore(spec-loop): consolidate implementation plan"`
+   with a `Generated-by: Claude (Opus 4.7)` trailer.
+
+Rules:
+
+- Do not mark any planned work item as done.
+- Do not remove any planned work item.
+- Do not touch `tools/spec-loop/specs/` or any skill/tool/doc.
+- Commit only the plan file. Do not push or open a PR.
diff --git a/tools/spec-loop/PROMPT_plan.md b/tools/spec-loop/PROMPT_plan.md
new file mode 100644
index 0000000..afc4c57
--- /dev/null
+++ b/tools/spec-loop/PROMPT_plan.md
@@ -0,0 +1,44 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+You are running the **plan** beat of the spec-driven loop for this
+repository. Plan only — do NOT implement anything and do NOT commit code.
+
+Context to load first:
+
+- `tools/spec-loop/AGENTS.md` — operational rules (repo map, validation
+  commands, branch + hard-limit rules). The repo-wide `/AGENTS.md` also
+  applies.
+- `tools/spec-loop/specs/*` — the functional description of the product.
+- `tools/spec-loop/IMPLEMENTATION_PLAN.md` (if present; may be stale).
+- The appended **Open pull-request context** block from the runner.
+
+Steps:
+
+1. Study each spec in `tools/spec-loop/specs/` and compare it against the
+   actual code it names in **Where it lives** (`.claude/skills/`,
+   `tools/`, `docs/`). You may use parallel subagents for reading. Do NOT
+   assume something is missing — confirm with a code search first.
+2. Read the appended **Open pull-request context**. Treat open PRs as
+   in-flight work. If an apparent gap is already substantially covered by
+   an open PR (including draft PRs), do not add it as a planned work item.
+3. For each spec, identify the **gaps**: a `proposed` area with no skill,
+   a documented step that drifted from the code, a missing test, a
+   `Known gaps` item. Each gap is a candidate work item.
+4. Rewrite `tools/spec-loop/IMPLEMENTATION_PLAN.md` as a prioritised list
+   of work items. Each work item names: the change, the spec it serves,
+   its **Validation** command, and a branch slug (`<slug>`, the bare
+   slug — **no `spec/` or other prefix, no numbers**).
+5. Do NOT create work items against an `off` spec (e.g. Auto-merge) —
+   that would skip the proof MISSION requires.
+
+Rules:
+
+- Plan only. No edits to skills, tools, or docs. No commits in this beat.
+- Keep the plan prioritised and concise; one work item = one branch = one
+  PR.
+- Do not duplicate in-flight work from open PRs. If a stale existing plan
+  item is now covered by an open PR, remove it or mark it as in-flight
+  rather than leaving it available for the build beat.
+- Treat `tools/` as the standard library — prefer extending an existing
+  tool over a new ad-hoc one.
diff --git a/tools/spec-loop/PROMPT_update.md b/tools/spec-loop/PROMPT_update.md
new file mode 100644
index 0000000..be5c217
--- /dev/null
+++ b/tools/spec-loop/PROMPT_update.md
@@ -0,0 +1,59 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+You are running the **update** beat of the spec-driven loop. Specs can
+fall behind the code when contributors land functionality the normal
+way (a regular PR, not through this loop). This beat brings the specs
+back in sync with reality. It is the inverse of `plan`: `plan` finds code
+missing against specs; `update` finds **functionality missing against
+specs** and back-fills the specs.
+
+Context to load first:
+
+- `tools/spec-loop/AGENTS.md` and the repo-wide `/AGENTS.md`.
+- `tools/spec-loop/specs/*` — the current functional description.
+- The actual code: `.claude/skills/`, `tools/`, `docs/`, `docs/modes.md`.
+
+Steps:
+
+1. **Create the sync branch off the integration base**, then switch to
+   it: `git checkout -b sync-specs`. (One reviewable PR for the
+   sync.) Never commit the sync to the integration branch.
+2. Inventory the code with parallel subagents:
+   - every `.claude/skills/*/SKILL.md` (name, mode, what it does);
+   - every `tools/*` project (what it does, its tests);
+   - the mode/status table in `docs/modes.md`.
+3. Diff that inventory against `tools/spec-loop/specs/`:
+   - **New functionality with no spec** → author a new topic-named spec
+     (no number prefix) following the format in
+     [`specs/README.md`](specs/README.md), grounded in the real code it
+     describes.
+   - **Drifted spec** → a spec whose *Where it lives*, *Behaviour &
+     contract*, or `status` no longer matches the code → update it to
+     match reality (e.g. a `proposed` area that now has a shipped skill
+     becomes `experimental`/`stable`; skill counts in `docs/modes.md`
+     are reflected).
+   - **Removed functionality** → mark the spec or move it to a `Known
+     gaps`/retired note; do not silently delete history.
+4. Update `specs/overview.md` and `specs/README.md` indexes if areas were
+   added or renamed.
+5. `git add -A` then `git commit` with subject
+   `docs(spec-loop): sync specs with contributed functionality` and a
+   `Generated-by: Claude (Opus 4.7)` trailer.
+
+Then STOP. Do NOT push, do NOT open a PR. Print the human-run commands:
+
+```text
+git push -u origin sync-specs
+gh pr create --web --base <integration-base> --head sync-specs \
+  --title "Sync specs with contributed functionality" --body-file <body>
+```
+
+Rules:
+
+- **Edit specs only.** This beat changes `tools/spec-loop/specs/` (and
+  the indexes). It must NOT change any skill, tool, or doc outside the
+  spec directory — it documents reality, it does not alter it.
+- Confirm with a code search before recording something as present or
+  absent. Do not invent behaviour the code does not have.
+- Keep the RFCs untouched — they are a separate governance layer.
diff --git a/tools/spec-loop/README.md b/tools/spec-loop/README.md
new file mode 100644
index 0000000..8255be0
--- /dev/null
+++ b/tools/spec-loop/README.md
@@ -0,0 +1,87 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# spec-loop
+
+A spec-driven build loop for this framework, in the general
+[Ralph](https://ghuntley.com/ralph/) style (run a fresh agent context
+against a fixed prompt, repeat), adapted to the framework's
+human-in-the-loop posture. The full write-up is in
+[`docs/spec-driven-development.md`](../../docs/spec-driven-development.md);
+this is the operator quickstart.
+
+## The pieces
+
+| File | Role |
+|---|---|
+| [`specs/`](specs/) | The functional description of the product — one spec 
per area. The desired state the loop reconciles code against. |
+| [`IMPLEMENTATION_PLAN.md`](IMPLEMENTATION_PLAN.md) | Prioritised **work 
items** (the gaps). One work item = one branch = one PR. |
+| [`AGENTS.md`](AGENTS.md) | Loop-scoped operational rules (repo map, 
validation commands, branch + hard-limit rules). |
+| `PROMPT_plan.md` / `PROMPT_build.md` / `PROMPT_update.md` / 
`PROMPT_consolidate.md` | The per-beat prompts. |
+| `loop.sh` | The runner. |
+
+## Modes
+
+```bash
+./tools/spec-loop/loop.sh              # build, unlimited iterations
+./tools/spec-loop/loop.sh 10           # build, max 10 iterations
+./tools/spec-loop/loop.sh plan         # gap-analysis → rewrite the plan (no 
code changes)
+./tools/spec-loop/loop.sh update       # back-fill specs from functionality 
others contributed
+./tools/spec-loop/loop.sh consolidate  # shrink the plan when it grows too long
+```
+
+- **plan** — compares `specs/` against the code and rewrites
+  `IMPLEMENTATION_PLAN.md`. Plans only; no commits. It also checks open
+  PRs and does not add work items that are already in flight.
+- **build** — implements the single highest-priority work item on its own
+  `<slug>` branch, validates, and commits there. If the top plan
+  item is already covered by an open PR, it skips to the next uncovered
+  item.
+- **update** — the inverse of plan: scans the code for functionality not
+  yet described by a spec (someone contributed it the normal way) and
+  brings the specs back in sync, on a `sync-specs` branch.
+- **consolidate** — shrinks the plan without losing planned work (build
+  auto-switches to this when the plan grows past ~500 lines).
+
+## The two non-negotiables
+
+- **A branch per work item.** Build/update never commit to the
+  integration base; each carves out its own branch, so every change is
+  one reviewable, revertible PR.
+- **Never pushes, never opens a PR.** `git push` and `gh pr create` are in
+  `.claude/settings.json` `ask` — the human's step. Each beat ends at a
+  local commit and prints the exact push + `gh pr create --web` commands.
+
+## Security
+
+The loop runs the agent with `--dangerously-skip-permissions`, so it
+**must** be launched inside the project's sandbox harness, with no
+push/write credentials in the environment. The flag bypasses the agent
+permission layer (`.claude/settings.json` deny/ask) but **not** the OS
+sandbox (clean-env + filesystem/network), which stays the real boundary;
+as defence in depth the loop also hard-denies `git push` and `gh` via
+`--disallowedTools`. Full rationale:
+[`docs/spec-driven-development.md` § Security and the 
dangerously-skip-permissions 
flag](../../docs/spec-driven-development.md#security-and-the-dangerously-skip-permissions-flag).
+
+## Stop / configure
+
+- Stop: `Ctrl+C`, or `touch STOP` (exits after the current iteration).
+- `SPEC_LOOP_BASE` — branch to fork work items from. Defaults to `main`;
+  set it explicitly to build on top of a different branch.
+- `SPEC_LOOP_AGENT` — Claude-compatible agent CLI or wrapper to run
+  (default `claude`).
+- `SPEC_LOOP_MODEL` — model passed to the agent CLI (default `sonnet`).
+- `SPEC_LOOP_PR_LIMIT` — number of open PRs to include in duplicate-work
+  checks (default `100`).
+- `SPEC_LOOP_PLAN_MAX` — plan line count that triggers one consolidation
+  round before building (default `500`). The consolidate beat targets
+  ~300 lines (hysteresis) and runs at most once until the plan drops back
+  under the limit, so a plan that is long because of *pending work* never
+  re-consolidates in a loop.
+
+## Not the RFCs
+
+The specs are the *functional description of the code*. The
+[`docs/rfcs/`](../../docs/rfcs/) are the separate normative governance
+layer — the loop respects them as constraints and never reads or edits
+them.
diff --git a/tools/spec-loop/loop.sh b/tools/spec-loop/loop.sh
new file mode 100755
index 0000000..8e97a8c
--- /dev/null
+++ b/tools/spec-loop/loop.sh
@@ -0,0 +1,373 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: Apache-2.0
+#   https://www.apache.org/licenses/LICENSE-2.0
+#
+# Spec-driven build loop for this repository.
+#
+# A small loop in the general "Ralph" style (run a fresh agent context
+# against a prompt, repeat), adapted to this framework's posture:
+#
+#   * THREE modes, ONE mechanism:
+#       ./loop.sh                 build, unlimited iterations
+#       ./loop.sh 20              build, max 20 iterations
+#       ./loop.sh plan [N]        gap-analysis only (updates the plan)
+#       ./loop.sh update [N]      back-fill specs from code others contributed
+#       ./loop.sh consolidate [N] shrink the plan
+#   * BRANCH PER WORK ITEM: before each build iteration the loop returns
+#     to the integration base; the build prompt then carves out
+#     <slug> for the one work item it implements. One work item =
+#     one branch = one PR.
+#   * NEVER pushes, NEVER opens a PR. `git push` / `gh pr create` are in
+#     .claude/settings.json `ask` — they are the human's step. The loop
+#     ends at a local commit and the build prompt prints the human-run
+#     push + `gh pr create --web` commands.
+#
+# SECURITY — read before running:
+#   This loop runs the agent with `--dangerously-skip-permissions`, which
+#   bypasses the AGENT permission layer (.claude/settings.json deny/ask)
+#   but NOT the OS sandbox (clean-env + filesystem/network). Per the
+#   project's security model it MUST be launched inside the sandbox
+#   harness, with no push/write credentials in the environment. Full
+#   rationale: docs/spec-driven-development.md § Security and the
+#   dangerously-skip-permissions flag.
+#
+# Stop gracefully: press Ctrl+C, or `touch STOP` (exits after the current
+# iteration finishes).
+#
+# Env overrides:
+#   SPEC_LOOP_BASE   branch to fork work items from (default: main)
+#   SPEC_LOOP_AGENT  Claude-compatible agent CLI to run (default: claude)
+#   SPEC_LOOP_MODEL  model passed to the agent CLI (default: sonnet)
+#   SPEC_LOOP_PR_LIMIT  open PRs to list for duplicate-work checks (default: 
100)
+#   SPEC_LOOP_PLAN_MAX  plan line count that triggers ONE consolidation
+#                    round before building (default: 500)
+
+set -uo pipefail
+
+trap 'echo -e "\nStopping loop..."; exit 0' INT TERM
+
+# Operate from the repo root so the agent sees the whole tree.
+ROOT="$(git rev-parse --show-toplevel 2>/dev/null)" || {
+    echo "Error: not inside a git repository." >&2; exit 1; }
+cd "$ROOT" || exit 1
+
+LOOP_DIR="tools/spec-loop"
+PLAN="$LOOP_DIR/IMPLEMENTATION_PLAN.md"
+
+current_branch() {
+    local branch
+    branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null)" || return 1
+    if [ "$branch" = "HEAD" ]; then
+        return 0
+    fi
+    printf '%s\n' "$branch"
+}
+
+# Default the integration base to main — the repo's integration target —
+# regardless of which branch the loop is launched from. Work items fork
+# from here. Override with SPEC_LOOP_BASE to build on top of a different
+# branch.
+BASE="${SPEC_LOOP_BASE:-main}"
+# The control branch: where loop.sh, the prompts, the plan and the specs
+# live. Captured now, before the loop checks anything out, because a
+# build/update iteration checks out BASE (e.g. main) — which need not carry
+# the spec-loop tooling. Every tooling read below goes through this ref via
+# `git show`, so the loop works when launched from a feature branch while
+# building on main.
+TOOLING_REF="$(current_branch)"
+TOOLING_REF="${TOOLING_REF:-HEAD}"
+AGENT="${SPEC_LOOP_AGENT:-claude}"
+MODEL="${SPEC_LOOP_MODEL:-sonnet}"
+PR_LIMIT="${SPEC_LOOP_PR_LIMIT:-100}"
+# Plan length that triggers ONE consolidation round before building. The
+# consolidate beat preserves every planned work item, so a plan that is long
+# because of *pending work* (not stale history) cannot shrink below this —
+# hence the one-shot latch below, which avoids re-consolidating forever.
+PLAN_CONSOLIDATE_THRESHOLD="${SPEC_LOOP_PLAN_MAX:-500}"
+
+# ---- parse arguments -------------------------------------------------
+if [ "${1:-}" = "plan" ]; then
+    MODE="plan";        PROMPT_FILE="$LOOP_DIR/PROMPT_plan.md";        
MAX_ITERATIONS="${2:-0}"
+elif [ "${1:-}" = "update" ]; then
+    MODE="update";      PROMPT_FILE="$LOOP_DIR/PROMPT_update.md";      
MAX_ITERATIONS="${2:-0}"
+elif [ "${1:-}" = "consolidate" ]; then
+    MODE="consolidate"; PROMPT_FILE="$LOOP_DIR/PROMPT_consolidate.md"; 
MAX_ITERATIONS="${2:-0}"
+elif [[ "${1:-}" =~ ^[0-9]+$ ]]; then
+    MODE="build";       PROMPT_FILE="$LOOP_DIR/PROMPT_build.md";       
MAX_ITERATIONS="$1"
+else
+    MODE="build";       PROMPT_FILE="$LOOP_DIR/PROMPT_build.md";       
MAX_ITERATIONS=0
+fi
+
+# Reject a non-numeric iteration count. The plan/update/consolidate second
+# argument flows straight into the integer comparisons below, where a typo'd
+# value would otherwise error to stderr and be silently treated as 0 — i.e.
+# run unbounded instead of failing.
+if ! [[ "$MAX_ITERATIONS" =~ ^[0-9]+$ ]]; then
+    echo "Error: iteration count must be a non-negative integer, got 
'${MAX_ITERATIONS}'." >&2
+    exit 1
+fi
+
+[ -f "$PROMPT_FILE" ] || { echo "Error: $PROMPT_FILE not found" >&2; exit 1; }
+
+if ! command -v "$AGENT" >/dev/null 2>&1; then
+    echo "Error: agent CLI '$AGENT' not found on PATH." >&2
+    echo "Set SPEC_LOOP_AGENT to a Claude-compatible CLI or wrapper." >&2
+    exit 1
+fi
+
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Mode:   $MODE"
+echo "Prompt: $PROMPT_FILE"
+echo "Base:   $BASE  (work items fork from here)"
+echo "Agent:  $AGENT"
+echo "Model:  $MODEL"
+[ "$MAX_ITERATIONS" -gt 0 ] && echo "Max:    $MAX_ITERATIONS iterations"
+echo "Stop:   Ctrl+C  or  touch STOP"
+echo "Note:   this loop never pushes and never opens a PR."
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+
+rm -f STOP
+ITERATION=0
+CONSOLIDATE_TRIED=false   # one-shot latch; resets when the plan drops back 
under the limit
+
+# spinner during silent agent calls
+spinner() {
+    local pid=$1 i=0
+    # Index frames as array elements, not string offsets: ${frames:$i:1} is
+    # byte-based under a C/POSIX locale (common in the clean-env sandbox) and
+    # would slice a multibyte braille glyph into garbage.
+    local frames=(⠋ ⠙ ⠹ ⠸ ⠼ ⠴ ⠦ ⠧ ⠇ ⠏)
+    while kill -0 "$pid" 2>/dev/null; do
+        printf "\r  %s  working... (Ctrl+C to stop)" "${frames[i]}"
+        i=$(( (i+1) % ${#frames[@]} )); sleep 0.15
+    done
+    printf "\r                                              \r"
+}
+
+open_pr_context() {
+    echo ""
+    echo "## Open pull-request context"
+    echo ""
+    echo "The runner collected this immediately before the iteration. Treat 
open"
+    echo "pull requests as in-flight work: do not add plan items that are 
already"
+    echo "substantially covered by an open PR, and do not pick a build item 
that"
+    echo "duplicates one."
+    echo ""
+
+    if ! command -v gh >/dev/null 2>&1; then
+        echo "- unavailable: gh CLI not found on PATH."
+        return 0
+    fi
+
+    local prs
+    prs="$(gh pr list \
+        --state open \
+        --limit "$PR_LIMIT" \
+        --json number,title,headRefName,baseRefName,url,isDraft \
+        --template '{{range .}}- #{{.number}} {{.title}} ({{.headRefName}} -> 
{{.baseRefName}}){{if .isDraft}} [draft]{{end}} {{.url}}{{"\n"}}{{end}}' \
+        2>/dev/null)"
+    if [ $? -ne 0 ]; then
+        echo "- unavailable: gh pr list failed. Check GitHub authentication or 
network access."
+    elif [ -z "$prs" ]; then
+        echo "- No open pull requests found."
+    else
+        printf '%s\n' "$prs"
+    fi
+}
+
+while true; do
+    if [ "$MAX_ITERATIONS" -gt 0 ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; 
then
+        echo "Reached max iterations: $MAX_ITERATIONS"; break
+    fi
+    if [ -f STOP ]; then echo "STOP file detected — exiting."; rm -f STOP; 
break; fi
+
+    ITERATION=$((ITERATION + 1))
+    echo ""
+    echo "━━━━━━━━━━━━━━━━━━━━ LOOP $ITERATION  [ $(date '+%H:%M:%S') ] 
━━━━━━━━━━━━━━━━━━━━"
+
+    ACTIVE_PROMPT="$PROMPT_FILE"
+
+    if [ "$MODE" = "build" ]; then
+        # Consolidate at most ONCE when the plan grows too long, then build
+        # even if it is still over: the remaining length is planned work
+        # items, which the consolidate beat preserves by design. The latch
+        # resets once the plan drops back under the limit (e.g. after items
+        # merge and a plan pass prunes them), so we never re-consolidate in a
+        # loop without making progress. Prefer the working-tree plan (so
+        # local edits count); fall back to the control branch ($TOOLING_REF)
+        # if the tree is on a base (e.g. main) that lacks the spec-loop 
tooling.
+        PLAN_LINES=$( { [ -f "$PLAN" ] && cat "$PLAN" || git show 
"$TOOLING_REF:$PLAN" 2>/dev/null; } | wc -l )
+        if [ "$PLAN_LINES" -le "$PLAN_CONSOLIDATE_THRESHOLD" ]; then
+            CONSOLIDATE_TRIED=false
+        elif [ "$CONSOLIDATE_TRIED" = false ]; then
+            echo "  [plan] $PLAN is ${PLAN_LINES} lines (> 
${PLAN_CONSOLIDATE_THRESHOLD}) — one consolidation round"
+            ACTIVE_PROMPT="$LOOP_DIR/PROMPT_consolidate.md"
+            CONSOLIDATE_TRIED=true
+        else
+            echo "  [plan] $PLAN still ${PLAN_LINES} lines after consolidation 
— length is planned work; building"
+        fi
+    fi
+
+    # A genuine build/update iteration (not a consolidate swap-in) carves a
+    # work-item branch off BASE. Plan/consolidate beats — and the consolidate
+    # swap-in — instead stay on the control branch, where the tooling lives.
+    BUILD_ITERATION=false
+    if { [ "$MODE" = "build" ] || [ "$MODE" = "update" ]; } && [ 
"$ACTIVE_PROMPT" = "$PROMPT_FILE" ]; then
+        BUILD_ITERATION=true
+    fi
+
+    # Assemble the prompt BEFORE any checkout, while the working tree is still
+    # on the control branch. Prefer the working-tree copy (local edits count);
+    # fall back to the control branch ($TOOLING_REF) if the tree is on a base
+    # that lacks the tooling. Either way the read never breaks after checkout.
+    PROMPT_WITH_CONTEXT="$(mktemp "${TMPDIR:-/tmp}/spec-loop-prompt.XXXXXX")" 
|| exit 1
+    if [ -f "$ACTIVE_PROMPT" ]; then
+        cat "$ACTIVE_PROMPT" > "$PROMPT_WITH_CONTEXT"
+    elif ! git show "$TOOLING_REF:$ACTIVE_PROMPT" > "$PROMPT_WITH_CONTEXT" 
2>/dev/null; then
+        echo "Error: could not read '$ACTIVE_PROMPT' from the working tree or 
control branch '$TOOLING_REF'." >&2
+        rm -f "$PROMPT_WITH_CONTEXT"; break
+    fi
+    open_pr_context >> "$PROMPT_WITH_CONTEXT"
+
+    if [ "$BUILD_ITERATION" = true ]; then
+        # The work-item branch forks off BASE (e.g. main), which need not
+        # carry the spec-loop tooling. Tell the agent to read the plan and
+        # specs from the control branch, not the working tree.
+        if [ "$TOOLING_REF" != "$BASE" ]; then
+            {
+                echo ""
+                echo "## Tooling source — read the plan and specs from here"
+                echo ""
+                echo "This iteration builds on the integration base \`$BASE\`, 
which does"
+                echo "NOT carry the spec-loop tooling. The plan and specs live 
on the"
+                echo "control branch \`$TOOLING_REF\`. Read them from there 
with \`git show\`,"
+                echo "never from the working tree:"
+                echo ""
+                echo '```'
+                echo "git show 
$TOOLING_REF:tools/spec-loop/IMPLEMENTATION_PLAN.md"
+                echo "git ls-tree -r --name-only $TOOLING_REF 
tools/spec-loop/specs/"
+                echo "git show $TOOLING_REF:tools/spec-loop/specs/<file>"
+                echo '```'
+                echo ""
+                if [ "$MODE" = "update" ]; then
+                    echo "Read the current specs from \`$TOOLING_REF\` 
(commands above) as the"
+                    echo "baseline, then author the updated spec files on this 
work branch —"
+                    echo "the sync PR adds them to \`$BASE\`. Update is the 
one beat that"
+                    echo "writes specs; do that here, not on the control 
branch."
+                else
+                    echo "Implement the product change on the work branch; do 
NOT edit specs"
+                    echo "there — they are not on \`$BASE\`. The control 
branch owns the specs."
+                fi
+            } >> "$PROMPT_WITH_CONTEXT"
+        fi
+
+        # Freshness check: refuse to fork off a stale base. Fetch the base's
+        # tracking branch and verify BASE is not behind it. Forking off a stale
+        # local base is how a loop re-does work that's already merged upstream:
+        # the agent sees "no such file" locally, dutifully re-creates it, and
+        # the resulting branch collides with the merged PR's source branch on
+        # the remote.
+        BASE_UPSTREAM="$(git rev-parse --abbrev-ref "${BASE}@{upstream}" 
2>/dev/null || true)"
+        if [ -n "$BASE_UPSTREAM" ]; then
+            BASE_REMOTE="${BASE_UPSTREAM%%/*}"
+            if ! git fetch --quiet "$BASE_REMOTE" "$BASE" 2>/dev/null; then
+                echo "⚠ Could not fetch '$BASE_REMOTE' — freshness check 
skipped (network or auth issue)." >&2
+            else
+                BEHIND_BY="$(git rev-list --count "${BASE}..${BASE_UPSTREAM}" 
2>/dev/null || echo 0)"
+                if [ "$BEHIND_BY" -gt 0 ]; then
+                    echo "✗ Base '$BASE' is $BEHIND_BY commit(s) behind 
'$BASE_UPSTREAM'." >&2
+                    echo "  Fast-forward before re-running:" >&2
+                    echo "    git checkout $BASE && git merge --ff-only 
$BASE_UPSTREAM" >&2
+                    echo "  Forking off a stale base re-does merged work — the 
new branch" >&2
+                    echo "  may collide with one already on the remote for the 
same change." >&2
+                    rm -f "$PROMPT_WITH_CONTEXT"; break
+                fi
+            fi
+        else
+            echo "⚠ Base '$BASE' has no upstream tracking branch — freshness 
check skipped." >&2
+        fi
+
+        # Check out the base now — right before the agent runs, not earlier —
+        # so the reads above came from the control branch. The agent then
+        # forks its own <slug> branch off this base.
+        if [ "$(current_branch)" != "$BASE" ]; then
+            if ! checkout_out="$(git checkout "$BASE" 2>&1)"; then
+                echo "Error: could not check out base '$BASE'. git reported:" 
>&2
+                printf '  %s\n' "$checkout_out" >&2
+                echo "Resolve the working tree (commit or stash changes), then 
re-run." >&2
+                rm -f "$PROMPT_WITH_CONTEXT"; break
+            fi
+        fi
+        BASE_HEAD="$(git rev-parse HEAD)"
+    fi
+
+    # Run one iteration with a fresh context.
+    #   -p                              headless / non-interactive
+    #   --dangerously-skip-permissions  let the agent edit + validate
+    #                                   unattended. Bypasses the AGENT
+    #                                   permission layer, NOT the OS sandbox
+    #                                   (see the SECURITY header above).
+    #   --disallowedTools …             defense-in-depth: hard-deny push and
+    #                                   gh so a stray call cannot reach the
+    #                                   remote even with permissions skipped.
+    "$AGENT" -p \
+        --dangerously-skip-permissions \
+        --disallowedTools "Bash(git push:*)" "Bash(gh:*)" \
+        --output-format=text \
+        --model "$MODEL" < "$PROMPT_WITH_CONTEXT" &
+    AGENT_PID=$!
+    spinner "$AGENT_PID" & SPINNER_PID=$!
+    wait "$AGENT_PID"
+    wait "$SPINNER_PID" 2>/dev/null
+    rm -f "$PROMPT_WITH_CONTEXT"
+
+    CUR_BRANCH="$(current_branch)"
+    LAST_COMMIT="$(git log --oneline -1 2>/dev/null)"
+    echo "[ branch ] $CUR_BRANCH"
+    [ -n "$LAST_COMMIT" ] && echo "[ commit ] $LAST_COMMIT"
+
+    if [ "$BUILD_ITERATION" = true ]; then
+        # Report the work-item branch the agent produced, by name, so you know
+        # exactly what to push.
+        if [ "$CUR_BRANCH" != "$BASE" ] && [ "$CUR_BRANCH" != "$TOOLING_REF" 
]; then
+            # Branch-name collision check: a remote branch with the same name
+            # often means the agent just re-did work that already shipped under
+            # this slug (a merged PR's source branch typically lingers on the
+            # remote). Warn loudly — pushing would either be rejected or, 
worse,
+            # overwrite the merged history. Check every configured remote, not
+            # just origin: a fork-based workflow has the lineage on `upstream`.
+            COLLISION_FOUND=false
+            for remote in $(git remote); do
+                if git ls-remote --heads --exit-code "$remote" "$CUR_BRANCH" 
>/dev/null 2>&1; then
+                    echo "⚠ Remote '$remote' already has a branch named 
'$CUR_BRANCH'." >&2
+                    COLLISION_FOUND=true
+                fi
+            done
+            if [ "$COLLISION_FOUND" = true ]; then
+                echo "  Likely the source branch of a PR that already shipped 
under this slug." >&2
+                echo "  Inspect before pushing — do not push blind:" >&2
+                echo "    git fetch --all && git log --oneline --all -- 
<changed-file>" >&2
+            fi
+            echo "[ new branch ] $CUR_BRANCH  (forked off $BASE)"
+            echo "               push it with:  git push -u origin $CUR_BRANCH"
+        else
+            echo "⚠ No work-item branch was created (still on '$CUR_BRANCH'). 
Check the agent output above." >&2
+        fi
+
+        # Safety guard: a build/update iteration must never commit to the base.
+        if [ "$CUR_BRANCH" = "$BASE" ] && [ "$(git rev-parse HEAD)" != 
"$BASE_HEAD" ]; then
+            echo "✗ This iteration committed to '$BASE' instead of a work-item 
branch." >&2
+            echo "  Stopping so you can review (expected a new <slug> 
branch)." >&2
+            break
+        fi
+
+        # Return to the control branch so the tooling (plan, prompts, specs) is
+        # present again and you are never stranded on the base. The work-item
+        # branch the agent created persists; this only moves HEAD.
+        if [ "$(current_branch)" != "$TOOLING_REF" ]; then
+            git checkout "$TOOLING_REF" >/dev/null 2>&1 || \
+                echo "⚠ Could not return to control branch '$TOOLING_REF' (now 
on '$(current_branch)')." >&2
+        fi
+    fi
+    echo ""
+done
diff --git a/tools/spec-loop/specs/README.md b/tools/spec-loop/specs/README.md
new file mode 100644
index 0000000..ab811a7
--- /dev/null
+++ b/tools/spec-loop/specs/README.md
@@ -0,0 +1,74 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Specs
+
+Each file here describes **one functional area of the product** — what it
+does, where it lives in the code, the contract it must honour, and its
+known gaps. They are modelled on the way a game's specs describe its
+battle, magic, and inventory systems: a faithful description of the
+intended behaviour that the build loop reconciles the code against.
+
+Filenames are **topics, not ordered identifiers** — there are no number
+prefixes, because numbering implies a priority the specs don't have.
+Priority lives in [`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md).
+
+## Separate from the RFCs
+
+These specs are **not** RFCs. The RFCs in
+[`../../../docs/rfcs/`](../../../docs/rfcs/) are the normative governance
+layer ("the constitution"); the loop **respects** them as constraints but
+never reads or edits them. Specs are the *functional description of the
+actual codebase*. A spec may cite a principle as a constraint; it never
+restates one and never lives in `docs/rfcs/`.
+
+## The specs
+
+Start with [`overview.md`](overview.md), then:
+
+- Modes: [`triage-mode.md`](triage-mode.md),
+  [`mentoring-mode.md`](mentoring-mode.md),
+  [`drafting-mode.md`](drafting-mode.md),
+  [`pairing-mode.md`](pairing-mode.md).
+- Cross-cutting: [`security-issue-lifecycle.md`](security-issue-lifecycle.md),
+  [`privacy-llm-gate.md`](privacy-llm-gate.md),
+  [`agent-isolation-sandbox.md`](agent-isolation-sandbox.md),
+  [`cve-tooling.md`](cve-tooling.md),
+  [`adoption-and-setup.md`](adoption-and-setup.md),
+  [`adapters.md`](adapters.md),
+  [`meta-and-quality-tooling.md`](meta-and-quality-tooling.md).
+
+(Auto-merge, the fifth MISSION mode, is deliberately off and has no
+spec — see the note in [`overview.md`](overview.md).)
+
+## Spec format
+
+Frontmatter:
+
+```yaml
+---
+title: <functional area>
+status: experimental   # stable | experimental | proposed | off
+kind: feature          # feature | fix | docs | chore
+mode: Triage           # Triage | Mentoring | Drafting | Pairing | infra
+source: <MISSION.md clause / code paths this area is grounded in>
+acceptance:
+  - <verifiable criterion>
+---
+```
+
+Body sections: **What it does**, **Where it lives**, **Behaviour &
+contract**, **Out of scope**, **Acceptance criteria**, **Validation**,
+and (optional) **Known gaps** — the gaps are what the loop's plan pass
+turns into work items.
+
+Status mirrors [`docs/modes.md`](../../../docs/modes.md): `stable`,
+`experimental`, `proposed` (designed in MISSION, no code yet), `off`
+(deliberately not built).
+
+## Adding a spec
+
+1. Name the file for its topic (no number prefix).
+2. Fill in the frontmatter and the body sections above.
+3. Keep **Acceptance criteria** and **Validation** objective — they are
+   the loop's backpressure.
diff --git a/tools/spec-loop/specs/adapters.md 
b/tools/spec-loop/specs/adapters.md
new file mode 100644
index 0000000..37d2b7e
--- /dev/null
+++ b/tools/spec-loop/specs/adapters.md
@@ -0,0 +1,77 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Adapters (Gmail / PonyMail / Jira / GitHub)
+status: experimental
+kind: feature
+mode: infra
+source: >
+  MISSION.md § Rationale ("ASF integrations live behind clean
+  configuration boundaries; non-ASF adopters swap them") and § Technical
+  scope (extensible adapter layer). Implemented in tools/gmail/,
+  tools/ponymail/, tools/jira/, tools/github/.
+acceptance:
+  - Project-specific integrations live behind adapter modules, not
+    hardcoded into skills.
+  - A non-ASF adopter can swap an adapter (e.g. private GitHub repo for
+    private mailing list) with config substitution, not skill edits.
+  - Mail-reading adapters route fetched content through the privacy
+    redactor before any LLM read.
+---
+
+# Adapters (Gmail / PonyMail / Jira / GitHub)
+
+## What it does
+
+Isolates the systems a project already uses behind adapter modules so the
+skills stay project-agnostic. The same skill executes against an ASF
+project's private mailing list or a non-ASF project's private GitHub repo
+by swapping the adapter, not the skill.
+
+## Where it lives
+
+- `tools/gmail/` — private-mail read/draft (drafts to the outbox; never
+  sends).
+- `tools/ponymail/` — public mailing-list archive search.
+- `tools/jira/` — issue-tracker adapter for projects on Jira.
+- `tools/github/` — issues/PRs/labels read + write-back helpers.
+
+## Behaviour & contract
+
+- **Pluggable, config-driven.** Skills reference placeholders
+  (`<tracker>`, `<upstream>`, `<security-list>`, …); the adapter resolves
+  them from `<project-config>/`. No `apache/<project>` strings hardcoded
+  into a skill.
+- **Mail adapters draft, never send** — outbound goes to the maintainer's
+  drafts folder; the human presses Send.
+- **Mail adapters redact-after-fetch** — fetched private content passes
+  through the privacy redactor
+  ([privacy-llm-gate.md](privacy-llm-gate.md)) before any LLM read.
+- **Write-back is confirm-before-apply** and routed through the sandbox's
+  `ask` gate ([agent-isolation-sandbox.md](agent-isolation-sandbox.md)).
+
+## Out of scope
+
+- The privacy *policy* and gate (separate area, referenced above).
+- Sending outbound mail (a human action).
+
+## Acceptance criteria
+
+1. No skill hardcodes a project-specific repo/list; all go through an
+   adapter + placeholder.
+2. Mail adapters draft only and redact before LLM read.
+3. Each adapter ships with its own tests.
+
+## Validation
+
+```bash
+for t in gmail ponymail jira github; do
+  uv run --project tools/$t --group dev pytest || echo "check tools/$t test 
setup"
+done
+```
+
+## Known gaps
+
+- `experimental` overall — adapter coverage varies; a new adopter system
+  (e.g. GitLab, a different mail backend) is a gap the plan pass records.
diff --git a/tools/spec-loop/specs/adoption-and-setup.md 
b/tools/spec-loop/specs/adoption-and-setup.md
new file mode 100644
index 0000000..7bb0510
--- /dev/null
+++ b/tools/spec-loop/specs/adoption-and-setup.md
@@ -0,0 +1,77 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Adoption & setup
+status: stable
+kind: feature
+mode: infra
+source: >
+  README.md § How adoption works / Adopting the framework / Maintenance.
+  Implemented by the setup family (setup-steward and siblings) and the
+  snapshot + agentic-override model.
+acceptance:
+  - An adopter commits exactly one skill (setup-steward); everything else
+    is a gitignored snapshot plus committed override + lock files.
+  - The committed lock pins install method + URL + ref so a fresh clone
+    re-installs the same framework version.
+  - Drift between the committed pin and the local install is detected and
+    surfaced with an upgrade proposal.
+---
+
+# Adoption & setup
+
+## What it does
+
+Gets the framework into an adopter repo and keeps it current using a
+**snapshot + agentic-override** model: one committed bootstrap skill, a
+gitignored framework snapshot (a build artefact, never committed),
+gitignored skill symlinks, and committed agent-readable override files.
+
+## Where it lives
+
+- Skill: `setup-steward` (adopt, verify, upgrade, override).
+- Skills: `setup-isolated-setup-install` / `-update` / `-verify`
+  (the sandbox harness), `setup-override-upstream` (promote a stabilised
+  override into a framework PR), `setup-shared-config-sync`.
+- Docs: `docs/setup/` (install recipes, agentic-overrides contract,
+  prerequisites).
+- Lock files: `.apache-steward.lock` (committed pin) and
+  `.apache-steward.local.lock` (gitignored, what this machine fetched).
+
+## Behaviour & contract
+
+- **One committed skill, no submodules, no vendored framework copies.**
+  The snapshot lives in a gitignored `.apache-steward/`.
+- **Committed lock is the source of truth.** A fresh contributor runs
+  `/setup-steward` and re-installs to the project's pinned version.
+- **Drift detection** at the top of every framework skill: if the
+  gitignored local lock has drifted from the committed pin, the skill
+  proposes `/setup-steward upgrade`.
+- **Overrides are agent-readable Markdown** under
+  `.apache-steward-overrides/`, consulted at runtime and merged before
+  default behaviour ([pairing/correctability is the model]).
+
+## Out of scope
+
+- The runtime behaviour of the modes themselves.
+- Editing the adopter's `.claude/settings.json` beyond what the install
+  recipe declares.
+
+## Acceptance criteria
+
+1. Adoption commits only the bootstrap skill + lock/override scaffold.
+2. The committed lock re-installs the same version on a fresh clone.
+3. Drift between local and committed locks is surfaced with an upgrade.
+
+## Validation
+
+```bash
+test -f docs/setup/README.md
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- `stable`; gaps appear as new adopter skill-directory layouts to support
+  or new override surfaces — recorded by the plan pass.
diff --git a/tools/spec-loop/specs/agent-isolation-sandbox.md 
b/tools/spec-loop/specs/agent-isolation-sandbox.md
new file mode 100644
index 0000000..c668053
--- /dev/null
+++ b/tools/spec-loop/specs/agent-isolation-sandbox.md
@@ -0,0 +1,84 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Agent isolation / layered sandbox
+status: stable
+kind: feature
+mode: infra
+source: >
+  MISSION.md § Privacy, security and supply-chain integrity ("Clean-
+  environment wrapper", "Layered sandbox by default", "Pinned, reviewed,
+  signed dependencies"). Implemented in tools/agent-isolation/, the
+  setup-isolated-setup-* skills, and .claude/settings.json.
+acceptance:
+  - Every agent subprocess runs inside an OS-level sandbox with default-
+    deny filesystem reads and network egress.
+  - Credential-shaped env vars are stripped before the agent execs.
+  - State-mutating shell calls (git push, gh pr create, …) require a
+    confirmation prompt; secrets/cred files are deny-read.
+---
+
+# Agent isolation / layered sandbox
+
+## What it does
+
+Runs every agent invocation inside a layered sandbox so that even a
+successful prompt injection cannot read credentials or reach a
+non-allowed host. The fallback when prompt engineering fails is the OS
+saying "no".
+
+## Where it lives
+
+- `tools/agent-isolation/` — the harness (clean-env wrapper +
+  sandbox profiles).
+- `.claude/settings.json` — the `sandbox` block (filesystem
+  allow/deny, network `allowedDomains`) and `permissions` (`deny` /
+  `ask`).
+- Skills: `setup-isolated-setup-install`, `-update`, `-verify`.
+- `docs/setup/secure-agent-internals.md` — the three-layer model.
+
+## Behaviour & contract
+
+The reference model is four layers, layered:
+
+1. **Clean environment** — a wrapper strips the process env to a
+   project-declared whitelist before exec (no `$GH_TOKEN`, `$AWS_*`,
+   `$ANTHROPIC_API_KEY` leakage).
+2. **Filesystem + network sandbox** — Linux `bubblewrap` + `socat` SNI
+   proxy; macOS `sandbox-exec`. Default-deny reads outside the tree and
+   egress to non-allowed hosts.
+3. **Tool permissions** — the host's `permissions.deny` blocks denied
+   paths/binaries (`Read(~/.ssh/**)`, `Bash(curl *)`, …).
+4. **Forced confirmation** — `permissions.ask` on every state-mutating
+   shell call (`git push`, `gh pr create`, `gh issue edit`, …).
+
+Pinned system tools (`bubblewrap`, `socat`, agent CLI) are aged through a
+cooldown window; bumps are PRs, not silent updates.
+
+## Out of scope
+
+- The human-in-the-loop *confirmation* itself (that is the modes' job);
+  this area provides the OS-level enforcement underneath it.
+- Editing `.claude/settings.json` (it is in the `deny` list).
+
+## Acceptance criteria
+
+1. Filesystem and network default-deny with explicit allow-lists.
+2. The clean-env wrapper strips credential-shaped vars before exec.
+3. `git push` / `gh pr create` are in `permissions.ask`; secret/cred
+   files are in `permissions.deny`.
+
+## Validation
+
+```bash
+uv run --project tools/agent-isolation --group dev pytest
+python3 -c "import json,sys; s=json.load(open('.claude/settings.json')); \
+  asks=' '.join(s['permissions']['ask']); \
+  sys.exit(0 if 'git push' in asks and 'gh pr create *' in asks else 1)"
+```
+
+## Known gaps
+
+- `stable`; drift shows up when a new state-mutating command is added to a
+  skill without a matching `ask` rule — the plan pass flags it.
diff --git a/tools/spec-loop/specs/cve-tooling.md 
b/tools/spec-loop/specs/cve-tooling.md
new file mode 100644
index 0000000..0f0af33
--- /dev/null
+++ b/tools/spec-loop/specs/cve-tooling.md
@@ -0,0 +1,74 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: CVE tooling
+status: stable
+kind: feature
+mode: infra
+source: >
+  README.md § Skill families (security) and AGENTS.md § Linking CVEs.
+  Implemented in tools/vulnogram/generate-cve-json/, tools/cve-org/, and
+  the security-cve-allocate skill.
+acceptance:
+  - A deterministic command produces a paste-ready CVE 5.x JSON record
+    from a tracking issue's template fields.
+  - Tracker-repo URLs and the CVE-tool OAuth URL are filtered out of the
+    record's references before serialisation.
+  - The skill regenerates the attached JSON so it stays in lock-step with
+    the issue body.
+---
+
+# CVE tooling
+
+## What it does
+
+Turns a tracking issue into a CVE allocation and a paste-ready CVE 5.x
+JSON record — the machine half of the security lifecycle's allocation
+step. Deterministic (no model calls) so the output is reproducible and
+reviewable.
+
+## Where it lives
+
+- `tools/vulnogram/generate-cve-json/` — a `uv run` script that parses an
+  issue's template fields (multiple credits, multiple reference URLs,
+  `>= X, < Y` version ranges) and emits `containers.cna` JSON matching
+  Vulnogram's export shape, plus the Vulnogram `#json` paste URL.
+- `tools/cve-org/` — CVE.org / CVE-services helpers.
+- Skill: `security-cve-allocate` — walks the (PMC-gated) allocation form,
+  then updates the tracker and regenerates the attached JSON via
+  `generate-cve-json --attach`.
+
+## Behaviour & contract
+
+- **Deterministic and reviewable.** The JSON is generated by a Python
+  script, not an LLM; the same issue yields the same record.
+- **Reference hygiene.** The project's CVE-tool URL
+  (`cveprocess.apache.org/...`) and any tracker-repo URLs are filtered
+  out of `references[]` before serialising — they are OAuth-gated / private.
+- **Allocation is PMC-gated.** The skill submits nothing itself when the
+  user is not on the PMC; it reshapes the recipe into a relay message.
+- **Lock-step.** The attached JSON is regenerated whenever the tracker
+  body changes, so the two never drift.
+
+## Out of scope
+
+- Deciding whether a report warrants a CVE (that is triage/assessment).
+- Publishing the CVE (a PMC action outside the framework).
+
+## Acceptance criteria
+
+1. `generate-cve-json` emits valid CVE 5.x JSON from a tracking issue.
+2. Tracker + CVE-tool URLs are absent from `references[]`.
+3. `--attach` refreshes the JSON on the tracking issue.
+
+## Validation
+
+```bash
+uv run --project tools/vulnogram/generate-cve-json --group dev pytest
+```
+
+## Known gaps
+
+- `stable`; drift appears if the CVE 5.x schema or Vulnogram export shape
+  changes upstream — caught by the tool's own tests.
diff --git a/tools/spec-loop/specs/drafting-mode.md 
b/tools/spec-loop/specs/drafting-mode.md
new file mode 100644
index 0000000..8c18389
--- /dev/null
+++ b/tools/spec-loop/specs/drafting-mode.md
@@ -0,0 +1,73 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Drafting mode
+status: experimental
+kind: feature
+mode: Drafting
+source: >
+  MISSION.md § Technical scope (Drafting). docs/modes.md § Drafting.
+  Implemented by security-issue-fix (stable, security-only) and
+  issue-fix-workflow (experimental).
+acceptance:
+  - A drafting skill produces the failing test, the smallest production
+    change, targeted test runs, and a commit — but never merges.
+  - The PR is opened via `gh pr create --web` (human reviews in browser)
+    or handed back for the human to push; no autopilot push/merge.
+  - Security-class drafts scrub CVE / tracker-slug / "security fix" /
+    "vulnerability" from every public surface until the embargo lifts.
+---
+
+# Drafting mode
+
+## What it does
+
+The agent drafts a fix for a well-scoped problem — a triaged issue, a
+CVE-allocated security report with team consensus on scope, a failing
+test with an obvious cause, a documentation hole — and prepares a PR.
+Every PR is reviewed and merged by a human committer; the agent never
+merges its own work.
+
+## Where it lives
+
+- `security-issue-fix` (stable, security-only) — drafts the fix in the
+  user's local `<upstream>` clone, runs local checks, opens the public
+  PR via `gh pr create --web`, scrubs confidential framing.
+- `issue-fix-workflow` (experimental) — drafts a fix for a triaged
+  general-issue; **does not** open the PR on autopilot, hands back a
+  branch + commits + test results for the human to push.
+- `tools/dev` — shared local-check helpers.
+
+## Behaviour & contract
+
+- **Draft, never merge.** No skill in this mode merges. Opening the PR is
+  `gh pr create --web` (human confirms in the browser) or a hand-back.
+- **Confidentiality scrub** (security): commit message, branch name, PR
+  title/body, newsfragment are scrubbed for CVE IDs, the tracker repo
+  slug, and the words "security fix" / "vulnerability" before any write
+  or push (see `AGENTS.md` § Confidentiality).
+- **Commit trailer** `Generated-by:` — never `Co-Authored-By:` an agent.
+
+## Out of scope
+
+- Generic Drafting beyond security + general-issue (lint fixes, audit-
+  tool findings, doc holes at scale) — `proposed`, not yet built.
+- Merging, releasing, or pushing without a human.
+
+## Acceptance criteria
+
+1. No drafting skill merges or force-pushes.
+2. Security drafts pass the confidentiality scrub before any public write.
+3. `skill-validate` passes on the drafting-family skills.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- Generic (non-security, non-issue) Drafting from audit-tool findings is
+  `proposed`. Only `security-issue-fix` is stable today.
diff --git a/tools/spec-loop/specs/mentoring-mode.md 
b/tools/spec-loop/specs/mentoring-mode.md
new file mode 100644
index 0000000..abc6ada
--- /dev/null
+++ b/tools/spec-loop/specs/mentoring-mode.md
@@ -0,0 +1,75 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Mentoring mode
+status: proposed
+kind: feature
+mode: Mentoring
+source: >
+  MISSION.md § Technical scope (Mentoring) — "the highest-value
+  project-side mode and the one off-the-shelf agent tooling skips".
+  docs/modes.md § Mentoring (proposed, 0 skills). Spec exists at
+  docs/mentoring/spec.md ahead of any skill code.
+acceptance:
+  - The Mentoring spec (tone guide, hand-off protocol, adopter knobs) is
+    reviewable independently of any runtime skill (it already is).
+  - The first skill ships flagged mode Mentoring + experimental and joins
+    threads in a teaching register, never gatekeeps.
+  - Hand-off to a human is explicit when scope exceeds the agent.
+---
+
+# Mentoring mode
+
+## What it does
+
+Joins issue and PR threads in a deliberately teaching register:
+clarifying questions, pointers to project conventions and docs, an
+explanation of *why* a change is being asked for, paired examples from
+similar prior PRs, and a clean hand-off to a human reviewer when the
+question exceeds what an agent should answer. MISSION names this the
+contributor-empowerment lever the wider ecosystem most needs.
+
+## Where it lives
+
+- Spec ahead of code: `docs/mentoring/README.md`,
+  `docs/mentoring/spec.md`.
+- Adopter config scaffold: `projects/_template/mentoring-config.md`.
+- First skill (planned): `pr-management-mentor` (working name), ships
+  `mode: Mentoring` + `experimental`.
+
+## Behaviour & contract
+
+- **Teaching register, never gatekeeping.** The most sensitive surface
+  in the project (MISSION § Particular care): a condescending agent that
+  drives a contributor away is not patchable. Tone is the project's to
+  set (`mentoring-config.md`).
+- Read-only / drafts replies for human review; never closes or rejects a
+  contributor's work on its own.
+- Explicit hand-off protocol when the question is out of the agent's
+  depth.
+
+## Out of scope
+
+- Implementation-detail review that belongs to Pairing
+  ([Pairing](pairing-mode.md)).
+- Any contributor-facing message sent without human review.
+
+## Acceptance criteria
+
+1. The Mentoring spec is reviewable without any skill code (it is).
+2. The first Mentoring skill validates and carries `mode: Mentoring`.
+3. Hand-off-to-human is documented and enforced.
+
+## Validation
+
+```bash
+test -f docs/mentoring/spec.md
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- **No skill yet** — this is a pure `proposed` gap. The spec and tone
+  guide exist; the prototype skill is the first work item the loop would
+  pick up under this area.
diff --git a/tools/spec-loop/specs/meta-and-quality-tooling.md 
b/tools/spec-loop/specs/meta-and-quality-tooling.md
new file mode 100644
index 0000000..f12f9ce
--- /dev/null
+++ b/tools/spec-loop/specs/meta-and-quality-tooling.md
@@ -0,0 +1,81 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Meta & quality tooling
+status: stable
+kind: feature
+mode: infra
+source: >
+  README.md § Skill families (utilities) and AGENTS.md § Reusable skills.
+  Implemented by tools/skill-validator/, tools/skill-evals/,
+  tools/sandbox-lint/, tools/dashboard-generator/, tools/probe-templates/,
+  and the write-skill / list-steward-skills utility skills.
+acceptance:
+  - Skill definitions are validated (frontmatter keys name/description/
+    license, internal link integrity, placeholder conventions).
+  - There is a live, generated index of available skills (no cached copy).
+  - Skill behaviour can be measured by an eval harness.
+  - Every skill ships a behavioural eval suite under
+    tools/skill-evals/evals/<skill-name>/ (per /AGENTS.md § Reusable
+    skills).
+---
+
+# Meta & quality tooling
+
+## What it does
+
+The framework's tooling for authoring, validating, indexing, and
+evaluating its own skills — the quality gate that keeps the catalogue
+trustworthy as it grows.
+
+## Where it lives
+
+- `tools/skill-validator/` — validates `SKILL.md` frontmatter (required
+  `name`, `description`, `license`), internal link integrity, and
+  placeholder conventions. CLI: `skill-validate`.
+- `tools/skill-evals/` — harness for measuring skill behaviour.
+- `tools/sandbox-lint/` — lints the sandbox/permissions configuration.
+- `tools/dashboard-generator/` — read-only HTML dashboards over campaign
+  artefacts.
+- `tools/probe-templates/` — reusable probes.
+- Skills: `write-skill` (author/update a skill), `list-steward-skills`
+  (live, generated index of every skill, grouped by family).
+
+## Behaviour & contract
+
+- **Generated, never cached.** `list-steward-skills` reads the live
+  `.claude/skills/*/SKILL.md` frontmatter on every run, so the index never
+  goes stale.
+- **Deterministic checks.** `skill-validator` and `sandbox-lint` are
+  heuristic/text tools with no model calls — reproducible in CI.
+- **Hard vs soft rules.** The validator fails on missing frontmatter or
+  broken links; advisories are warnings unless `--strict`.
+
+## Out of scope
+
+- The maintainership modes themselves.
+- The spec-loop tooling, which lives in `tools/spec-loop/` and is
+  documented in [`../README.md`](../README.md), not here.
+
+## Acceptance criteria
+
+1. `skill-validate` enforces required frontmatter + link integrity.
+2. `list-steward-skills` generates its index from live frontmatter.
+3. Each meta tool ships with its own tests.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev pytest
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- **Eval coverage is incomplete** — the harness has ~15 suites but the
+  repo has more skills than that; skills added before the per-skill-eval
+  convention have no suite. Back-filling one suite per uncovered skill is
+  a tracked work item.
+- Other gaps appear as new quality checks worth adding (e.g. a spec
+  validator analogous to the skill validator) — recorded by the plan pass.
diff --git a/tools/spec-loop/specs/overview.md 
b/tools/spec-loop/specs/overview.md
new file mode 100644
index 0000000..0022df8
--- /dev/null
+++ b/tools/spec-loop/specs/overview.md
@@ -0,0 +1,68 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Product overview
+
+Apache Steward (working name; rename in flight) is a **project-agnostic
+framework for agent-assisted repository maintainership and development**.
+It is not an application with a `src/` tree — it is a substrate plus a
+catalogue of agent-readable **skills** (Markdown + YAML) and deterministic
+**tools** (`uv`/Python). The specs in this directory describe that
+functionality so the build loop can keep the code in sync with the
+intended behaviour.
+
+These specs are **unordered** — the filename is a topic, not a priority.
+What to build next comes from 
[`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md),
+not from any numbering.
+
+## Substrate
+
+Issue/PR ingestion, GitHub write-back, mail threading, audit logging, and
+pluggable adapters for the systems a project already uses (Gmail,
+PonyMail, Jira, GitHub, Vulnogram). Per-project configuration declares
+which modes are on, eligible change classes, reviewers, and where reports
+come from.
+
+## The modes (MISSION taxonomy)
+
+Each mode is an independently toggleable set of skills. Maturity mirrors
+[`docs/modes.md`](../../../docs/modes.md):
+
+| Mode | Spec | Maturity |
+|---|---|---|
+| Triage | [triage-mode.md](triage-mode.md) | stable (security) / experimental 
(PR, issue) |
+| Mentoring | [mentoring-mode.md](mentoring-mode.md) | proposed (0 skills) |
+| Drafting | [drafting-mode.md](drafting-mode.md) | stable (security) / 
experimental (issue) |
+| Pairing | [pairing-mode.md](pairing-mode.md) | proposed (0 skills) |
+
+> **Auto-merge** is the fifth MISSION mode but is deliberately **off** by
+> sequencing policy (`.asf.yaml` `allow_auto_merge: false`) — it has no
+> implementation and nothing to build, so it is documented as a boundary
+> in [`docs/modes.md`](../../../docs/modes.md), not as a spec here.
+
+## Cross-cutting functionality
+
+| Area | Spec |
+|---|---|
+| Security-issue lifecycle (the load-bearing use case) | 
[security-issue-lifecycle.md](security-issue-lifecycle.md) |
+| Privacy-LLM gate + PII redaction | 
[privacy-llm-gate.md](privacy-llm-gate.md) |
+| Agent isolation / layered sandbox | 
[agent-isolation-sandbox.md](agent-isolation-sandbox.md) |
+| CVE tooling | [cve-tooling.md](cve-tooling.md) |
+| Adoption & setup | [adoption-and-setup.md](adoption-and-setup.md) |
+| Adapters (Gmail / PonyMail / Jira / GitHub) | [adapters.md](adapters.md) |
+| Meta & quality tooling | 
[meta-and-quality-tooling.md](meta-and-quality-tooling.md) |
+
+## The non-negotiables every area inherits
+
+- **Human-in-the-loop on every state change.** Outputs are proposals;
+  the human confirms. (`docs/rfcs/` holds the normative statement; this
+  spec system respects it as a constraint and never edits it.)
+- **Drafts, never sends; the human presses the button** — no skill calls
+  `git push` / `gh pr create` on autopilot.
+- **Secure sandbox by default**, vendor neutrality, privacy by design.
+
+## How these specs are built
+
+The build loop (see [`../README.md`](../README.md)) compares each spec
+against the code and turns the gaps into work items in the plan — one
+work item, one branch, one PR.
diff --git a/tools/spec-loop/specs/pairing-mode.md 
b/tools/spec-loop/specs/pairing-mode.md
new file mode 100644
index 0000000..2594d23
--- /dev/null
+++ b/tools/spec-loop/specs/pairing-mode.md
@@ -0,0 +1,72 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Pairing mode
+status: proposed
+kind: feature
+mode: Pairing
+source: >
+  MISSION.md § Technical scope (Pairing) and § Initial Goals ("Ship at
+  least one Pairing skill family in v1"). docs/modes.md § Pairing
+  (proposed, 0 skills).
+acceptance:
+  - At least one Pairing skill exists and validates (v1 goal).
+  - Pairing skills run in the developer's OWN dev loop and make no state
+    change on behalf of the project (read-only / hand-back).
+  - Mentorship is intrinsic: the agent handles implementation-detail
+    review so the human conversation stays on design and reasoning.
+---
+
+# Pairing mode
+
+## What it does
+
+The developer-side counterpart to the project-side modes. Pairing skills
+run in a maintainer's or contributor's *own* dev loop: multi-agent review
+pipelines, self-review and pre-flight patterns, and scoped fix drafting
+under the developer's driver's seat. Mentorship is intrinsic — the agent
+absorbs mechanical implementation-detail review so the human-to-human
+conversation stays on design and the trade-offs the project cares about,
+protecting the ASF contribution path (contributor → committer → PMC).
+
+## Where it lives
+
+- Currently nowhere — the family is empty (`docs/modes.md`: 0 skills).
+- Planned skills: a pre-flight **self-review** skill (highest priority)
+  and a **multi-agent review** pipeline — both tracked as work items in
+  [`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md).
+
+## Behaviour & contract
+
+- **No state change on the project's behalf.** Pairing skills are the
+  developer's toolkit; they end at a report or a local branch.
+- Same skill format and sandbox/privacy posture as the project-side modes.
+- **Ships before Auto-merge** in the roadmap (MISSION sequencing): Pairing
+  must establish that human reasoning, not implementation chatter, is the
+  load-bearing part of the workflow before any auto-merge is considered.
+
+## Out of scope
+
+- Acting on issues/PRs/threads on the project's behalf (that is
+  Triage/Mentoring/Drafting).
+- Auto-merge — deliberately off by MISSION sequencing; not built.
+
+## Acceptance criteria
+
+1. ≥1 Pairing skill exists, validates, and is read-only/hand-back.
+2. `docs/modes.md` Pairing row reflects the shipped count and status.
+
+## Validation
+
+```bash
+ls .claude/skills/ | grep -q '^pairing-' && echo "pairing skill present" || 
echo "GAP: no pairing skill"
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- **Empty family** — the largest functional gap against MISSION's v1
+  goals. Work items (in 
[`../IMPLEMENTATION_PLAN.md`](../IMPLEMENTATION_PLAN.md)):
+  a pre-flight self-review skill (first, highest priority), then a
+  multi-agent review pipeline.
diff --git a/tools/spec-loop/specs/privacy-llm-gate.md 
b/tools/spec-loop/specs/privacy-llm-gate.md
new file mode 100644
index 0000000..35a9436
--- /dev/null
+++ b/tools/spec-loop/specs/privacy-llm-gate.md
@@ -0,0 +1,84 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Privacy-LLM gate + PII redaction
+status: stable
+kind: feature
+mode: infra
+source: >
+  MISSION.md § Privacy, security and supply-chain integrity
+  ("Privacy-aware LLM routing", "PII redaction at the boundary").
+  Implemented in tools/privacy-llm/ (checker, redactor, wiring.md, pii.md)
+  and documented in docs/setup/privacy-llm.md.
+acceptance:
+  - Private content reaches only LLMs the project's PMC has approved; the
+    gate refuses to route private bytes through a non-approved model.
+  - PII is redacted to stable hashed identifiers before any LLM read; the
+    reverse map stays on the maintainer's local disk (mode 0600).
+  - Every public artefact is scrubbed for private-content leakage before
+    it leaves the framework's control.
+---
+
+# Privacy-LLM gate + PII redaction
+
+## What it does
+
+Treats private content — security reports, embargoed CVE detail,
+PMC-private mail, contributor PII — as chain-of-custody material. Three
+mechanisms: an **approved-LLM gate** that verifies the model-of-the-moment
+is on the PMC's allow-list before any private read; a **redactor** that
+maps reporter names/emails/IPs to stable hashed identifiers before the
+LLM sees them; and a **confidentiality scrub** that checks every public
+artefact for leakage before emission.
+
+## Where it lives
+
+- `tools/privacy-llm/checker/` — the approved-LLM gate.
+- `tools/privacy-llm/redactor/` — the PII redactor (name→`N-<hash>`,
+  email→`E-<hash>`, IP→`IP-<hash>`).
+- `tools/privacy-llm/wiring.md` — the redact-after-fetch protocol every
+  Gmail/PonyMail-reading skill follows.
+- `tools/privacy-llm/pii.md` — the PII pattern catalogue.
+- `docs/setup/privacy-llm.md` — adopter-facing setup.
+
+## Behaviour & contract
+
+- **Policy is the PMC's; the gate is the framework's.** The approved-LLM
+  list is per-PMC. The default: agent host trusted, `*.apache.org`
+  auto-approved, `localhost` for local inference, everything else opt-in.
+- **Redact before read; reveal locally.** Skills operate on hashed
+  identifiers; the reverse map never goes to an LLM and is never committed.
+- **Reporter credit is preserved** (CVE `credits[]`) only after the
+  reporter confirms on the inbound thread — credit is a deliberate output,
+  not in-context PII.
+- **Confidentiality scrub before public emission** — regex for CVE IDs in
+  pre-disclosure PRs, reporter names from the local map, list addresses,
+  and any project-declared private string; failures stop the flow.
+- **Audit log is privacy-aware** — references hashed identifiers, never
+  raw PII.
+
+## Out of scope
+
+- Setting the approved-LLM *policy* (that is the PMC's, per adopter).
+- Enforcing OS-level isolation — that is the sandbox
+  ([agent-isolation-sandbox.md](agent-isolation-sandbox.md)).
+
+## Acceptance criteria
+
+1. The gate blocks a private read when the active model is not approved.
+2. The redactor produces stable hashed identifiers and keeps the reverse
+   map local (0600, gitignored).
+3. The scrub catches CVE IDs / reporter names / list addresses before any
+   public write.
+
+## Validation
+
+```bash
+uv run --project tools/privacy-llm --group dev pytest
+```
+
+## Known gaps
+
+- `stable`; gaps surface as new PII patterns or new public-emission
+  surfaces not yet covered by the scrub — caught as drift by the plan pass.
diff --git a/tools/spec-loop/specs/security-issue-lifecycle.md 
b/tools/spec-loop/specs/security-issue-lifecycle.md
new file mode 100644
index 0000000..ee13186
--- /dev/null
+++ b/tools/spec-loop/specs/security-issue-lifecycle.md
@@ -0,0 +1,76 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Security-issue lifecycle (end-to-end)
+status: stable
+kind: feature
+mode: Triage
+source: >
+  MISSION.md § Rationale ("Security-issue handling is a load-bearing use
+  case"). README.md § Skill families (security). The security skill
+  family + tools/vulnogram + tools/gmail + tools/ponymail +
+  tools/privacy-llm.
+acceptance:
+  - The flow runs import → triage → dedupe → CVE allocate → fix → sync →
+    invalidate, each step a confirm-before-apply skill.
+  - Tracker contents stay private; only stable identifiers (URLs, #NNN)
+    are public-safe. Public PRs are scrubbed pre-disclosure.
+  - Every applied state change is audit-logged.
+---
+
+# Security-issue lifecycle
+
+## What it does
+
+The framework's flagship, highest-procedure flow: handle an inbound ASF
+security report end-to-end, from `security@` import through CVE
+publication, with a human gate and an audit-log entry at every step.
+
+## Where it lives
+
+- Skills: `security-issue-import` (+ `-from-pr`, `-from-md`),
+  `security-issue-triage`, `security-issue-deduplicate`,
+  `security-cve-allocate`, `security-issue-fix`, `security-issue-sync`,
+  `security-issue-invalidate`.
+- Tools: `tools/vulnogram/generate-cve-json` (CVE 5.x JSON),
+  `tools/cve-org`, `tools/gmail` + `tools/ponymail` (mail), and the
+  `tools/privacy-llm` gate/redactor ([the privacy gate](privacy-llm-gate.md)).
+
+## Behaviour & contract
+
+- **Confirm-before-apply at every step.** Imports create trackers only
+  after confirmation; triage posts proposal comments and never decides;
+  allocation is PMC-gated; fixes draft PRs the human opens.
+- **Confidentiality (three layers, see `AGENTS.md`):** tracker URLs and
+  `#NNN` are public-safe identifiers; tracker *contents* are private;
+  the security framing of a public PR is embargoed until the advisory
+  ships.
+- **Reporter PII redacted in-context; reporter *credit* preserved** in
+  the CVE `credits[]` only after the reporter confirms on the thread.
+- **Audit log** of every applied change (redacted identifiers only).
+
+## Out of scope
+
+- Drafting beyond the security case (see [Drafting](drafting-mode.md)).
+- Sending mail — replies are drafted to the maintainer's outbox.
+
+## Acceptance criteria
+
+1. Each lifecycle skill is confirm-before-apply and audit-logs applied
+   changes.
+2. No public surface produced by the flow contains tracker contents or
+   pre-disclosure security framing.
+3. CVE JSON is regenerated to stay in lock-step with the tracker body.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev skill-validate
+uv run --project tools/vulnogram/generate-cve-json --group dev pytest
+```
+
+## Known gaps
+
+- The flow is `stable`; gaps surface as drift between a skill's documented
+  steps and the adapters it calls — the loop's plan pass catches those.
diff --git a/tools/spec-loop/specs/triage-mode.md 
b/tools/spec-loop/specs/triage-mode.md
new file mode 100644
index 0000000..a264e0a
--- /dev/null
+++ b/tools/spec-loop/specs/triage-mode.md
@@ -0,0 +1,74 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Triage mode
+status: experimental
+kind: feature
+mode: Triage
+source: >
+  MISSION.md § Technical scope (Triage). docs/modes.md § Triage.
+  Implemented by the pr-management, issue, and security skill families.
+acceptance:
+  - Every triage skill is read-only on tracker state or proposes-then-
+    confirms; none transitions, closes, or labels without confirmation.
+  - Classifications are grounded in prior triaged cases / the project's
+    Security Model, not invented categories.
+  - Security-side import/dedupe/sync/invalidate/allocate skills are
+    stable; PR and general-issue triage are experimental.
+---
+
+# Triage mode
+
+## What it does
+
+The lowest-risk, foundational mode: spot inbound issues / security
+reports / PRs, classify them, surface likely duplicates, link related
+discussions, and propose routing to the right human. Every output is a
+suggestion the human signs off on.
+
+## Where it lives
+
+- PR queue: `pr-management-triage`, `pr-management-stats`,
+  `pr-management-code-review` (deep review is a triage variant).
+- General issues: `issue-triage`, `issue-reassess`, `issue-reproducer`.
+- Security inbound: `security-issue-import`, `-import-from-pr`,
+  `-import-from-md`, `security-issue-deduplicate`,
+  `security-issue-invalidate`, `security-issue-sync`,
+  `security-cve-allocate`.
+- Adapters it reads through: `tools/github`, `tools/jira`,
+  `tools/ponymail`, `tools/gmail`.
+
+## Behaviour & contract
+
+- **Read-only or propose-then-confirm.** `issue-triage` and
+  `security-issue-triage` post a *proposal comment* on confirmation and
+  never flip labels, close, or allocate. Reproducers produce evidence
+  (`verdict.json`), never post.
+- Six-class disposition vocabulary on the security side
+  (`VALID` / `DEFENSE-IN-DEPTH` / `INFO-ONLY` / `INVALID` /
+  `PROBABLE-DUP` / `FIX-ALREADY-PUBLIC`).
+- Duplicate detection keys on stable identifiers (Gmail `threadId`,
+  GHSA-ID), not on fuzzy body text alone.
+
+## Out of scope
+
+- Authoring fixes (that is Drafting, [Drafting](drafting-mode.md)).
+- Any state change a human has not confirmed in-session.
+
+## Acceptance criteria
+
+1. No triage skill performs an unconfirmed state change.
+2. `skill-validate` passes on all triage-family skills.
+3. docs/modes.md Triage table matches the shipped skill set.
+
+## Validation
+
+```bash
+uv run --project tools/skill-validator --group dev skill-validate
+```
+
+## Known gaps
+
+- PR and general-issue triage are `experimental` — no adopter-pilot eval
+  has run; behaviour may change.

(airflow-steward) branch main updated: feat(spec-loop): freshness check + branch-name collision check (#297)

Reply via email to