This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 7c99447 docs(spec): add reviewer-routing and skill-reconciler specs;
prune completed plan items (#586)
7c99447 is described below
commit 7c9944770f92248a44295f42bb97fa882a20712c
Author: Justin Mclean <[email protected]>
AuthorDate: Sat Jun 27 20:15:04 2026 +1000
docs(spec): add reviewer-routing and skill-reconciler specs; prune
completed plan items (#586)
Two proposed-status functional specs (reviewer-routing, Triage; and
skill-reconciler, infra) plus their index entries in overview.md and
specs/README.md. In IMPLEMENTATION_PLAN, remove the merged in-flight batch
and the six shipped planned items (recorded under What's been built), fix
stale references, and list the current spec work items. No eval/local-smoke
content (that lives on the local-smoke branches).
---
tools/spec-loop/IMPLEMENTATION_PLAN.md | 218 +++++++++++-------------------
tools/spec-loop/specs/README.md | 4 +-
tools/spec-loop/specs/overview.md | 2 +
tools/spec-loop/specs/reviewer-routing.md | 123 +++++++++++++++++
tools/spec-loop/specs/skill-reconciler.md | 136 +++++++++++++++++++
5 files changed, 346 insertions(+), 137 deletions(-)
diff --git a/tools/spec-loop/IMPLEMENTATION_PLAN.md
b/tools/spec-loop/IMPLEMENTATION_PLAN.md
index 8c955e8..6778dd0 100644
--- a/tools/spec-loop/IMPLEMENTATION_PLAN.md
+++ b/tools/spec-loop/IMPLEMENTATION_PLAN.md
@@ -59,6 +59,29 @@ one PR** (the branch-per-feature constraint).
setup-isolated-setup-install, setup-isolated-setup-update,
setup-isolated-setup-verify, setup-override-upstream,
setup-shared-config-sync).
+- **Release-management — first four skills shipped** —
+ `release-vote-draft`, `release-announce-draft`, `release-vote-tally`,
+ and `release-verify-rc` landed with eval suites (formerly planned work
+ items 1–2 plus two follow-ups). Six `release-*` skills remain; see
+
[`specs/release-management-lifecycle.md`](specs/release-management-lifecycle.md).
+- **Triage — general-issue family filled out** — `issue-stale-sweep`,
+ `issue-deduplicate`, and `issue-backlog-stats` shipped with eval suites
+ (formerly planned work item 3 plus its deferred siblings).
+ Spec: [`specs/triage-mode.md`](specs/triage-mode.md).
+- **Mentoring — first-contribution welcome shipped** — `mentoring-welcome`
+ landed with an eval suite (formerly planned work item 4).
+ Spec: [`specs/mentoring-mode.md`](specs/mentoring-mode.md).
+- **Project-agnosticism — ASF-coupling advisory lint shipped** — the SOFT
+ ASF-coupling category landed in `tools/skill-and-tool-validator`
+ (formerly planned work item 5), and `drafting-mode.md` Known Gaps is
+ synced to the shipped drafting skills (formerly planned work item 6).
+- **Repo-health — three-skill family shipped** — `ci-runner-audit`,
+ `workflow-security-audit`, and `dependency-audit` landed (read-only,
+ `experimental`). Spec:
[`specs/repo-health-family.md`](specs/repo-health-family.md).
+- **New proposed specs awaiting their first build item** —
+ [`specs/reviewer-routing.md`](specs/reviewer-routing.md) (Triage) and
+ [`specs/skill-reconciler.md`](specs/skill-reconciler.md) (infra) are
+ documented spec-first; their build items are below.
---
@@ -67,25 +90,13 @@ one PR** (the branch-per-feature constraint).
The following items are already built on local branches or open as PRs.
Do not duplicate them.
-| Branch slug | PR | Description |
-|---|---|---|
-| `injection-guard` | merged (#473) | Prompt-injection hardening on
forwarder-relay ingest |
-| `check-headers` | #474 | License-header enforcement check in spec-validator |
-| `spec-validator-known-gaps` | #490 | Enforce Known-gaps section in every
functional spec |
-| `spec-validate-hook` | #489 | pre-commit hook for spec-validate |
-| `skill-quality-fix` | #488 | Stabilise setup-verify eval + extend check-1
coverage |
-| `check-eval-coverage` | #481 | SOFT eval-coverage check (check #8) |
-| `eval-quick-merge` | #480 | pr-management-quick-merge skill + evals |
-| `spec-validator-path-check` | local | Validate paths referenced in
Validation blocks |
-| `spec-validator-spdx` | local | Enforce SPDX header on spec files |
-| `tracker-dashboard-tests` | local | pyproject + pytest suite for
security-tracker render.py |
-| `loop-imp` | #467 | Incremental update runs from .last-sync marker |
-| `loop-cli-ux` | #472 | Explicit loop.sh argument handling |
-| `node-bump-markdownlint` | local | Node 22.13→22.20 bump for markdownlint |
-| `token-reduction` | #479 | Slim AGENTS.md into a glossary |
-| `docs-modes-sync` | #483 | Sync modes.md skill inventory |
-| `docs-mentoring-sync` | #482 | Sync mentoring spec to experimental |
-| `eval-setup-status` | #484 | Fix setup-status eval prompts |
+None in flight. The previous in-flight batch (spec-validator SPDX /
+path-existence / Known-gaps checks, the `spec-validate` pre-commit hook,
+the SOFT eval-coverage check, `pr-management-quick-merge`, the
+security-tracker dashboard pytest suite, the loop incremental-sync and
+CLI-UX changes, the markdownlint Node bump, the AGENTS.md slim, and the
+modes / mentoring / setup-status doc syncs) has all merged and is
+reflected in the code and in **What's been built** above.
---
@@ -94,109 +105,48 @@ Do not duplicate them.
Priority order. Each maps to one branch and one PR. Branch names are
slugs, not numbers (numbering implies an order the specs don't carry).
-1. **First release-management skill: release-vote-draft.**
- `specs/release-management-lifecycle.md` is the only `proposed` spec
- with zero implemented skills. The adopter contract templates
- (`projects/_template/release-management-config.md`,
- `release-build.md`, `pmc-roster.md`, `release-trains.md`,
- `site-repo.md`) already exist. `release-vote-draft` is the most
- standalone and highest-frequency PMC task: it takes RC metadata
- (project name, version, RC number, artifact URLs) and produces a
- VOTE email draft following ASF conventions. Include an eval suite
- in `tools/skill-evals/evals/release-vote-draft/`.
- Validation:
- ```bash
- uv run --project tools/skill-and-tool-validator --group dev
skill-and-tool-validate
- uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/release-vote-draft/
- ```
- Spec:
[`specs/release-management-lifecycle.md`](specs/release-management-lifecycle.md).
- Branch `release-vote-draft`.
-
-2. **Second release-management skill: release-announce-draft.**
- Companion to `release-vote-draft`. Takes a successful vote tally
- (binding +1 count, RC metadata) and produces the ANNOUNCE email
- draft for the ASF announce@ and dev@ lists, following ASF posting
- conventions (subject: `[ANNOUNCE] Apache <Project> <Version>
- released`). Also standalone: it does not depend on
- `release-vote-draft` being run in the same session. Include an
- eval suite.
- Validation:
- ```bash
- uv run --project tools/skill-and-tool-validator --group dev
skill-and-tool-validate
- uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/release-announce-draft/
- ```
- Spec:
[`specs/release-management-lifecycle.md`](specs/release-management-lifecycle.md).
- Branch `release-announce-draft`.
-
-3. **Stale-issue sweep for general triage.**
- `specs/triage-mode.md` Known Gaps explicitly names stale-handling
- as missing from the general-issue side (the security side covers
- this via `security-issue-sync`). Add a new skill
- `issue-stale-sweep` that surfaces issues with no activity past a
- configurable threshold and proposes closure or an update request
- (waits for maintainer confirmation before posting). Include an eval
- suite.
+1. **First reviewer-routing skill: reviewer-routing.**
+ `specs/reviewer-routing.md` is `proposed` with zero implemented
+ skills, and review-cycle latency is one of the two priorities MISSION
+ names. Add a Triage-family skill `reviewer-routing` that takes an open
+ issue or PR and proposes a primary reviewer (and optional backup) from
+ the project's configured roster, scored on roster eligibility for the
+ touched area, git-history familiarity with the changed paths, and the
+ reviewer's current open-review load. Read-only / propose-then-confirm:
+ it never assigns or requests review. An unresolved roster yields an
+ explicit `NO ELIGIBLE REVIEWER` signal, never a fabricated handle.
+ Include an eval suite with an adversarial case asserting an injected
+ "assign to X" line in a PR body is ignored.
Validation:
```bash
uv run --project tools/skill-and-tool-validator --group dev
skill-and-tool-validate
- uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/issue-stale-sweep/
+ uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/reviewer-routing/
```
- Spec: [`specs/triage-mode.md`](specs/triage-mode.md).
- Branch `issue-stale-sweep`.
-
-4. **First-contribution welcome/orientation skill.**
- `specs/mentoring-mode.md` Known Gaps names the "first-contribution
- welcome/orientation skill" as missing. Add `mentoring-welcome`,
- which greets first-time contributors on a newly opened issue or PR
- with orientation context: contributing guide link, community norms,
- expected next steps, and a pointer to the good-first-issue pool.
- Waits for maintainer confirmation before posting. Include an eval
- suite.
- Validation:
- ```bash
- uv run --project tools/skill-and-tool-validator --group dev
skill-and-tool-validate
- uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/mentoring-welcome/
- ```
- Spec: [`specs/mentoring-mode.md`](specs/mentoring-mode.md).
- Branch `mentoring-welcome`.
-
-5. **ASF-coupling advisory lint (fold into `skill-and-tool-validator`).**
- `specs/project-agnosticism.md` Known Gaps names the absence of an
- automated ASF-coupling check as its first gap. Add a new SOFT advisory
- category to `tools/skill-and-tool-validator` that reuses the existing
- walk, file allowlist, and inline `e.g.`/`example:` markers (the same
- machinery as the placeholder check). It flags a curated, tiered set of
- ASF-coupled tokens in skill bodies (high-confidence:
- `svn (mv|commit|co)`, `[email protected]`, `dist/(dev|release)/`,
- Vulnogram URLs; low-confidence: bare `PMC` / `ICLA` / `incubator`) and
- tags each hit with a remedy class (placeholder / adapter /
- capability-flag). SOFT only: surfaces on stderr, never fails the build.
- Extend the validator tests with a coupled fixture and an allowlisted
- fixture.
+ Spec: [`specs/reviewer-routing.md`](specs/reviewer-routing.md).
+ Branch `reviewer-routing`.
+
+2. **Cross-project skill reconciler: skill-reconciler.**
+ `specs/skill-reconciler.md` is `proposed` with no implementation. Add
+ a meta/infra-family skill `skill-reconciler` that compares two
+ near-duplicate skills (two `source`-tagged copies, e.g. an ASF and a
+ non-ASF variant) and emits a structured diff plus a reconciliation
+ proposal, labelling every difference `ALLOWED`, `DRIFT`, or
+ `SAFETY-BASELINE`. Read-only: it proposes, it never rewrites either
+ skill (convergence is a separate confirmed `write-skill` /
+ `optimize-skill` edit). A safety-baseline divergence is always a
+ must-fix and never folded into allowed-divergence noise. First cut may
+ take two explicit paths rather than auto-pairing by `source` tag.
+ Include an eval suite with a case where the two copies diverge only on
+ the safety baseline and the reconciler must flag it.
Validation:
```bash
- uv run --project tools/skill-and-tool-validator --group dev pytest
uv run --project tools/skill-and-tool-validator --group dev
skill-and-tool-validate
+ uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/skill-reconciler/
```
- Spec: [`specs/project-agnosticism.md`](specs/project-agnosticism.md).
- Branch `asf-coupling-lint`.
-
-6. **Sync drafting-mode spec Known Gaps to reflect shipped skills.**
- `specs/drafting-mode.md` Known Gaps still says "Generic
- (non-security, non-issue) Drafting from audit-tool findings is
- `proposed`", but `audit-finding-fix` shipped with a full eval suite.
- Update the Known Gaps section to reflect the current state and
- remove the stale `proposed` claim so new plan passes do not
- re-raise this as a gap.
- Validation:
- ```bash
- uv run --project tools/spec-validator --group dev spec-validate
tools/spec-loop/specs/
- uv run --project tools/spec-validator --group dev pytest
- ```
- Spec: [`specs/drafting-mode.md`](specs/drafting-mode.md).
- Branch `drafting-spec-sync`.
+ Spec: [`specs/skill-reconciler.md`](specs/skill-reconciler.md).
+ Branch `skill-reconciler`.
-7. **Non-ASF adopter profile fixture + smoke eval.**
+3. **Non-ASF adopter profile fixture + smoke eval.**
`specs/project-agnosticism.md` acceptance #3 requires that a non-ASF
profile can be declared without editing any skill body, but there is
no fixture to prove it. Add a worked non-ASF profile under
@@ -226,35 +176,31 @@ slugs, not numbers (numbering implies an order the specs
don't carry).
it would skip the proof MISSION requires.
- When a build iteration creates a new skill, its eval suite is part of
that same work item — not a separate one.
-- **Release-management family:** only the two most standalone skills
- (`release-vote-draft`, `release-announce-draft`) are planned here.
- The remaining eight (`release-prepare`, `release-keys-sync`,
- `release-rc-cut`, `release-verify-rc`, `release-vote-tally`,
+- **Release-management family:** the first four skills (`release-vote-draft`,
+ `release-announce-draft`, `release-vote-tally`, `release-verify-rc`)
+ have shipped and are recorded in **What's been built**. The remaining
+ six (`release-prepare`, `release-keys-sync`, `release-rc-cut`,
`release-promote`, `release-archive-sweep`, `release-audit-report`)
- should be planned in subsequent passes once the first two establish
- the skill-authoring patterns for this family.
+ should be planned in subsequent passes now that the first four have
+ established the skill-authoring patterns for this family.
- **Triage contributor-growth gaps** (PMC-member nomination,
emeritus-committer handling, contributor offboarding) noted in
`triage-mode.md` Known Gaps are intentionally deferred: they are
vague enough that a spec-RFC conversation is more appropriate than
a direct build item.
-- **Project-agnosticism:** two of the three gaps in
- `project-agnosticism.md` are buildable and planned now: the ASF-coupling
- advisory lint (work item 5) and the non-ASF adopter profile fixture
- (work item 7). The remaining gap, the capability-flag vocabulary for
- contributor intake (ICLA vs DCO), security intake, and CVE allocation,
- is deferred only until someone enumerates the option sets and defaults,
- following the backend-flag precedent already set by
+- **Project-agnosticism:** the ASF-coupling advisory lint has shipped
+ (recorded in **What's been built**); the non-ASF adopter profile
+ fixture is work item 3. The remaining gap, the capability-flag
+ vocabulary for contributor intake (ICLA vs DCO), security intake, and
+ CVE allocation, is deferred only until someone enumerates the option
+ sets and defaults, following the backend-flag precedent already set by
`release-management-lifecycle.md` (distribution / approval / announcement
backends). That is a spec-authoring task, not yet a build item.
- **General-issue dedupe and backlog dashboard** (`triage-mode.md` Known
- Gaps) are deferred behind `issue-stale-sweep` (work item 3): dedupe
- overlaps the existing `security-issue-deduplicate` matching approach and
- a backlog dashboard overlaps `pr-management-stats`, so both should reuse
- those patterns once stale-sweep establishes the general-issue skill
- shape. Not dropped, sequenced after item 3.
-- **Repo-health family** (`triage-mode.md` Known Gaps: the standalone
- `ci-runner-audit` plus candidate siblings, GitHub Actions security
- audit, dependency-update triage, license/NOTICE compliance, flaky-test
- detection) is deferred pending a family spec; it is a multi-skill area
- that wants its own spec before any build item.
+ Gaps) have shipped (`issue-deduplicate`, `issue-backlog-stats`) alongside
+ `issue-stale-sweep`; see **What's been built**. No longer planned items.
+- **Repo-health family** has shipped its first three members
+ (`ci-runner-audit`, `workflow-security-audit`, `dependency-audit`) under
+ its own [`specs/repo-health-family.md`](specs/repo-health-family.md);
+ remaining candidates (license / NOTICE compliance, flaky-test detection)
+ are deferred to a subsequent pass.
diff --git a/tools/spec-loop/specs/README.md b/tools/spec-loop/specs/README.md
index b6b852e..18884fc 100644
--- a/tools/spec-loop/specs/README.md
+++ b/tools/spec-loop/specs/README.md
@@ -39,7 +39,9 @@ Start with [`overview.md`](overview.md), then:
[`adapters.md`](adapters.md),
[`project-agnosticism.md`](project-agnosticism.md),
[`meta-and-quality-tooling.md`](meta-and-quality-tooling.md),
- [`security-reporting.md`](security-reporting.md).
+ [`security-reporting.md`](security-reporting.md),
+ [`reviewer-routing.md`](reviewer-routing.md),
+ [`skill-reconciler.md`](skill-reconciler.md).
(Auto-merge, the fifth MISSION mode, is deliberately off and has no
spec — see the note in [`overview.md`](overview.md).)
diff --git a/tools/spec-loop/specs/overview.md
b/tools/spec-loop/specs/overview.md
index 25e5fa4..3ba8ab0 100644
--- a/tools/spec-loop/specs/overview.md
+++ b/tools/spec-loop/specs/overview.md
@@ -54,6 +54,8 @@ Each mode is an independently toggleable set of skills.
Maturity mirrors
| Adapters (Gmail / PonyMail / Jira / GitHub / mail-source / forwarder-relay /
mail-archive / github-body-field / github-rollup) | [adapters.md](adapters.md) |
| Project-agnosticism (de-ASF coupling) |
[project-agnosticism.md](project-agnosticism.md) |
| Meta & quality tooling |
[meta-and-quality-tooling.md](meta-and-quality-tooling.md) |
+| Reviewer routing (proposed, Triage) |
[reviewer-routing.md](reviewer-routing.md) |
+| Cross-project skill reconciler (proposed, infra) |
[skill-reconciler.md](skill-reconciler.md) |
## The non-negotiables every area inherits
diff --git a/tools/spec-loop/specs/reviewer-routing.md
b/tools/spec-loop/specs/reviewer-routing.md
new file mode 100644
index 0000000..9f3989a
--- /dev/null
+++ b/tools/spec-loop/specs/reviewer-routing.md
@@ -0,0 +1,123 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Reviewer routing
+status: proposed
+kind: feature
+mode: Triage
+source: >
+ MISSION.md § Rationale ("review-cycle latency" is one of the two named
+ priorities) and § Technical scope (Triage: "proposes initial routing",
+ "proposes routing"). The substrate config already declares "who
+ reviews" (overview.md § Substrate; projects/_template adopter config),
+ but no skill turns that roster plus repository signal into an assignee
+ suggestion. triage-mode.md § What it does ("propose routing to the
+ right human") names the behaviour; no skill implements it yet.
+acceptance:
+ - The skill is read-only on tracker state and proposes-then-confirms;
+ it never assigns, requests review, or labels without confirmation.
+ - The suggested reviewer is drawn from the project's configured roster
+ only; the skill never invents a handle or routes to a non-member.
+ - Every suggestion carries its reasoning (touched paths, prior-art
+ PRs, current open-review load) so the maintainer can audit the call.
+---
+
+# Reviewer routing
+
+## What it does
+
+Proposes which maintainer an inbound issue or PR should go to. The two
+complaints MISSION names loudest are onboarding latency and review-cycle
+latency; routing attacks the second directly by removing the "who should
+look at this?" pause that stalls a fresh PR before any review begins.
+
+Given an open issue or PR, the skill scores the configured reviewer
+roster and proposes a primary reviewer (and optionally a backup),
+grounding each suggestion in three signals: roster eligibility for the
+touched area, git-history familiarity with the changed paths, and the
+reviewer's current open-review load so routing spreads work instead of
+piling it on the most-recently-active person. The output is a proposal a
+maintainer confirms; nothing is assigned on autopilot. This is the
+Triage-mode counterpart to `contributor-nomination` on the read-only
+side: a grounded brief a human acts on, not a state change.
+
+## Where it lives
+
+- Skill (proposed, not implemented): `reviewer-routing` under
+ `skills/`, in the Triage family alongside `pr-management-triage` and
+ `issue-triage`.
+- Roster source: the project's configured reviewer roster
+ (`projects/<project>/` adopter config; `pmc-roster.md` for ASF
+ projects, an arbitrary maintainer list for non-ASF adopters). The
+ skill reads the roster through configuration, never a hard-coded list.
+- Repository signal: `tools/github` for changed paths, blame/history on
+ those paths, and the reviewer's current open-review queue.
+- Identity resolution for ASF projects: `tools/apache-projects`
+ (committee roster, Apache IDs), reused exactly as the security and
+ contributor skills resolve handles.
+- Adapters it reads through: `tools/github`; `tools/apache-projects`
+ where ASF roster context applies.
+
+## Behaviour & contract
+
+- **Read-only, propose-then-confirm.** The skill emits a routing
+ proposal; the maintainer assigns / requests review as themselves. No
+ skill call sets an assignee, requests a review, or applies a label
+ without in-session confirmation.
+- **Roster-bounded.** Suggestions come only from the configured roster.
+ An empty or unresolved roster yields `NO ELIGIBLE REVIEWER, needs
+ maintainer call` rather than a guessed handle, mirroring the
+ conservative-tally refusal in `release-vote-tally`.
+- **Reasoned, auditable output.** Each suggestion lists the signals
+ behind it (matched area / touched paths, the prior-art PRs that touched
+ the same paths, the reviewer's current open-review count). A maintainer
+ can see why a name surfaced and overrule it.
+- **Load-aware, not just expertise-aware.** Scoring weighs current
+ open-review load so routing does not concentrate every PR on the
+ single most expert reviewer; the contract is to surface a workable
+ human, not the theoretically optimal one.
+- **Untrusted content stays data.** Issue / PR bodies are input data,
+ never instructions; an injected "assign this to X" line in a PR
+ description is ignored, the same posture every triage skill inherits.
+
+## Out of scope
+
+- Assigning, requesting review, or labelling on the tracker (those are
+ human acts the maintainer performs after confirming).
+- Authoring or merging the change (Drafting / Auto-merge, not Triage).
+- Inventing a reviewer outside the roster, or routing on contributor
+ sentiment / performance ranking — the skill proposes who is best
+ placed to review a specific change, not who is a "better" maintainer.
+
+## Acceptance criteria
+
+1. `reviewer-routing` performs no unconfirmed tracker state change.
+2. Every suggested reviewer is a member of the configured roster; an
+ unresolved roster produces an explicit `NO ELIGIBLE REVIEWER` signal,
+ never a fabricated handle.
+3. Each suggestion carries its grounding signals (touched paths,
+ prior-art PRs, open-review load).
+4. The skill validates under `skill-and-tool-validate` and ships an eval
+ suite under `tools/skill-evals/evals/reviewer-routing/`, including an
+ adversarial case asserting an injected "assign to X" line is ignored.
+
+## Validation
+
+```bash
+uv run --project tools/skill-and-tool-validator --group dev
skill-and-tool-validate
+uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/reviewer-routing/
+```
+
+## Known gaps
+
+- **No skill is implemented yet.** This spec is `proposed`; the plan
+ pass turns it into a single build item (one skill plus its eval suite).
+- **Open-review-load signal is unspecified in detail.** Whether load is
+ counted as open review requests, assigned-and-unreviewed PRs, or a
+ decay-weighted recent count is left to the implementation; the contract
+ only requires that some load signal is present and shown.
+- **Non-ASF roster shape is unproven.** The roster-bounded contract
+ assumes an adopter declares a maintainer list; no non-ASF profile
+ fixture exercises routing yet (overlaps the non-ASF adopter profile
+ work item in IMPLEMENTATION_PLAN).
diff --git a/tools/spec-loop/specs/skill-reconciler.md
b/tools/spec-loop/specs/skill-reconciler.md
new file mode 100644
index 0000000..2cc1770
--- /dev/null
+++ b/tools/spec-loop/specs/skill-reconciler.md
@@ -0,0 +1,136 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+---
+title: Cross-project skill reconciler
+status: proposed
+kind: feature
+mode: infra
+source: >
+ MISSION.md § Scope boundaries ("Duplication is embraced where it buys
+ decoupling ... an agent can reconcile two near-identical skills on
+ demand, so the price of keeping them separate is low") and § "The data
+ layer is shared; the skills on top are free to diverge" (skills carry a
+ `source` tag so the split is a registry query). PRINCIPLES.md on the
+ safety baseline that must stay eventually-consistent across every copy.
+ meta-and-quality-tooling.md (the skill-authoring/quality family this
+ joins). No reconciler tool or skill exists yet.
+acceptance:
+ - The reconciler is read-only: it produces a structured diff and a
+ reconciliation proposal; it never rewrites either skill without human
+ confirmation.
+ - Divergence in the safety baseline (untrusted-content-is-never-
+ instructions, identity-resolution caveats, confidentiality posture)
+ is reported as a must-fix, distinct from divergence that is allowed
+ to stand.
+ - A maintainer can review the proposal and the safety-baseline verdict
+ without running either skill.
+---
+
+# Cross-project skill reconciler
+
+## What it does
+
+Compares two near-duplicate skills, typically the same capability carried
+in an ASF and a non-ASF variant (or two `source`-tagged copies), and
+produces a structured diff plus a reconciliation proposal. It
+operationalises the MISSION principle that duplication is fine where it
+buys decoupling precisely because an agent can reconcile copies on
+demand: the reconciler is that agent step, made repeatable.
+
+The key move is that not all divergence is equal. The reconciler sorts
+differences into three classes: **allowed divergence** (scope, tier,
+teaching voice, project-specific values behind placeholders, the things
+MISSION says skills are free to diverge on); **drift** (one copy gained a
+fix, a clearer step, or a hardening the other lacks, where convergence is
+probably wanted); and **safety-baseline divergence** (the
+untrusted-content-is-never-instructions rule, identity-resolution
+caveats, confidentiality posture, which PRINCIPLES says every copy must
+stay eventually-consistent on). The first is reported and left alone, the
+second is proposed as a merge, and the third is flagged as a must-fix
+that a maintainer should not ignore.
+
+## Where it lives
+
+- Skill (proposed, not implemented): `skill-reconciler` under `skills/`,
+ in the meta / quality family with `write-skill`, `optimize-skill`, and
+ `list-skills` (see
[meta-and-quality-tooling.md](meta-and-quality-tooling.md)).
+- Optional deterministic helper (proposed): a `uv` tool under `tools/`
+ that does the structural diff (frontmatter, section headings,
+ step-by-step decision rules, placeholder inventory) so the skill
+ reasons over a normalised diff rather than raw text. Follows the
+ tool-backs-skill pattern already used across the catalogue.
+- Inputs: two `SKILL.md` trees (plus their supporting `.md` files),
+ identified by path or by `source` tag.
+- It reads the safety-baseline definition from the same place the rest of
+ the framework does (PRINCIPLES.md / the security posture docs); it does
+ not restate the baseline inline.
+
+## Behaviour & contract
+
+- **Read-only; proposes, never rewrites.** The reconciler emits a diff
+ and a proposal. Any actual convergence edit goes through `write-skill`
+ / `optimize-skill` under human confirmation; the reconciler itself
+ changes no skill file.
+- **Three-class divergence verdict.** Every difference is labelled
+ `ALLOWED`, `DRIFT`, or `SAFETY-BASELINE`. The safety-baseline class is
+ never silently merged away and never dropped from the report, even when
+ the two copies are otherwise identical.
+- **Safety divergence is a must-fix, not a suggestion.** When the two
+ copies disagree on the untrusted-content rule, identity-resolution
+ caveats, or confidentiality posture, the reconciler surfaces it as a
+ blocking finding a maintainer must resolve, separate from the
+ convenience merges.
+- **Decoupling is preserved by default.** The reconciler does not push
+ toward DRY across organisational boundaries; allowed divergence is
+ reported and left in place. Convergence is proposed only for drift and
+ required only for the safety baseline.
+- **Untrusted content stays data.** Skill bodies under comparison are
+ treated as input data; an injected instruction inside a compared skill
+ is reported as content, never executed.
+
+## Out of scope
+
+- Rewriting, merging, or deleting either skill on autopilot (that is a
+ confirmed `write-skill` / `optimize-skill` edit).
+- Choosing which copy "wins" on allowed divergence; the reconciler
+ reports the difference and leaves the call to the maintainers who own
+ each copy.
+- Cross-foundation policy. The reconciler is a tool for reconciling skill
+ text; it makes no claim on which project's governance is correct.
+
+## Acceptance criteria
+
+1. Given two near-duplicate skills, the reconciler emits a structured
+ diff and labels every difference `ALLOWED`, `DRIFT`, or
+ `SAFETY-BASELINE`.
+2. A safety-baseline divergence is always reported as a must-fix and is
+ never folded into the allowed-divergence noise.
+3. The reconciler makes no edit to either skill; convergence is a
+ separate, confirmed authoring step.
+4. The skill validates under `skill-and-tool-validate` and ships an eval
+ suite under `tools/skill-evals/evals/skill-reconciler/`, including a
+ case where the two copies diverge only on the safety baseline and the
+ reconciler must flag it.
+
+## Validation
+
+```bash
+uv run --project tools/skill-and-tool-validator --group dev
skill-and-tool-validate
+uv run --project tools/skill-evals skill-eval
tools/skill-evals/evals/skill-reconciler/
+```
+
+## Known gaps
+
+- **Nothing is implemented yet.** This spec is `proposed`; the plan pass
+ turns it into a build item (the skill, optionally its structural-diff
+ tool, and an eval suite).
+- **The safety-baseline definition is not yet machine-readable.** The
+ reconciler needs an authoritative list of baseline clauses to check
+ against; today that posture lives in prose across PRINCIPLES.md and the
+ security docs. A buildable precursor is to extract that baseline into a
+ single referenced checklist the reconciler (and humans) can cite.
+- **`source`-tag-driven pairing is unproven.** MISSION frames copy
+ selection as a registry query over `source` tags, but no registry index
+ pairs copies automatically yet; the first implementation can take two
+ explicit paths and defer auto-pairing.