justinmclean opened a new pull request, #481:
URL: https://github.com/apache/airflow-steward/pull/481
## Summary
Every skill under skills/ must ship a matching behavioural eval suite under
tools/skill-evals/evals/<slug>/. The new validate_eval_coverage function flags
missing suites as SOFT advisory violations so that in-flight eval PRs do not
fail the gate while their branches are pending review.
Against the live repo the check correctly flags the two skills that
currently have in-flight eval branches (pr-management-quick-merge and
setup-status) and is silent on all others. 8 new test cases cover the happy
path, the missing-eval path, missing-both-dirs paths, the soft-category
membership, and the non-directory skip.
Addresses the Known Gap in specs/meta-and-quality-tooling.md: "Eval coverage
is incomplete — skills added before the per-skill-eval convention have no
suite." The check prevents future regressions.
## Type of change
<!-- Tick all that apply. -->
- [ ] Skill change (`.claude/skills/<name>/`) — eval fixtures updated below
- [ ] Tool / bridge contract (`tools/<system>/*.md`)
- [ ] Python package (`tools/*/` with `pyproject.toml`)
- [ ] Groovy reference impl
- [ ] Cross-cutting (RFC, AGENTS.md, sandbox, privacy-LLM)
- [ ] Documentation (`docs/`, `README.md`, `CONTRIBUTING.md`)
- [ ] Project template (`projects/_template/`)
- [ ] CI / dev loop (`prek`, workflows, validators)
- [X] Other:
## Test plan
<!--
How you verified the change. Be specific. The reviewer reads this to
decide what to spot-check vs trust. Empty "ran the tests" doesn't help.
-->
- [X] `prek run --all-files` passes
- [X] For Python packages touched: `uv run pytest` / `ruff check` / `mypy`
passes
- [ ] For Groovy bridges touched: command-line invocation tested end-to-end
- [ ] For skill changes: eval suite passes for the affected skill
(`PYTHONPATH=tools/skill-evals/src python3 -m skill_evals.runner
tools/skill-evals/evals/<skill>/`)
- [ ] For skill *behaviour* changes: a new or updated eval fixture is
included in this PR
(a regression test for the bug fixed / the behaviour added — see
CONTRIBUTING.md)
- [ ] Other:
## RFC-AI-0004 compliance
<!--
Tick the principles the change touches. Skip rows that don't apply.
RFC-AI-0004 is the framework's constitution — see
docs/rfcs/RFC-AI-0004.md.
-->
- [ ] **HITL** — any new mutation is gated on explicit user confirmation
- [ ] **Sandbox** — no new unrestricted host access; network reach declared
in the adapter
- [ ] **Vendor neutrality** — placeholders (`<PROJECT>`, `<tracker>`,
`<upstream>`, `<security-list>`) used in all skill / tool prose (the
`check-placeholders` prek hook is the mechanical gate)
- [ ] **Conversational + correctable** — agentic-override path documented if
behaviour is adopter-tunable
- [ ] **Write-access discipline** — no autonomous outbound messages; drafts
only, sent on confirmation
- [ ] **Privacy LLM** — private content does not reach a non-approved LLM;
redactor invoked where needed
## Linked issues
<!-- e.g. Closes #NNN, Refs #NNN. List every related issue. -->
## Notes for reviewers (optional)
<!--
Anything specific you want the reviewer to look at. Areas of uncertainty.
Trade-offs you considered and rejected. Decisions that the agent and you
disagreed on during the authoring loop.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]