choo121600 commented on issue #118: URL: https://github.com/apache/airflow-steward/issues/118#issuecomment-4428619543
# Follow-up audit: principle adherence after the trim wave I ran the audit principles from this issue against `main` (HEAD `ee0fa4c`) after the trim PRs (#119–#126, plus the earlier #127/#128) landed. Two takeaways: 1. **Quantitative gain** — total frontmatter cost across the 20 framework skills dropped from ~5,955 → ~5,371 tokens per turn (-584, ≈10%). The largest combined `description + when_to_use` is now 1,233 chars (was 1,514), well under the 1,536 budget. 2. **Qualitative — partial principle adherence** — the "keep router-essential, push detail to body" rule is well-observed in some PRs but residual violations survived in several. Notably the two strictest categories (*Distinct-from* and *Long-example*) persisted even in skills that were nominally "trimmed." ## Methodology - Token counts: `tiktoken cl100k_base` as a proxy for Claude's tokenizer (±5–10% on English text). - Char counts for the budget check: raw `description + when_to_use` block-scalar bodies. - Principle audit: each line classified per the issue's criteria — **Keep** (trigger phrases, routing cues, high-signal task description) vs **Move-to-body** (rationale, implementation notes, long examples, distinct-from explanations, criteria-source paths, sub-step descriptions). - Trigger-phrase preservation check: eyeball-only for now — proposing automation below. ## Current state (HEAD `ee0fa4c`) ### Tokens & budget — all under budget <details> <summary>Full table (20 skills, sorted by frontmatter tokens desc)</summary> | Skill | FM tok | desc+when chars | |---|---:|---:| | security-issue-triage | 315 | 1103 | | security-issue-import-from-pr | 305 | 1036 | | security-issue-import | 304 | 1073 | | setup-steward | 302 | 1040 | | pr-management-code-review | 299 | 944 | | pr-management-triage | 287 | 900 | | write-skill | 280 | 1036 | | security-issue-invalidate | 279 | 1025 | | setup-isolated-setup-install | 274 | 962 | | setup-isolated-setup-update | 273 | 996 | | security-issue-import-from-md | 271 | 975 | | setup-isolated-setup-verify | 268 | 1024 | | pr-management-mentor | 267 | 972 | | security-issue-fix | 260 | 881 | | security-cve-allocate | 257 | 813 | | setup-override-upstream | 246 | 892 | | setup-shared-config-sync | 241 | 831 | | security-issue-deduplicate | 240 | 849 | | security-issue-sync | 215 | 739 | | pr-management-stats | 188 | 604 | | **TOTAL** | **5,371** | — | </details> Headroom is comfortable — the principle, not the budget, is the binding constraint from here on. ### Principle adherence audit **CLEAN — 8 skills**: `pr-management-mentor` (#119), `pr-management-stats` (baseline-clean), `pr-management-triage` (#125), `security-cve-allocate` (#124, one minor `(process step 6)` paren), `security-issue-fix` (#122, one minor `process step 9 of README.md` ref), `security-issue-sync`, `security-issue-triage` (#120 — exemplary "Skip when … invoke `/X` directly" routing block), `setup-isolated-setup-install` (#121). **MODERATE — 12 skills with at least one residual violation**: | Skill | Trimmed? | Residual violation | Example | |---|---|---|---| | `pr-management-code-review` | no | Distinct-from + parenthetical criteria-source | "Distinct from `pr-management-triage` (which decides *whether* to engage); this skill runs **after** …" | | `security-issue-deduplicate` | no | Rationale paren + sub-step + criteria source | "(typically discovered independently by two reporters …)" / "Step 2a surfaces a STRONG match …" | | `security-issue-import` | no | Rationale narrative | "This is the first step of the handling process: the entry point that converts an inbound email thread into a tracker the rest of the skills … operate on." | | `security-issue-import-from-md` | #126 | Distinct-from + long-example | "Unlike `security-issue-import` (Gmail) and `security-issue-import-from-pr` (public PR), there is no inbound reporter …" / "Typical sources: AI security review output, third-party SAST report …" | | `security-issue-import-from-pr` | no | Rationale paren + sub-step + chain handoff | "(the team-deliberate import implies the security assessment has already happened)" / "ready for `security-cve-allocate` to take over." | | `security-issue-invalidate` | no | Sub-step inventory + criteria source + rationale parens | "apply the `invalid` label, remove the scope label, post a short closing comment, archive …" / "(a separate Vulnogram REJECT flow is required first)" | | `setup-isolated-setup-update` | no | Sub-step enum + rationale | "Reports framework-checkout updates (…), pinned-tool upgrade candidates (…), drift between …" / "Recommended cadence per the doc: once per Claude Code upgrade or once a month …" | | `setup-isolated-setup-verify` | no | Criteria-source path + checklist enum + rationale parens | "Walk the verification checklist documented in `docs/setup/secure-agent-setup.md`" / "(the "did a denial silently turn into an allow?" canary)" | | `setup-override-upstream` | #128 | Full sub-step workflow + post-merge note | "Lists the adopter's overrides, helps pick one, reads it alongside the framework skill it modifies, decides whether the change is generalisable, designs the framework-level abstraction, implements it …" | | `setup-shared-config-sync` | #127 | git-rebase rationale (the "never X" scope statements are borderline) | "Runs `git pull --rebase` first if the local checkout is behind, so a push never overwrites concurrent work from another machine." | | `setup-steward` | #123 | Sub-actions enum duplicates `argument-hint` | "Sub-actions: `/setup-steward` — …, `/setup-steward upgrade` — …" vs `argument-hint: "[adopt\|upgrade\|verify\|override skill-name\|unadopt]"` | | `write-skill` | no | Sub-step enum + criteria-source paths | "Walks the user through the framework-specific skill shape — YAML frontmatter (…), bundled resources (…), placeholder convention (…), … and the Privacy-LLM gate-check boilerplate." | ## Five reusable violation patterns The residuals cluster into 5 patterns. Codifying these in `write-skill` would make new skills land principle-compliant by construction: 1. **Action-inventory in description** — "Does A, B, C, D, E, F, G, H." Reduce to verb + outcome; move the enum to body unless it *is* the high-signal contract. 2. **Distinct-from-sibling-skill** — "Unlike X / Distinct from Y / Counterpart to Z." The router needs the *trigger* and *skip-when for the wrong sibling*, not the comparison. `security-issue-triage` after #120 shows the right shape: `Skip when team consensus … has already landed — invoke /X (VALID), /Y (INFO-ONLY) directly` — a routing redirect, not a comparison. 3. **Chain-handoff narrative** — "Finishes by handing off to X" / "ready for Y to take over." Body content; the chain doesn't change which skill the router picks. 4. **Parenthetical rationale** — "(typically …)", "(a separate REJECT flow is required first)", "(the team-deliberate import implies …)". The router needs *whether*, not *why*. 5. **Criteria-source path** — "per process step 6", "`docs/setup/secure-agent-setup.md`", "Step 2a". The router doesn't open the doc. ## Proposed follow-up With this issue closed, treating this as a retrospective — happy to file a fresh issue if that's the cleaner home. - **Codify the 5 patterns in `write-skill`** — its own frontmatter currently violates patterns 1 and 5 (the YAML/resources/placeholder/preamble/injection/Privacy-LLM enum and the validator path), so it's a natural first PR: dogfood the rule and ship the doc together. - **Add a non-regression check for trigger phrases** — extract quoted strings from `when_to_use` before/after a trim PR; the set must be equal-or-superset of `main`. Could live as a `tools/skill-validator` SOFT warning (matching the existing progressive-disclosure precedent). - **Add SOFT warnings for the 5 patterns** in `tools/skill-validator` — surface them in CI as advisory, not blocking; reviewer keeps the final say on borderline cases (scope statements, action enums that are the contract). - **Trim the remaining 12** — one small PR per skill. Five of those are previously-trimmed skills with residual violations, so they're useful calibration data for the codified guidance. Audit scripts (`measure_frontmatter.py`, `budget_check.py`, frontmatter dumper) are local; happy to PR them under `tools/skill-validator/` if useful as ongoing CI. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
