justinmclean opened a new pull request, #488:
URL: https://github.com/apache/airflow-steward/pull/488

   ## Summary
   
   Follow-up to the heading fix, which let the eval run and exposed
   latent fixture failures.
   
   - step-1-classify: add grading-schema.json marking 'evidence' as a
     prose field, so equivalent wording is graded on its conclusion rather
     than by verbatim string match ('evidence' is not in the runner's
     default set of prose fields).
   - case-3 / case-6: correct check status from X to partial for a
     missing PostToolUse/sandbox-error-hint.sh, matching the documented
     'missing error-hint reports partial, not missing' rule in SKILL.md.
   - step-2 case-4: broaden expected follow_up reason to span the same
     scope as a correct answer (.git/HEAD read failure consequence plus
     helper remediation), consistent with case-3/case-5 style.
   - Extend check 1 to enumerate the sandbox.filesystem allowlist
     (allowRead/allowWrite) in both SKILL.md and the cited canonical
     doc, so the model reports it consistently.
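
The prose-field distinction in the first bullet can be illustrated with a minimal sketch. This is not the skill-evals runner's code: `grade`, `prose_fields`, and `same_conclusion` are hypothetical names, the toy judge stands in for whatever comparison the runner really applies to prose fields, and the actual grading-schema.json format is not shown in this description.

```python
from typing import Callable, Iterable, List, Mapping


def grade(
    expected: Mapping[str, object],
    actual: Mapping[str, object],
    prose_fields: Iterable[str],
    same_conclusion: Callable[[str, str], bool],
) -> List[str]:
    """Return the names of the expected fields that fail grading.

    Hypothetical helper: fields listed in prose_fields are handed to the
    same_conclusion judge, all other fields must match exactly.
    """
    prose = set(prose_fields)
    failures = []
    for field, want in expected.items():
        got = actual.get(field)
        if field in prose:
            # Prose field: compare meaning, not the literal string.
            ok = isinstance(got, str) and same_conclusion(str(want), got)
        else:
            # Any other field: verbatim equality.
            ok = got == want
        if not ok:
            failures.append(field)
    return failures


def toy_judge(want: str, got: str) -> bool:
    # Stand-in judge (assumption): both texts must name the same hook script.
    return "sandbox-error-hint.sh" in want and "sandbox-error-hint.sh" in got


expected = {"status": "partial", "evidence": "sandbox-error-hint.sh is missing"}
actual = {"status": "partial", "evidence": "The sandbox-error-hint.sh hook is absent"}

strict_failures = grade(expected, actual, [], toy_judge)
prose_failures = grade(expected, actual, ["evidence"], toy_judge)
```

With no prose fields declared, the reworded `evidence` fails the verbatim comparison; once `evidence` is declared as prose, the same answer passes.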
   
   Validation: eval suite fully green (11/11)
   
   ## Type of change
   
   - [X] Skill change (`.claude/skills/<name>/`) — eval fixtures updated below
   - [ ] Tool / bridge contract (`tools/<system>/*.md`)
   - [ ] Python package (`tools/*/` with `pyproject.toml`)
   - [ ] Groovy reference impl
   - [ ] Cross-cutting (RFC, AGENTS.md, sandbox, privacy-LLM)
   - [ ] Documentation (`docs/`, `README.md`, `CONTRIBUTING.md`)
   - [ ] Project template (`projects/_template/`)
   - [ ] CI / dev loop (`prek`, workflows, validators)
   - [ ] Other:
   
   ## Test plan
   
   - [X] `prek run --all-files` passes
   - [ ] For Python packages touched: `uv run pytest` / `ruff check` / `mypy` passes
   - [ ] For Groovy bridges touched: command-line invocation tested end-to-end
   - [X] For skill changes: eval suite passes for the affected skill
         (`PYTHONPATH=tools/skill-evals/src python3 -m skill_evals.runner tools/skill-evals/evals/<skill>/`)
   - [ ] For skill *behaviour* changes: a new or updated eval fixture is included in this PR
         (a regression test for the bug fixed / the behaviour added; see CONTRIBUTING.md)
   - [ ] Other:
   
   

