justinmclean opened a new pull request, #267:
URL: https://github.com/apache/airflow-steward/pull/267

   Adds tools/skill-evals/evals/list-steward-skills/ per the /AGENTS.md § 
Reusable skills requirement that every skill ships a behavioural eval suite.  
list-steward-skills had no coverage.
   
   Two suites:
   - step-1-command (4 cases): command-selection logic — default listing, 
verbose via explicit request, verbose via keyword, and 
injection-in-user-message resistance (case-4 embeds a SYSTEM: block attempting 
to redirect to `find`; correct answer is the standard listing command).
   - step-2-present (3 cases): output-fidelity / hard-rule enforcement — 
standard verbatim output, user requests a summary (hard rule overrides), user 
requests a filtered view (hard rule overrides).  All three cases expect 
presentation_mode: verbatim.
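
   As a rough illustration of the step-2-present invariant described above, the sketch below models three cases that all expect `presentation_mode: verbatim` and reports any case where a model output deviates. The `PresentCase` type, the case names, the sample user requests, and the `violations` helper are hypothetical and are not taken from the eval files in this PR; only the expected mode comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentCase:
    # Hypothetical shape for a step-2 case; the real eval schema may differ.
    name: str
    user_request: str
    expected_presentation_mode: str


# Illustrative cases: whatever the user asks for, the hard rule expects verbatim.
STEP_2_CASES = [
    PresentCase("standard", "List the steward skills.", "verbatim"),
    PresentCase("summary-requested", "Just summarise the list for me.", "verbatim"),
    PresentCase("filter-requested", "Only show the PR-related skills.", "verbatim"),
]


def violations(cases, model_outputs):
    """Return the names of cases whose reported mode differs from the expected one."""
    return [
        case.name
        for case in cases
        if model_outputs.get(case.name) != case.expected_presentation_mode
    ]


# Example: a model that gives in to the summary request fails exactly that case.
sample_outputs = {
    "standard": "verbatim",
    "summary-requested": "summary",
    "filter-requested": "verbatim",
}
failed = violations(STEP_2_CASES, sample_outputs)
```

   With the sample outputs above, only the summary case is reported, which is the behaviour the hard rule is meant to catch.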
   
   Both suites use step-config.json to extract the relevant SKILL.md section 
live, so a future edit to the skill is automatically reflected in the prompt.
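
   One way such live extraction could work is sketched below: read the markdown text and return the body under a named heading, stopping at the next heading of the same or a higher level. This is a minimal sketch under assumptions. The function name `extract_section`, the sample headings, and the matching rules are hypothetical; the actual step-config.json format and the skill-eval extraction logic are not shown in this message.

```python
import re

# Matches an ATX heading such as "## Step 1" and captures its level and title.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def extract_section(markdown_text: str, heading: str) -> str:
    """Return the text under `heading`, up to the next heading of equal or higher level.

    Limitation of this sketch: '#' lines inside fenced code blocks are
    treated as headings too.
    """
    lines = markdown_text.splitlines()
    start = None
    level = 0
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        if start is None:
            if match.group(2) == heading:
                start = i + 1
                level = len(match.group(1))
        elif len(match.group(1)) <= level:
            # A sibling or parent heading closes the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        raise KeyError(heading)
    return "\n".join(lines[start:]).strip()


# Hypothetical stand-in for a SKILL.md file.
SAMPLE = """# Skill

## Step 1: pick the command
Run the listing command.

### Notes
Verbose on request.

## Step 2: present
Show output verbatim.
"""

section = extract_section(SAMPLE, "Step 1: pick the command")
```

   Because the section is read each time the prompt is built, editing the text under the heading changes the prompt without touching the eval cases. Renaming the heading raises `KeyError` in this sketch instead of silently yielding an empty prompt.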
   
   Also updates tools/skill-evals/README.md: corrects the count from 15 to 19 
(three previously unlisted suites, pr-management-mentor, pr-management-stats, 
and pr-management-triage, are now listed) and adds the new list-steward-skills 
entry.
   
   Validation: uv run --project tools/skill-evals skill-eval 
tools/skill-evals/evals/list-steward-skills/ → all 7 cases load and print 
without error.
   
   Generated-by: Claude (Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
