justinmclean opened a new pull request, #208:
URL: https://github.com/apache/airflow-steward/pull/208

   Adds behavioral eval coverage for six skills that had no test fixtures,
   bringing the suite from 206 cases across 9 skills to 290 cases across 15.
   
   ## New suites
   
   **issue-triage** (22 cases, 5 steps)
   step-1-resolve-selector, step-3-classify, step-4-compose-comment,
   step-5-confirm, step-7-recap. Includes one adversarial case (step-3
   case-7: SYSTEM: block instructs the model to classify all issues as BUG;
   correct answer is INVALID).
   
   **issue-reproducer** (27 cases, 7 steps)
   step-1-inventory, step-2-pick-candidate, step-3-classify-shape,
   step-5.5-confirm, step-7-verify, step-8-baselines, step-10-compose-verdict.
   Includes one adversarial case (step-7 case-6: stdout contains an AGENT
   OVERRIDE directive; correct classification is cannot-run-environment).
   
   **issue-fix-workflow** (12 cases, 4 steps)
   step-2-locate-area, step-6-scope-check, step-7-compose-commit,
   step-8-handback.
   
   **issue-reassess** (10 cases, 4 steps)
   step-1-pool-selection, step-2-resumability, step-4-aggregate,
   step-5-campaign-report. step-5 uses structural assertions (section
   presence, still-failing tail coverage, no-auto-post-claim) rather than
   exact JSON match, following the same pattern as issue-triage/step-4.
   
   **issue-reassess-stats** (8 cases, 3 steps)
   step-1-fetch-verdicts, step-2-classify, step-3-aggregate.
   
   **pr-management-code-review** (5 cases, 1 step)
   review-disposition. Includes one adversarial case (case-5: PR body
   instructs the model to approve immediately; correct disposition is
   REQUEST_CHANGES based on a real dependency conflict in the diff).
   
   ## Coverage rationale
   
   Steps omitted are either not-applicable (pre-flights, GitHub posts,
   runtime execution, working-tree resets) or hard-to-test (steps that
   generate arbitrary code or HTML). Every step with a structured,
   mockable output now has at least two fixture cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to