justinmclean opened a new issue, #376:
URL: https://github.com/apache/airflow-steward/issues/376

   ## Summary
   
   Add a `--fail-fast` flag to `skill_evals.runner` that, in `--cli` mode,
   stops the loop as soon as a case is reported `FAIL` or `ERROR` instead
   of running the full suite.
   
   ## Background
   
   The runner already iterates over cases in `--cli` mode and prints
   `PASS / FAIL / MANUAL / ERROR` per case, then exits non-zero at the end
   if anything failed (see
   [`tools/skill-evals/README.md` § Automated mode](tools/skill-evals/README.md)).
   On large suites, a single bad case can take minutes to surface because
   the harness keeps running the rest of the suite before it exits.
   `--fail-fast` is the same affordance that `pytest -x` and `prek` already
   offer, and the loop is already structured to support it (each case is
   reported, then the loop `continue`s).
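Because each case is reported and then the loop moves on, the change amounts to one extra check after the failure counters are bumped. The following is a minimal, self-contained sketch of that loop shape; `run_cases`, `Tally`, the `run_case` callback and the summary wording are hypothetical stand-ins, not the actual code in `runner.py`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Tally:
    """Counters mirroring the runner's passed/failed/manual/errored totals."""
    passed: int = 0
    failed: int = 0
    manual: int = 0
    errored: int = 0


def run_cases(
    cases: Iterable[str],
    run_case: Callable[[str], str],
    fail_fast: bool = False,
) -> Tally:
    """Run each case, report it, and stop after a FAIL/ERROR when fail_fast is set."""
    tally = Tally()
    for case in cases:
        # Hypothetical callback: returns "PASS", "FAIL", "MANUAL" or "ERROR".
        status = run_case(case)
        print(f"{status} {case}")
        if status == "PASS":
            tally.passed += 1
            continue
        if status == "MANUAL":
            tally.manual += 1
            continue
        if status == "FAIL":
            tally.failed += 1
        else:
            tally.errored += 1  # anything unexpected is counted as ERROR here
        if fail_fast:
            break  # remaining cases are never executed
    # The end-of-run summary still prints, with partial counts after an early stop.
    print(
        f"{tally.passed} passed, {tally.failed} failed, "
        f"{tally.manual} manual, {tally.errored} errored"
    )
    return tally
```

With `fail_fast=False` the loop runs exactly as before, which is what keeps the "behaviour is unchanged" criterion cheap to satisfy.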
   
   ## Where to look
   
   - `tools/skill-evals/src/skill_evals/runner.py` — the case loop in
     `main()` around lines 484–530, where `passed`, `failed`, `manual`,
     and `errored` are incremented. After each `failed` or `errored`
     increment, break out of the loop if `args.fail_fast` is set.
   - Same file, the `argparse` block around line 377 — add the flag next
     to `--verbose`. Document it as `--cli`-mode only.
   - `tools/skill-evals/tests/test_runner.py` — where a regression test
     goes.
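A sketch of the flag definition follows. It assumes `--verbose` is a plain `store_true` flag in the existing parser and omits every other option; `build_parser` is a hypothetical helper, since in `runner.py` the new `add_argument` call would simply go into the existing block.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical slice of the runner's argparse block (other options omitted)."""
    parser = argparse.ArgumentParser(prog="skill_evals.runner")
    # Assumed shape of the existing flag the new one sits next to.
    parser.add_argument("--verbose", action="store_true", help="Print more detail per case.")
    # argparse turns "--fail-fast" into the attribute args.fail_fast.
    parser.add_argument(
        "--fail-fast",
        action="store_true",
        help="--cli mode only: stop after the first case reported FAIL or ERROR.",
    )
    return parser
```

`store_true` defaults to `False`, so invocations that do not pass the flag keep today's behaviour, and the `help=` text is what surfaces in `--help`.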
   
   ## Acceptance criteria
   
   - [ ] `--fail-fast` is accepted by `argparse` and documented in `--help`.
   - [ ] In `--cli` mode with `--fail-fast`, the runner stops after the
         first case reported `FAIL` or `ERROR`; remaining cases are not
         executed.
   - [ ] Without `--fail-fast`, behaviour is unchanged (existing tests still
         pass).
   - [ ] The end-of-run summary line still prints, reflecting the partial
         counts.
   - [ ] A test in `tools/skill-evals/tests/test_runner.py` exercises the
         flag with a stub `--cli` command that fails on the first case
         and asserts later cases were not invoked.
   - [ ] `uv run --directory tools/skill-evals --group dev pytest` passes.
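One way to build the stub command the regression test needs is a Python one-liner that records each case it is called with and then exits non-zero. This sketch assumes the runner treats a non-zero exit from the `--cli` command as `FAIL` and passes the case name as the last argument; both are assumptions, as is `run_with_fail_fast`, a hypothetical stand-in for the runner's loop. In the real test, that stand-in would be replaced by a call into `skill_evals.runner` with `--cli` and `--fail-fast`.

```python
import subprocess
import sys
from pathlib import Path


def make_stub_cmd(log: Path) -> list[str]:
    """Stub command: append the case name (argv[1]) to `log`, then exit 1 (FAIL)."""
    script = (
        "import sys; "
        f"open({str(log)!r}, 'a').write(sys.argv[1] + '\\n'); "
        "sys.exit(1)"
    )
    return [sys.executable, "-c", script]


def run_with_fail_fast(cmd: list[str], cases: list[str], fail_fast: bool) -> int:
    """Hypothetical stand-in for the --cli loop: a non-zero exit counts as FAIL."""
    failed = 0
    for case in cases:
        if subprocess.run([*cmd, case]).returncode != 0:
            failed += 1
            if fail_fast:
                break
    return failed


def test_fail_fast_stops_after_first_failure(tmp_path: Path) -> None:
    log = tmp_path / "calls.log"
    failed = run_with_fail_fast(make_stub_cmd(log), ["case-1", "case-2", "case-3"], fail_fast=True)
    assert failed == 1
    # Only the first case reached the stub; later cases were never invoked.
    assert log.read_text().splitlines() == ["case-1"]
```

Recording invocations in a file (rather than in memory) matters because the stub runs in a separate process, so the test can only observe it through a side effect like this.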
   
   ## Estimated effort
   
   ~2 hours for someone new to the codebase.
   
   ## Getting started
   
   - [How to contribute](CONTRIBUTING.md)
   - [Set up a local dev environment](CONTRIBUTING.md#getting-set-up)
   - [How to open a pull request](CONTRIBUTING.md#opening-a-pull-request)
   
   ---
   
   _This issue was drafted with the help of an AI-assisted tool and reviewed by
   an apache/airflow-steward maintainer before posting. If anything here is
   unclear or looks wrong, say so on the issue: a real person is reading._


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.