justinmclean opened a new pull request, #338:
URL: https://github.com/apache/airflow-steward/pull/338

   ## Summary
   
   Add optional tag support to the skill eval runner, and mark a conservative
subset of fixtures as runnable for local smoke testing with Ollama and
`llama3.1:8b`.
   
   The intended local command is:
   
   ```bash
   uv run --project tools/skill-evals skill-eval \
     --tag llama \
     --cli "ollama run llama3.1:8b --nowordwrap --format json" \
     tools/skill-evals/evals/
   ```

   This is not intended to replace the main eval model. The tagged cases are
limited to simple smoke coverage where `llama3.1:8b` has been observed to behave
reliably enough.
   
   ## Changes
   - Add `--tag` filtering to `skill-eval`
   - Add optional per-fixture `case-meta.json` tag metadata
   - Document tagged eval usage in the skill eval README
   - Add runner tests for tag loading and filtering
   - Mark a curated `llama` / `smoke` fixture subset
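
For reviewers who want a mental model of the tag loading and filtering listed above, here is a minimal sketch. It is not the runner's actual code: the function names `load_tags` and `filter_cases` are hypothetical, and the `case-meta.json` schema (a top-level `"tags"` list) is an assumption, since the PR description does not show the file format.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_tags(case_dir: Path) -> set[str]:
    """Return the tags declared in case_dir/case-meta.json, or an empty set."""
    meta_path = case_dir / "case-meta.json"
    if not meta_path.is_file():
        # The metadata file is optional, so an untagged case is still valid.
        return set()
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    # Assumed schema (hypothetical): {"tags": ["llama", "smoke"]}
    return set(meta.get("tags", []))


def filter_cases(case_dirs: list[Path], tag: str | None) -> list[Path]:
    """Keep every case when no tag is given, else only cases carrying the tag."""
    if tag is None:
        return list(case_dirs)
    return [d for d in case_dirs if tag in load_tags(d)]


if __name__ == "__main__":
    # Build two throwaway fixture directories: one tagged, one untagged.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        tagged = root / "case-tagged"
        untagged = root / "case-untagged"
        tagged.mkdir()
        untagged.mkdir()
        (tagged / "case-meta.json").write_text(
            json.dumps({"tags": ["llama", "smoke"]}), encoding="utf-8"
        )
        selected = filter_cases([tagged, untagged], "llama")
        print([d.name for d in selected])
```

In this sketch, omitting the tag leaves the case list unchanged, so existing untagged runs would behave as before, and a case without `case-meta.json` is simply skipped whenever a tag is requested.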
   
   ## Type of change
   - [ ] Skill change (`.claude/skills/<name>/`) — eval fixtures updated below
   - [X] Tool / bridge contract (`tools/<system>/*.md`)
   - [ ] Python package (`tools/*/` with `pyproject.toml`)
   - [ ] Groovy reference impl
   - [ ] Cross-cutting (RFC, AGENTS.md, sandbox, privacy-LLM)
   - [ ] Documentation (`docs/`, `README.md`, `CONTRIBUTING.md`)
   - [ ] Project template (`projects/_template/`)
   - [X] CI / dev loop (`prek`, workflows, validators)
   - [ ] Other:
   
   ## Test plan
   
   - [X] `prek run --all-files` passes
   - [X] For Python packages touched: `uv run pytest` / `ruff check` / `mypy` 
passes
   - [ ] For Groovy bridges touched: command-line invocation tested end-to-end
   - [ ] For skill changes: eval suite passes for the affected skill
         (`PYTHONPATH=tools/skill-evals/src python3 -m skill_evals.runner tools/skill-evals/evals/<skill>/`)
   - [ ] For skill *behaviour* changes: a new or updated eval fixture is 
included in this PR
         (a regression test for the bug fixed / the behaviour added — see 
CONTRIBUTING.md)
   - [ ] Other:
   


