justinmclean commented on PR #158: URL: https://github.com/apache/airflow-steward/pull/158#issuecomment-4465343784
I updated and expanded on this for the cover two skills. runner.py now assembles the system prompt by extracting the relevant section directly from SKILL.md at run time (via a step-config.json pointer in each fixtures directory), rather than relying on a static copy. A change to the skill rules is immediately visible in the prompt — if it would cause the model to produce different output, the test fails. External tool calls (GitHub CLI, Gmail MCP, canned-response scan) are never executed during evals. Their outputs are pre-rendered as structured text in each case's report.md and injected into the user turn as mock data. This keeps inputs fully deterministic and requires no network access or API credentials to run. Removed the hardcoded SYSTEM_PROMPT fallback in runner.py. 65/65 cases pass (32 import, 33 triage). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
