Re: [PR] An idea for testing fixture-based eval harness for skill steps [DRAFT] [airflow-steward]

via GitHub Fri, 15 May 2026 07:13:39 -0700


potiuk commented on PR #158:
URL: https://github.com/apache/airflow-steward/pull/158#issuecomment-4460514393


   Really cool. I love how clean and nice it is. I would probably use YAML, 
not JSON ;) (but this is just my aesthetics, looking at all these extra {} :) ).
   
   > For (1), I think tools/skill-evals/evals/ looks like the better location. 
Putting it under the agent path does look cleaner structurally, but I worry 
there’s a risk of eval fixtures accidentally mixing into agent context at some 
point, so keeping them separated seems safer.
   
   Agree.
   
   > For (2), I think manual paste-and-compare is sufficient for now. If we 
eventually automate model execution, I’d prefer a CLI-style approach (e.g. 
claude -p) over direct API integration.
   
   I think it's good for a start. We could even add a SKILL that runs all such 
evals locally (essentially asking the agent to do the copy & paste), and then we 
would periodically re-run them with our own agents. (We could even write this 
skill so that it sets up reminders to do it from time to time.) I think that 
should do for now.
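
   For illustration only, here is a minimal Python sketch of the manual 
paste-and-compare step. It assumes a hypothetical JSON fixture with `name`, 
`prompt` and `expected` fields; the actual fixture schema in the PR may differ. 
The reviewer runs the prompt through their own agent, pastes the reply, and gets 
a pass/fail result plus a unified diff against the expected output.

```python
import difflib
import json
from dataclasses import dataclass


@dataclass
class EvalFixture:
    """Hypothetical fixture shape: one skill-step prompt and its expected output."""
    name: str
    prompt: str
    expected: str


def load_fixture(text: str) -> EvalFixture:
    """Parse a JSON fixture. The field names are assumptions, not the PR's schema."""
    data = json.loads(text)
    return EvalFixture(name=data["name"], prompt=data["prompt"], expected=data["expected"])


def compare(fixture: EvalFixture, pasted_output: str) -> tuple[bool, str]:
    """Compare pasted agent output with the expected output.

    Leading/trailing whitespace is ignored; the result is (passed, unified_diff).
    The diff is an empty string when the outputs match line by line.
    """
    expected_lines = fixture.expected.strip().splitlines()
    actual_lines = pasted_output.strip().splitlines()
    diff = "\n".join(
        difflib.unified_diff(
            expected_lines,
            actual_lines,
            fromfile=f"{fixture.name}/expected",
            tofile=f"{fixture.name}/actual",
            lineterm="",
        )
    )
    return expected_lines == actual_lines, diff
```

   An exact line-by-line match is deliberately strict. Free-form agent output 
would probably need looser checks (key phrases, structured fields), which a 
periodic-rerun SKILL could layer on top of the same fixtures.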

