This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git


The following commit(s) were added to refs/heads/main by this push:
     new 4e1e203  feat(mentoring): advance Mentoring mode to experimental and add intervention-selection eval suite (#272)
4e1e203 is described below

commit 4e1e20307ec25b6f786f8ad92d465a768302a2ce
Author: Justin Mclean <[email protected]>
AuthorDate: Wed May 27 07:13:31 2026 +1000

    feat(mentoring): advance Mentoring mode to experimental and add intervention-selection eval suite (#272)
    
    * feat(mentoring): add intervention-selection eval suite; mark mode experimental
    
    Adds the missing `intervention` eval suite (8 cases) to the
    `pr-management-mentor` eval tree, covering steps 3–5 of the runtime
    loop: out-of-scope check, maintainer-engaged check, and trigger
    matching for all four templates plus the multi-trigger and no-trigger
    paths.
    
    Updates `docs/modes.md` to reflect the prototype skill that already
    shipped: Mentoring row moves from `proposed / 0 skills` to
    `experimental / 1 skill`, and the section body is rewritten to point
    at the live skill rather than the "lands in a follow-up PR" forward
    reference.
    
    Validation:
      test -f docs/mentoring/spec.md                          ✓
      uv run --project tools/skill-validator skill-validate   ✓ (no violations)
    
    Generated-by: Claude (Opus 4.7)
    
    * fix bug
    
    * feat(eval): add intervention case for out-of-scope deprecation-removal request
    
    Signed-off-by: Justin McLean <[email protected]>
    
    ---------
    
    Signed-off-by: Justin McLean <[email protected]>
---
 .../evals/pr-management-mentor/README.md           |  4 ++--
 .../case-9-deprecation-decision/expected.json      |  5 +++++
 .../fixtures/case-9-deprecation-decision/report.md | 26 ++++++++++++++++++++++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/skill-evals/evals/pr-management-mentor/README.md b/tools/skill-evals/evals/pr-management-mentor/README.md
index f63571c..d2ae9b3 100644
--- a/tools/skill-evals/evals/pr-management-mentor/README.md
+++ b/tools/skill-evals/evals/pr-management-mentor/README.md
@@ -2,11 +2,11 @@
 
 Behavioral evals for the `pr-management-mentor` skill.
 
-## Suites (28 cases total)
+## Suites (29 cases total)
 
 | Suite | Step | Cases | What it covers |
 |---|---|---|---|
-| intervention | Intervention selection (steps 3–5 of the runtime loop) | 8 | Template 1 (missing repro); template 2 (missing version); template 3 (convention gap); template 4 (why-pushback → hand-off); multiple triggers simultaneously (ask); maintainer already engaged (silent); no trigger fires (silent); out-of-scope topic (hand-off) |
+| intervention | Intervention selection (steps 3–5 of the runtime loop) | 9 | Template 1 (missing repro); template 2 (missing version); template 3 (convention gap); template 4 (why-pushback → hand-off); multiple triggers simultaneously (ask); maintainer already engaged (silent); no trigger fires (silent); out-of-scope topic (hand-off); out-of-scope deprecation/removal decision carrying draftable bug signals (hand-off still wins) |
 | tone-checks | Pre-post checklist | 15 | Clean pass; hard-fail rules 1 (praise), 2 (restating), 3 (AI self-ref), 4 (speaking for maintainer), 5 (hedging), 6 (multiple asks), 7 (missing footer), 8 (author not tagged), 9 (quoted doc), 10 (review prediction); soft-fail rules 11 (meta first line), 12 (too long), 13 (jargon without link), 14 (exclamation in body) |
 | hand-off | Hand-off triggers | 5 | No trigger; trigger 1 (max turns reached); trigger 2 (contributor pushback on why-answer); trigger 3 (out-of-scope topic); trigger 4 (contributor asks for human — highest priority) |
 
diff --git a/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/expected.json b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/expected.json
new file mode 100644
index 0000000..3af93d1
--- /dev/null
+++ b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/expected.json
@@ -0,0 +1,5 @@
+{
+  "action": "handoff",
+  "template": null,
+  "reason": "The contributor's latest message asks the maintainers to decide whether to remove the deprecated schedule_interval parameter in a future release — an out-of-scope deprecation/removal decision (hand-off trigger 3) — so the mentor hands off rather than drafting about the log-spam symptom."
+}
diff --git a/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/report.md b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/report.md
new file mode 100644
index 0000000..c4384ec
--- /dev/null
+++ b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/report.md
@@ -0,0 +1,26 @@
+Thread: Issue #41237 — "DeprecationWarning spam for schedule_interval after 2.9→2.10 upgrade — can we just drop it?"
+MaxAgentTurns: 2
+AgentCommentCount: 0
+OutOfScopeTopics: [security, CVE, deprecation, licensing, architecture]
+
+Messages (chronological):
+  1. contributor (role: contributor, login: dana-r): "Just bumped our cluster
+     from 2.9.3 to 2.10.2 and now every scheduler loop dumps a wall of
+     `RemovedInAirflow3Warning: Param 'schedule_interval' is deprecated, use
+     'schedule' instead`. We have ~400 DAGs so it's thousands of lines a
+     minute. A typical DAG looks like:
+
+     ```python
+     with DAG('etl_daily', schedule_interval='@daily') as dag:
+         ...
+     ```
+
+     It's not breaking anything, just drowning the logs. Honestly, since it's
+     already deprecated — can we just remove schedule_interval outright in the
+     next minor release instead of warning forever? It causes more confusion
+     than it's worth."
+  2. contributor (role: contributor, login: dana-r): "Happy to put up a PR to
+     rip it out across the providers if the maintainers are on board."
+
+MaintainerLogins: [committer-a, committer-b]
+RecentMaintainerCommentCount: 0
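
The precedence that case 9 exercises (out-of-scope check first, then the maintainer-engaged check, then trigger matching) can be sketched roughly as below. This is an illustrative sketch only, not the skill's implementation: the `Thread` class, its field names, and the `select_intervention` function are hypothetical, topic detection is assumed to have happened already, and the `"silent"`, `"ask"`, and `"draft"` action strings are guesses modelled on the README table. Only `"handoff"` and the `template: null` shape come from `expected.json`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thread:
    """Hypothetical, pre-parsed view of a report.md fixture."""
    latest_topics: set[str]           # topics detected in the latest message (assumed input)
    out_of_scope_topics: set[str]     # the fixture's OutOfScopeTopics list
    recent_maintainer_comments: int   # the fixture's RecentMaintainerCommentCount
    fired_templates: list[int] = field(default_factory=list)  # templates whose triggers matched


def select_intervention(thread: Thread) -> dict:
    """Apply the three checks in order; the first one that matches wins."""
    # Step 3 (assumed): any out-of-scope topic hands off, even if triggers fired.
    if thread.latest_topics & thread.out_of_scope_topics:
        return {"action": "handoff", "template": None}
    # Step 4 (assumed): a maintainer is already engaged, so stay silent.
    if thread.recent_maintainer_comments > 0:
        return {"action": "silent", "template": None}
    # Step 5 (assumed): match triggers against the four templates.
    if not thread.fired_templates:
        return {"action": "silent", "template": None}
    if len(thread.fired_templates) > 1:
        return {"action": "ask", "template": None}
    template = thread.fired_templates[0]
    if template == 4:  # why-pushback goes to a human per the README table
        return {"action": "handoff", "template": None}
    return {"action": "draft", "template": template}


# Case 9: a deprecation/removal request that also carries draftable signals.
case_9 = Thread(
    latest_topics={"deprecation"},
    out_of_scope_topics={"security", "CVE", "deprecation", "licensing", "architecture"},
    recent_maintainer_comments=0,
    fired_templates=[1, 2],  # hypothetical draftable triggers
)
result = select_intervention(case_9)
```

Under these assumptions `result` carries the same `action` and `template` values as the fixture's `expected.json`, because the out-of-scope check runs before any trigger is considered.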
