(airflow-steward) branch main updated: fix(pr-management-triage): four classifier heuristic fixes from a live session (#344)

potiuk Wed, 27 May 2026 13:54:43 -0700

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git



The following commit(s) were added to refs/heads/main by this push:
     new 52f65e1  fix(pr-management-triage): four classifier heuristic fixes 
from a live session (#344)
52f65e1 is described below

commit 52f65e10420c5ed40c0ba34a235554016f86ea40
Author: Jarek Potiuk <[email protected]>
AuthorDate: Wed May 27 22:54:28 2026 +0200

    fix(pr-management-triage): four classifier heuristic fixes from a live 
session (#344)
    
    * fix(pr-management-triage): match Copilot bots without requiring `[bot]` 
suffix
    
    The `copilot_review_stale` precondition (row 2 of the decision table)
    listed `copilot-pull-request-reviewer[bot]`, `copilot[bot]`, etc. as
    the matchable logins. GitHub's GraphQL `Actor.login` field returns
    some of these without the `[bot]` suffix — most visibly for
    `copilot-pull-request-reviewer` itself, which appears as
    `copilot-pull-request-reviewer` (no suffix) on the PR's `author.login`
    even though the underlying account is the same bot.
    
    Strict `[bot]`-suffix matching excludes those threads from row 2.
    Real-world impact seen during triage on `apache/airflow`: a PR with
    two stale (55-day-old) Copilot review threads classified as
    `deterministic_flag → comment` (only static-check failure surfaced)
    instead of `stale_copilot_review → draft`. The maintainer had to
    manually re-route the action.
    
    Relax the matcher to "case-insensitive substring `copilot` in login;
    `[bot]` suffix optional". The 7-day age threshold and the
    no-author-reply qualifier are unchanged — only the login-string
    match is widened to align with what GraphQL actually returns.
    
    * fix(pr-management-triage): route mixed static-check failures to 
`comment`, not `rerun`
    
    Insert decision row 12b between rows 12 and 13. The original ordering
    left a hole: row 12 fires only when *every* failed check is a static
    check; if even one non-static failure was mixed in, row 12 missed and
    row 13 (`failed_count <= 2 -> rerun`) caught the case and proposed a
    rerun. Rerunning is a waste when at least one of the failures is a
    static check that needs a code fix to pass.
    
    Real-world impact seen during triage on `apache/airflow`: a PR with
    `CI image checks / Static checks: FAILURE` plus
    `provider distributions tests / Compat 3.0.6: FAILURE` (one static
    + one possibly-flaky compat test, total = 2 failures) classified as
    `deterministic_flag -> rerun`. The maintainer overrode to `comment`
    because the static-check failure will deterministically re-fail
    without a code change.
    
    Row 12b is intentionally placed *after* row 12 so the all-static
    case still takes the more specific row (same action, different
    reason template clarifies which case the maintainer is seeing).
    
    * fix(pr-management-triage): widen F5b to scan the last 5 collaborator 
comments
    
    F5b previously inspected only the *most recent* collaborator comment
    for an unanswered `@`-mention to another maintainer. That misses the
    common case where:
    
      1. Maintainer A pings maintainer B for an opinion ("@B what do
         you think of this approach?") in comment #N
      2. Maintainer C (often the viewer) posts a quality-criteria
         comment in comment #N+1 because of an orthogonal issue (CI
         red, merge conflicts, etc.)
    
    The A->B ping is still outstanding — B has not replied — but F5b
    only looks at comment #N+1, doesn't see the older mention, and the
    PR slides into the decision table. The maintainer ends up auto-
    proposing actions on a PR where two maintainers are mid-conversation.
    
    Real-world impact seen during triage on `apache/airflow`: a PR
    where one maintainer asked another for an opinion 2 days ago, then
    the viewer drafted the PR yesterday for quality issues. The
    classifier proposed `request-author-confirmation` today — talking
    over the older unanswered ping.
    
    Widen F5b to walk the last 5 collaborator comments, newest-first,
    and fire on the first one matching the unanswered-`@`-mention
    condition. The 5-comment window matches the bounded depth the
    batch query already fetches; no additional GraphQL cost.
    
    * feat(pr-management-triage): add row 0 to skip stale-abandoned first-timer 
PRs
    
    Decision row 1 fires `approve-workflow` for any first-time-contributor
    PR with no real CI run, regardless of how long ago the maintainer
    already gave the contributor feedback or how long the contributor has
    been silent. The result: a PR that was triaged 8+ weeks ago, with no
    new commits since, keeps surfacing every triage sweep and the
    maintainer either has to re-approve CI on stalled code (which then
    re-fails on the same unaddressed quality issues) or override to `skip`
    manually.
    
    Add row 0 + the `first_time_stale_abandoned` precondition. The row
    fires *before* row 1 and routes to `skip` when all of these hold:
    
      - author is FIRST_TIME_CONTRIBUTOR / FIRST_TIMER,
      - a prior viewer triage marker exists,
      - no commits have been pushed since the marker, and
      - the PR's head commit is >= 30 days old.
    
    The PR is then left to the stale-sweep flow (sweep 1b — "untriaged
    draft >= 14 days" — or its successor), which is the appropriate
    closure pathway.
    
    The 30-day threshold is deliberately longer than the grace windows
    (24h / 96h) and the F5a cooldown (72h). It captures *abandonment*,
    not slow response — a contributor who replies within a week and
    then stalls for another week is not abandoned, just busy.
    
    Real-world impact seen during triage on `apache/airflow`: a
    first-time-contributor PR triaged on 2026-04-02 with no push since
    classified as `approve-workflow` 8 weeks later. The maintainer
    skipped it manually; row 0 retires it automatically.
---
 .../pr-management-triage/classify-and-act.md       | 60 +++++++++++++++++++---
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/.claude/skills/pr-management-triage/classify-and-act.md 
b/.claude/skills/pr-management-triage/classify-and-act.md
index 7d5df4c..2c76e63 100644
--- a/.claude/skills/pr-management-triage/classify-and-act.md
+++ b/.claude/skills/pr-management-triage/classify-and-act.md
@@ -46,7 +46,7 @@ filter is skipped silently from the main triage flow.
 | F3 | Draft and not stale | `isDraft == true` and any activity within the 
last 14 days. Stale-sweep classifications in 
[`stale-sweeps.md`](stale-sweeps.md) may still pull the PR back in. |
 | F4 | Already marked ready, no regression | `labels` contains `ready for 
maintainer review` AND CI green AND `mergeable != CONFLICTING` AND no 
unresolved **collaborator** threads (same collaborator-author qualifier as rows 
19/20 / [`unresolved_threads_only`](#unresolved_threads_only) — 
contributor-author threads alone don't count as a regression for an 
already-ready PR). **Regression bypasses this filter** — any of: CI red, new 
conflict, or a new unresolved collaborator thread whose tri [...]
 | F5a | Recent collaborator comment (author cooldown) | Most recent comment 
from the **union of** general-issue comments (`comments(last:10)`) and 
**review-thread comments** (`reviewThreads.nodes.comments`) is by a 
`COLLABORATOR`/`MEMBER`/`OWNER`, `createdAt < 72h` ago, AND posted after 
`commits(last:1).committedDate`. The review-thread leg is essential — a 
maintainer asking a clarifying question in-thread is just as much an active 
conversation as a top-level comment, and treating only t [...]
-| F5b | Maintainer-to-maintainer ping unanswered | Most recent collaborator 
comment from the **union** described in F5a above `@`-mentions one or more 
logins other than the PR author AND none of those mentioned logins have posted 
on the PR (general comments **or** review threads) or in `latestReviews` after 
that comment. Team mentions (e.g. `@<upstream>-committers`) are conservatively 
treated as F5b matches. |
+| F5b | Maintainer-to-maintainer ping unanswered | **Walk the most recent 5 
collaborator comments** from the **union** described in F5a above 
(newest-first). For each, if it `@`-mentions one or more logins other than the 
PR author AND none of those mentioned logins have posted on the PR (general 
comments **or** review threads) or in `latestReviews` after that comment, F5b 
fires. The 5-comment window catches the case where maintainer A pings 
maintainer B, then a different maintainer C (of [...]
 | F6 | Maintainer co-drafted | `isDraft == true` AND any of: (a) 
`latestReviews` has a node with `authorAssociation ∈ {OWNER, MEMBER, 
COLLABORATOR}` AND `author.login ≠ <viewer>` AND `state ∈ {COMMENTED, 
CHANGES_REQUESTED, APPROVED}` AND `submittedAt > commits(last:1).committedDate` 
AND review body is non-empty (avoids the "review with only inline thread 
comments and an empty top-level body" false positive — those are already 
counted by row 14/15 unresolved-thread logic); (b) `comments(l [...]
 
 F5a, F5b, and F6 override every signal in the decision table —
@@ -72,6 +72,7 @@ Action verbs are defined in [`actions.md`](actions.md).
 
 | #  | Precondition (all must hold)                                            
                      | Classification             | Action                  | 
Reason template |
 
|----|-----------------------------------------------------------------------------------------------|---------------------------|------------------------|-----------------|
+| 0  | [`first_time_stale_abandoned`](#first_time_stale_abandoned) — 
first-time contributor PR with a prior viewer triage marker and no commits 
since the marker, ≥ 30 days old | `first_time_stale_abandoned` | `skip` | 
First-time contributor's PR was triaged ≥ 30d ago, no push since — let the 
stale-sweep retire it rather than re-approving CI |
 | 1  | `head_sha` appears in the per-page `action_required` REST index, OR 
([`first_time_no_real_ci`](#first_time_no_real_ci))     | 
`pending_workflow_approval` | `approve-workflow`     | First-time contributor — 
review the diff and approve CI, or flag suspicious |
 | 2  | [`copilot_review_stale`](#copilot_review_stale)                         
                      | `stale_copilot_review`     | `draft` (include the 
specific Copilot thread URL in the violation body) | Unaddressed Copilot review 
≥ 7 days old — convert to draft |
 | 3  | Viewer comment containing the triage marker exists, posted after last 
commit, age < 7 days, sub-state `waiting` | `already_triaged`         | `skip`  
               | Already triaged M days ago — still waiting on author |
@@ -85,6 +86,7 @@ Action verbs are defined in [`actions.md`](actions.md).
 | 10 | [`ci_failures_only`](#ci_failures_only) AND every failure ∈ 
`recent_main_failures`             | `deterministic_flag`       | `rerun`       
         | All N CI failures also appear in recent main-branch PRs — likely 
systemic, suggest rerun |
 | 11 | [`ci_failures_only`](#ci_failures_only) AND any failure ∈ 
`recent_main_failures`               | `deterministic_flag`       | `rerun`     
           | K/N CI failures match recent main-branch PRs — likely systemic |
 | 12 | [`ci_failures_only`](#ci_failures_only) AND every failed check is a 
[static check](#static_check) | `deterministic_flag`     | `comment`            
  | Only static-check failures — needs a code fix, not a rerun |
+| 12b | [`ci_failures_only`](#ci_failures_only) AND **any** failed check is a 
[static check](#static_check) (and not all — row 12 captured that case) | 
`deterministic_flag`     | `comment`              | Mixed failures including a 
static-check failure — code fix needed; a rerun would re-fail on the static 
check |
 | 13 | [`ci_failures_only`](#ci_failures_only) AND `failed_count <= 2` AND 
`commits_behind <= 50`     | `deterministic_flag`       | `rerun`               
 | N CI failure(s) on otherwise clean PR — likely flaky, suggest rerun |
 | 14a | [`author_confirmation_received`](#author_confirmation_received)        
                        | `author_confirmed_ready`   | `mark-ready`           | 
Author confirmed PR is ready for maintainer review — apply label |
 | 14b | [`pending_author_confirmation`](#pending_author_confirmation)          
                        | `awaiting_author_confirmation` | `skip`             | 
Awaiting author confirmation requested M days ago |
@@ -379,12 +381,22 @@ with the reply visible.
 ### `copilot_review_stale`
 
 The PR has at least one unresolved review thread whose first
-comment author matches a Copilot bot login (case-insensitive:
-`copilot-pull-request-reviewer[bot]`, `copilot[bot]`,
-`github-copilot[bot]`, or any login matching `copilot*[bot]` /
-`github-copilot*[bot]`) AND that comment's `createdAt` is ≥ 7
-days ago AND no author reply in the same thread or on the PR
-after that timestamp.
+comment author matches a Copilot bot login (case-insensitive
+substring match on the login). The skill matches any of:
+`copilot-pull-request-reviewer`, `copilot`, `github-copilot`,
+or any login containing the substring `copilot` followed
+optionally by `[bot]`. The `[bot]` suffix is matched when
+present but NOT required — GitHub's GraphQL `Actor.login`
+returns the bot username without the `[bot]` suffix for some
+Copilot integrations (e.g. `copilot-pull-request-reviewer`),
+even though the same bot appears as `copilot-pull-request-reviewer[bot]`
+on the REST API. Requiring `[bot]` excludes real Copilot
+threads from this rule and lets them age past the 7-day
+threshold without triggering. The threshold itself is unchanged.
+
+The thread must also satisfy: that comment's `createdAt` is
+≥ 7 days ago AND no author reply in the same thread or on the
+PR after that timestamp.
 
 ### `static_check`
 
@@ -419,6 +431,40 @@ Count of PRs by the same author seen on the **current 
page** that
 already matched a `deterministic_flag` row. Per-page only — does
 not persist across sessions.
 
+### `first_time_stale_abandoned`
+
+All of:
+
+- `authorAssociation == FIRST_TIME_CONTRIBUTOR` or
+  `FIRST_TIMER`.
+- A viewer triage marker exists in `comments(last:10)` (i.e.
+  a comment by the viewer containing the literal string
+  `Pull Request quality criteria`).
+- The PR's `commits(last:1).committedDate` is **at or before**
+  the triage marker's `createdAt` — author has not pushed
+  since the maintainer's feedback.
+- `<now> - committedDate >= 30 days`.
+
+Order matters: this precondition is evaluated by **row 0**,
+which fires *before* row 1 (`pending_workflow_approval`).
+Without it, an abandoned first-time PR that has sat for months
+since being triaged keeps surfacing in the
+`approve-workflow` group every sweep, where the maintainer
+either re-approves CI on dead code or has to skip it manually.
+
+A row-0 hit routes to `skip` — the [stale-sweep
+flow](stale-sweeps.md) is the right place to retire the PR
+(via sweep 1b's "untriaged draft >= 14 days" trigger, since
+the row-0 marker test does *not* set the draft state). The
+classifier's job here is just to keep the PR out of the
+workflow-approval group.
+
+The 30-day threshold is intentionally longer than the
+[grace periods](#grace-periods) (24h / 96h) and the F5a
+cooldown (72h). It captures *abandonment*, not slow response —
+a contributor who replies within a week and then stalls for
+another week is not abandoned, just busy.
+
 ### `first_time_no_real_ci`
 
 Author is `FIRST_TIME_CONTRIBUTOR` or `FIRST_TIMER` AND one of:

(airflow-steward) branch main updated: fix(pr-management-triage): four classifier heuristic fixes from a live session (#344)

Reply via email to