This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 52f65e1 fix(pr-management-triage): four classifier heuristic fixes
from a live session (#344)
52f65e1 is described below
commit 52f65e10420c5ed40c0ba34a235554016f86ea40
Author: Jarek Potiuk <[email protected]>
AuthorDate: Wed May 27 22:54:28 2026 +0200
fix(pr-management-triage): four classifier heuristic fixes from a live
session (#344)
* fix(pr-management-triage): match Copilot bots without requiring `[bot]`
suffix
The `copilot_review_stale` precondition (row 2 of the decision table)
listed `copilot-pull-request-reviewer[bot]`, `copilot[bot]`, etc. as
the matchable logins. GitHub's GraphQL `Actor.login` field returns
some of these without the `[bot]` suffix — most visibly for
`copilot-pull-request-reviewer` itself, which appears as
`copilot-pull-request-reviewer` (no suffix) on the PR's `author.login`
even though the underlying account is the same bot.
Strict `[bot]`-suffix matching excludes those threads from row 2.
Real-world impact seen during triage on `apache/airflow`: a PR with
two stale (55-day-old) Copilot review threads classified as
`deterministic_flag → comment` (only static-check failure surfaced)
instead of `stale_copilot_review → draft`. The maintainer had to
manually re-route the action.
Relax the matcher to "case-insensitive substring `copilot` in login;
`[bot]` suffix optional". The 7-day age threshold and the
no-author-reply qualifier are unchanged — only the login-string
match is widened to align with what GraphQL actually returns.
* fix(pr-management-triage): route mixed static-check failures to
`comment`, not `rerun`
Insert decision row 12b between rows 12 and 13. The original ordering
left a hole: row 12 fires only when *every* failed check is a static
check; if even one non-static failure was mixed in, row 12 missed and
row 13 (`failed_count <= 2 -> rerun`) caught the case and proposed a
rerun. Rerunning is a waste when at least one of the failures is a
static check that needs a code fix to pass.
Real-world impact seen during triage on `apache/airflow`: a PR with
`CI image checks / Static checks: FAILURE` plus
`provider distributions tests / Compat 3.0.6: FAILURE` (one static
+ one possibly-flaky compat test, total = 2 failures) classified as
`deterministic_flag -> rerun`. The maintainer overrode to `comment`
because the static-check failure will deterministically re-fail
without a code change.
Row 12b is intentionally placed *after* row 12 so the all-static
case still takes the more specific row (same action, different
reason template clarifies which case the maintainer is seeing).
* fix(pr-management-triage): widen F5b to scan the last 5 collaborator
comments
F5b previously inspected only the *most recent* collaborator comment
for an unanswered `@`-mention to another maintainer. That misses the
common case where:
1. Maintainer A pings maintainer B for an opinion ("@B what do
you think of this approach?") in comment #N
2. Maintainer C (often the viewer) posts a quality-criteria
comment in comment #N+1 because of an orthogonal issue (CI
red, merge conflicts, etc.)
The A->B ping is still outstanding — B has not replied — but F5b
only looks at comment #N+1, doesn't see the older mention, and the
PR slides into the decision table. The maintainer ends up auto-
proposing actions on a PR where two maintainers are mid-conversation.
Real-world impact seen during triage on `apache/airflow`: a PR
where one maintainer asked another for an opinion 2 days ago, then
the viewer drafted the PR yesterday for quality issues. The
classifier proposed `request-author-confirmation` today — talking
over the older unanswered ping.
Widen F5b to walk the last 5 collaborator comments, newest-first,
and fire on the first one matching the unanswered-`@`-mention
condition. The 5-comment window matches the bounded depth the
batch query already fetches; no additional GraphQL cost.
* feat(pr-management-triage): add row 0 to skip stale-abandoned first-timer
PRs
Decision row 1 fires `approve-workflow` for any first-time-contributor
PR with no real CI run, regardless of how long ago the maintainer
already gave the contributor feedback or how long the contributor has
been silent. The result: a PR that was triaged 8+ weeks ago, with no
new commits since, keeps surfacing every triage sweep and the
maintainer either has to re-approve CI on stalled code (which then
re-fails on the same unaddressed quality issues) or override to `skip`
manually.
Add row 0 + the `first_time_stale_abandoned` precondition. The row
fires *before* row 1 and routes to `skip` when all of these hold:
- author is FIRST_TIME_CONTRIBUTOR / FIRST_TIMER,
- a prior viewer triage marker exists,
- no commits have been pushed since the marker, and
- the PR's head commit is >= 30 days old.
The PR is then left to the stale-sweep flow (sweep 1b — "untriaged
draft >= 14 days" — or its successor), which is the appropriate
closure pathway.
The 30-day threshold is deliberately longer than the grace windows
(24h / 96h) and the F5a cooldown (72h). It captures *abandonment*,
not slow response — a contributor who replies within a week and
then stalls for another week is not abandoned, just busy.
Real-world impact seen during triage on `apache/airflow`: a
first-time-contributor PR triaged on 2026-04-02 with no push since
classified as `approve-workflow` 8 weeks later. The maintainer
skipped it manually; row 0 retires it automatically.
---
.../pr-management-triage/classify-and-act.md | 60 +++++++++++++++++++---
1 file changed, 53 insertions(+), 7 deletions(-)
diff --git a/.claude/skills/pr-management-triage/classify-and-act.md
b/.claude/skills/pr-management-triage/classify-and-act.md
index 7d5df4c..2c76e63 100644
--- a/.claude/skills/pr-management-triage/classify-and-act.md
+++ b/.claude/skills/pr-management-triage/classify-and-act.md
@@ -46,7 +46,7 @@ filter is skipped silently from the main triage flow.
| F3 | Draft and not stale | `isDraft == true` and any activity within the
last 14 days. Stale-sweep classifications in
[`stale-sweeps.md`](stale-sweeps.md) may still pull the PR back in. |
| F4 | Already marked ready, no regression | `labels` contains `ready for
maintainer review` AND CI green AND `mergeable != CONFLICTING` AND no
unresolved **collaborator** threads (same collaborator-author qualifier as rows
19/20 / [`unresolved_threads_only`](#unresolved_threads_only) —
contributor-author threads alone don't count as a regression for an
already-ready PR). **Regression bypasses this filter** — any of: CI red, new
conflict, or a new unresolved collaborator thread whose tri [...]
| F5a | Recent collaborator comment (author cooldown) | Most recent comment
from the **union of** general-issue comments (`comments(last:10)`) and
**review-thread comments** (`reviewThreads.nodes.comments`) is by a
`COLLABORATOR`/`MEMBER`/`OWNER`, `createdAt < 72h` ago, AND posted after
`commits(last:1).committedDate`. The review-thread leg is essential — a
maintainer asking a clarifying question in-thread is just as much an active
conversation as a top-level comment, and treating only t [...]
-| F5b | Maintainer-to-maintainer ping unanswered | Most recent collaborator
comment from the **union** described in F5a above `@`-mentions one or more
logins other than the PR author AND none of those mentioned logins have posted
on the PR (general comments **or** review threads) or in `latestReviews` after
that comment. Team mentions (e.g. `@<upstream>-committers`) are conservatively
treated as F5b matches. |
+| F5b | Maintainer-to-maintainer ping unanswered | **Walk the most recent 5
collaborator comments** from the **union** described in F5a above
(newest-first). For each, if it `@`-mentions one or more logins other than the
PR author AND none of those mentioned logins have posted on the PR (general
comments **or** review threads) or in `latestReviews` after that comment, F5b
fires. The 5-comment window catches the case where maintainer A pings
maintainer B, then a different maintainer C (of [...]
| F6 | Maintainer co-drafted | `isDraft == true` AND any of: (a)
`latestReviews` has a node with `authorAssociation ∈ {OWNER, MEMBER,
COLLABORATOR}` AND `author.login ≠ <viewer>` AND `state ∈ {COMMENTED,
CHANGES_REQUESTED, APPROVED}` AND `submittedAt > commits(last:1).committedDate`
AND review body is non-empty (avoids the "review with only inline thread
comments and an empty top-level body" false positive — those are already
counted by row 14/15 unresolved-thread logic); (b) `comments(l [...]
F5a, F5b, and F6 override every signal in the decision table —
@@ -72,6 +72,7 @@ Action verbs are defined in [`actions.md`](actions.md).
| # | Precondition (all must hold)
| Classification | Action |
Reason template |
|----|-----------------------------------------------------------------------------------------------|---------------------------|------------------------|-----------------|
+| 0 | [`first_time_stale_abandoned`](#first_time_stale_abandoned) —
first-time contributor PR with a prior viewer triage marker and no commits
since the marker, ≥ 30 days old | `first_time_stale_abandoned` | `skip` |
First-time contributor's PR was triaged ≥ 30d ago, no push since — let the
stale-sweep retire it rather than re-approving CI |
| 1 | `head_sha` appears in the per-page `action_required` REST index, OR
([`first_time_no_real_ci`](#first_time_no_real_ci)) |
`pending_workflow_approval` | `approve-workflow` | First-time contributor —
review the diff and approve CI, or flag suspicious |
| 2 | [`copilot_review_stale`](#copilot_review_stale)
| `stale_copilot_review` | `draft` (include the
specific Copilot thread URL in the violation body) | Unaddressed Copilot review
≥ 7 days old — convert to draft |
| 3 | Viewer comment containing the triage marker exists, posted after last
commit, age < 7 days, sub-state `waiting` | `already_triaged` | `skip`
| Already triaged M days ago — still waiting on author |
@@ -85,6 +86,7 @@ Action verbs are defined in [`actions.md`](actions.md).
| 10 | [`ci_failures_only`](#ci_failures_only) AND every failure ∈
`recent_main_failures` | `deterministic_flag` | `rerun`
| All N CI failures also appear in recent main-branch PRs — likely
systemic, suggest rerun |
| 11 | [`ci_failures_only`](#ci_failures_only) AND any failure ∈
`recent_main_failures` | `deterministic_flag` | `rerun`
| K/N CI failures match recent main-branch PRs — likely systemic |
| 12 | [`ci_failures_only`](#ci_failures_only) AND every failed check is a
[static check](#static_check) | `deterministic_flag` | `comment`
| Only static-check failures — needs a code fix, not a rerun |
+| 12b | [`ci_failures_only`](#ci_failures_only) AND **any** failed check is a
[static check](#static_check) (and not all — row 12 captured that case) |
`deterministic_flag` | `comment` | Mixed failures including a
static-check failure — code fix needed; a rerun would re-fail on the static
check |
| 13 | [`ci_failures_only`](#ci_failures_only) AND `failed_count <= 2` AND
`commits_behind <= 50` | `deterministic_flag` | `rerun`
| N CI failure(s) on otherwise clean PR — likely flaky, suggest rerun |
| 14a | [`author_confirmation_received`](#author_confirmation_received)
| `author_confirmed_ready` | `mark-ready` |
Author confirmed PR is ready for maintainer review — apply label |
| 14b | [`pending_author_confirmation`](#pending_author_confirmation)
| `awaiting_author_confirmation` | `skip` |
Awaiting author confirmation requested M days ago |
@@ -379,12 +381,22 @@ with the reply visible.
### `copilot_review_stale`
The PR has at least one unresolved review thread whose first
-comment author matches a Copilot bot login (case-insensitive:
-`copilot-pull-request-reviewer[bot]`, `copilot[bot]`,
-`github-copilot[bot]`, or any login matching `copilot*[bot]` /
-`github-copilot*[bot]`) AND that comment's `createdAt` is ≥ 7
-days ago AND no author reply in the same thread or on the PR
-after that timestamp.
+comment author matches a Copilot bot login (case-insensitive
+substring match on the login). The skill matches any of:
+`copilot-pull-request-reviewer`, `copilot`, `github-copilot`,
+or any login containing the substring `copilot` followed
+optionally by `[bot]`. The `[bot]` suffix is matched when
+present but NOT required — GitHub's GraphQL `Actor.login`
+returns the bot username without the `[bot]` suffix for some
+Copilot integrations (e.g. `copilot-pull-request-reviewer`),
+even though the same bot appears as `copilot-pull-request-reviewer[bot]`
+on the REST API. Requiring `[bot]` excludes real Copilot
+threads from this rule and lets them age past the 7-day
+threshold without triggering. The threshold itself is unchanged.
+
+The thread must also satisfy: that comment's `createdAt` is
+≥ 7 days ago AND no author reply in the same thread or on the
+PR after that timestamp.
### `static_check`
@@ -419,6 +431,40 @@ Count of PRs by the same author seen on the **current
page** that
already matched a `deterministic_flag` row. Per-page only — does
not persist across sessions.
+### `first_time_stale_abandoned`
+
+All of:
+
+- `authorAssociation == FIRST_TIME_CONTRIBUTOR` or
+ `FIRST_TIMER`.
+- A viewer triage marker exists in `comments(last:10)` (i.e.
+ a comment by the viewer containing the literal string
+ `Pull Request quality criteria`).
+- The PR's `commits(last:1).committedDate` is **at or before**
+ the triage marker's `createdAt` — author has not pushed
+ since the maintainer's feedback.
+- `<now> - committedDate >= 30 days`.
+
+Order matters: this precondition is evaluated by **row 0**,
+which fires *before* row 1 (`pending_workflow_approval`).
+Without it, an abandoned first-time PR that has sat for months
+since being triaged keeps surfacing in the
+`approve-workflow` group every sweep, where the maintainer
+either re-approves CI on dead code or has to skip it manually.
+
+A row-0 hit routes to `skip` — the [stale-sweep
+flow](stale-sweeps.md) is the right place to retire the PR
+(via sweep 1b's "untriaged draft >= 14 days" trigger, since
+the row-0 marker test does *not* set the draft state). The
+classifier's job here is just to keep the PR out of the
+workflow-approval group.
+
+The 30-day threshold is intentionally longer than the
+[grace periods](#grace-periods) (24h / 96h) and the F5a
+cooldown (72h). It captures *abandonment*, not slow response —
+a contributor who replies within a week and then stalls for
+another week is not abandoned, just busy.
+
### `first_time_no_real_ci`
Author is `FIRST_TIME_CONTRIBUTOR` or `FIRST_TIMER` AND one of: