andreahlert commented on PR #454:
URL: https://github.com/apache/airflow-steward/pull/454#issuecomment-4634171153
Really like where this is going, the conservative thresholds and the
propose-then-confirm framing feel right, and the latest commit (`f707e36`)
already addresses most of the earlier round. Thanks for the quick turnaround on
that.
A few things I noticed while reading through, mostly small. The first three
are spots where the doc seems to disagree with itself, which I think would trip
up whoever implements the signals later. The rest are more "worth a thought"
than blockers.
### Small doc inconsistencies
**1. The H5 example and its threshold don't quite line up.** H5 is "5 or
more distinct top-level directories", but the example report fires H5 with four
(`go-sdk/`, `airflow-core/ui/`, `docs/adr/`, `team_project/`). Might be worth
bumping the example to five, or adjusting the threshold if four was the real
intent.
**2. H1 and H5 read as mutually exclusive, but the example fires both.** H1
says the diff is one "where every path shares a single first-level directory
prefix", which I read as "everything lives under one directory". H5 needs
changes across five-plus directories, and the example fires both together. I
suspect the intent for H1 is "the new-file subset forms a standalone project
dir" rather than literally every path, so scoping H1 to that subset would clear
it up.
**3. The chip-vs-line wording is half updated.** `review-flow.md` Step 2.5
now says "do not attempt to modify the already-displayed Step 1 headline" and
emits a separate `⚠ [suspicious] — ...` line, which I think is the right call.
`slop-detection.md` still has the older "surfaces in the headline" (Threshold)
and "chip in the headline is the only signal" (False-positive) phrasing, so the
prose lags the table a bit. Probably just needs the two sentences updated to
match.
### Things worth a thought
**4. The "no token budget" framing might be a touch optimistic.** The scan
is described as purely structural, but H1 ("README suggest it is an independent
project unrelated to the upstream codebase") and H5 ("no discernible semantic
relationship") do lean on the model making a judgment call. That lines up with
the "good candidates for deterministic checks" comment above, they're just not
deterministic yet. Maybe soften the claim slightly, or note which signals are
genuinely structural vs. which need a read.
**5. H2 could catch legitimate cross-repo links.** It's titled "private-fork
issue URL", but the check is really "repo-name differs from upstream". A body
referencing `https://github.com/apache/airflow/issues/123` from here would
match, even though that's a normal sibling-repo link. Comparing the URL's owner
against the PR author, or scoping to same-named forks, would tighten it.
**6. S3 ("no real CI") may fire on the people we don't want to
over-correct.** GitHub holds workflows pending approval for first-time
contributors, so a genuine newcomer PR often shows zero CI runs simply because
nobody approved the run yet. Might be worth distinguishing "CI pending
approval" from "only bots ran" so newcomers aren't penalized for the gate.
**7. The `[C]` template footer is a slightly different string from the
canonical one.** Golden rule 5 / `posting.md` define `<ai_attribution_footer>`;
the `[C]` template uses its own "*Drafted by an AI-assisted tool...*" wording.
Reusing the shared block would keep the two from drifting apart later.
**8. H4 might miss some of its target cases.** It reads
`commits[].authors[].login`, which comes back empty when a commit email isn't
linked to a GitHub account, fairly common for student contributors on local git
identities, which is exactly the population H4 is aimed at. Falling back to
name/email when login is empty would make it more robust.
### Process
**9. Eval coverage.** I saw the plan to add tests separately, which works.
Just flagging that the repo leans on skill evals
(`tools/skill-evals/evals/pr-management-code-review/` already covers steps 3,
4, 6), so it'd be nice to land fixtures close behind this. Three cases would
probably do it: a clear-slop early exit, a 1-hard-2-soft note-only, and a large
legitimate refactor that shouldn't false-positive.
**10. close+lock now lives in two skills.** The `[X]` carve-out reads as a
deliberate, reasonable choice. Only note is that close+lock now exists here and
in `pr-management-triage`, so a small reminder to keep the two in sync (lock
reason, template) if either changes down the line.
Overall this is a solid addition, none of the above is a hard objection.
Happy to help with the eval fixtures if useful.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]