andreahlert commented on PR #454:
URL: https://github.com/apache/airflow-steward/pull/454#issuecomment-4634171153

   Really like where this is going, the conservative thresholds and the 
propose-then-confirm framing feel right, and the latest commit (`f707e36`) 
already addresses most of the earlier round. Thanks for the quick turnaround on 
that.
   
   A few things I noticed while reading through, mostly small. The first three 
are spots where the doc seems to disagree with itself, which I think would trip 
up whoever implements the signals later. The rest are more "worth a thought" 
than blockers.
   
   ### Small doc inconsistencies
   
   **1. The H5 example and its threshold don't quite line up.** H5 is "5 or 
more distinct top-level directories", but the example report fires H5 with four 
(`go-sdk/`, `airflow-core/ui/`, `docs/adr/`, `team_project/`). Might be worth 
bumping the example to five, or adjusting the threshold if four was the real 
intent.
   
   **2. H1 and H5 read as mutually exclusive, but the example fires both.** H1 
says the diff is one "where every path shares a single first-level directory 
prefix", which I read as "everything lives under one directory". H5 needs 
changes across five-plus directories, and the example fires both together. I 
suspect the intent for H1 is "the new-file subset forms a standalone project 
dir" rather than literally every path, so scoping H1 to that subset would clear 
it up.
   
   **3. The chip-vs-line wording is half updated.** `review-flow.md` Step 2.5 
now says "do not attempt to modify the already-displayed Step 1 headline" and 
emits a separate `⚠ [suspicious] — ...` line, which I think is the right call. 
`slop-detection.md` still has the older "surfaces in the headline" (Threshold) 
and "chip in the headline is the only signal" (False-positive) phrasing, so the 
prose lags the table a bit. Probably just needs the two sentences updated to 
match.
   
   ### Things worth a thought
   
   **4. The "no token budget" framing might be a touch optimistic.** The scan 
is described as purely structural, but H1 ("README suggest it is an independent 
project unrelated to the upstream codebase") and H5 ("no discernible semantic 
relationship") do lean on the model making a judgment call. That lines up with 
the "good candidates for deterministic checks" comment above, they're just not 
deterministic yet. Maybe soften the claim slightly, or note which signals are 
genuinely structural vs. which need a read.
   
   **5. H2 could catch legitimate cross-repo links.** It's titled "private-fork 
issue URL", but the check is really "repo-name differs from upstream". A body 
referencing `https://github.com/apache/airflow/issues/123` from here would 
match, even though that's a normal sibling-repo link. Comparing the URL's owner 
against the PR author, or scoping to same-named forks, would tighten it.
   
   **6. S3 ("no real CI") may fire on the people we don't want to 
over-correct.** GitHub holds workflows pending approval for first-time 
contributors, so a genuine newcomer PR often shows zero CI runs simply because 
nobody approved the run yet. Might be worth distinguishing "CI pending 
approval" from "only bots ran" so newcomers aren't penalized for the gate.
   
   **7. The `[C]` template footer is a slightly different string from the 
canonical one.** Golden rule 5 / `posting.md` define `<ai_attribution_footer>`; 
the `[C]` template uses its own "*Drafted by an AI-assisted tool...*" wording. 
Reusing the shared block would keep the two from drifting apart later.
   
   **8. H4 might miss some of its target cases.** It reads 
`commits[].authors[].login`, which comes back empty when a commit email isn't 
linked to a GitHub account, fairly common for student contributors on local git 
identities, which is exactly the population H4 is aimed at. Falling back to 
name/email when login is empty would make it more robust.
   
   ### Process
   
   **9. Eval coverage.** I saw the plan to add tests separately, which works. 
Just flagging that the repo leans on skill evals 
(`tools/skill-evals/evals/pr-management-code-review/` already covers steps 3, 
4, 6), so it'd be nice to land fixtures close behind this. Three cases would 
probably do it: a clear-slop early exit, a 1-hard-2-soft note-only, and a large 
legitimate refactor that shouldn't false-positive.
   
   **10. close+lock now lives in two skills.** The `[X]` carve-out reads as a 
deliberate, reasonable choice. Only note is that close+lock now exists here and 
in `pr-management-triage`, so a small reminder to keep the two in sync (lock 
reason, template) if either changes down the line.
   
   Overall this is a solid addition, none of the above is a hard objection. 
Happy to help with the eval fixtures if useful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to