Copilot commented on code in PR #454: URL: https://github.com/apache/airflow-steward/pull/454#discussion_r3360759357
########## skills/pr-management-code-review/slop-detection.md: ########## @@ -0,0 +1,242 @@ +<!-- SPDX-License-Identifier: Apache-2.0 + https://www.apache.org/licenses/LICENSE-2.0 --> + +# Slop detection — structural scan + +This step runs immediately after Step 2 (diff and metadata fetched), +before the full line-by-line review in Step 3. It is cheap — purely +structural analysis, no LLM token budget — and it short-circuits the +review when a PR is clearly not a genuine upstream contribution. + +"Slop" here means a PR whose structure demonstrates it is a **class +project, personal experiment, or low-effort AI-generated submission** +being pushed into the upstream repository. The goal is to catch +crystal-clear cases early, not to flag every imperfect PR. When in +doubt, proceed with the normal review. + +--- + +## Signals + +Signals are split into **hard** (individually strong) and **soft** +(individually weak; accumulate). All checks use data already in the +Step 2 payload — no extra `gh` calls. Review Comment: The statement that all signals use only the Step 2 payload is inconsistent with several detections (e.g., checking whether a directory exists on the base ref, or interpreting whether an issue reference targets a fork). This can mislead implementers about what data sources are allowed during this step. ########## skills/pr-management-code-review/review-flow.md: ########## @@ -128,6 +128,33 @@ posting (Step 8), use the SHA-comparison shortcut. --- +## Step 2.5 — Slop detection + +**Read** the cached metadata and diff from Step 2 and run the +structural scan defined in [`slop-detection.md`](slop-detection.md). +This step costs no extra `gh` calls — it operates on the payload +already in memory. + +Two outcomes: + +- **Early exit** — two or more hard signals fired, or one hard + signal plus three or more soft signals. **Propose** the slop + report to the maintainer (template in + [`slop-detection.md` § Maintainer interaction](slop-detection.md#maintainer-interaction-on-early-exit)) + and wait for an action choice (`[C]omment`, `[X]` close+lock, + `[R]eview anyway`, `[S]kip`, `[Q]uit`). **Do not proceed to + Step 3** until the maintainer either picks `[R]eview anyway` + (which resumes the normal flow) or an exit action (which ends + this PR's flow and moves to Step 9). + +- **Note only** — fewer signals than the early-exit threshold. + Add a `[suspicious]` chip to the already-shown headline when + at least one hard signal or two or more soft signals fired; + otherwise proceed silently. In either case, **continue to + Step 3** without interruption. Review Comment: In the note-only outcome, the instruction to add a `[suspicious]` chip to the "already-shown headline" is ambiguous (the headline was emitted in Step 1). It would be clearer to state that the headline should be re-rendered with the chip (or that the chip should be shown in the next prompt) rather than implying in-place modification. ########## skills/pr-management-code-review/slop-detection.md: ########## @@ -0,0 +1,242 @@ +<!-- SPDX-License-Identifier: Apache-2.0 + https://www.apache.org/licenses/LICENSE-2.0 --> + +# Slop detection — structural scan + +This step runs immediately after Step 2 (diff and metadata fetched), +before the full line-by-line review in Step 3. It is cheap — purely +structural analysis, no LLM token budget — and it short-circuits the +review when a PR is clearly not a genuine upstream contribution. + +"Slop" here means a PR whose structure demonstrates it is a **class +project, personal experiment, or low-effort AI-generated submission** +being pushed into the upstream repository. The goal is to catch +crystal-clear cases early, not to flag every imperfect PR. When in +doubt, proceed with the normal review. + +--- + +## Signals + +Signals are split into **hard** (individually strong) and **soft** +(individually weak; accumulate). All checks use data already in the +Step 2 payload — no extra `gh` calls. + +### Hard signals + +Each hard signal alone has a moderate probability of indicating slop; +two or more together are nearly conclusive. + +| ID | Signal | How to detect | +|---|---|---| +| H1 | **New top-level directory, unrecognised** | A file path in `files[].path` starts with a directory that (a) is not present in the base ref's tree and (b) has no plausible relationship to the rest of the PR's changes. Strongest form: a new directory whose `README` or docs reference a university course, team name, instructor, or academic project. | +| H2 | **Private-fork issue reference in PR body** | The body contains an issue URL or bare `#N` reference that resolves to a fork rather than the upstream repo (pattern: `github.com/<author>/<repo-name>#\d+` where `<repo-name>` is not the upstream repo). | +| H3 | **Fork merge-commit flood** | The commit list contains 3+ commit messages matching `^Merge (pull request|branch) #\d+ from` that all share the same fork prefix and were authored within a narrow window (< 60 minutes apart). | +| H4 | **Multi-author team project** | Commits are authored by 3 or more distinct GitHub logins, yet the PR is opened by a single account — typical of a university team pushing their entire fork history. | +| H5 | **Area sprawl** | Changed files span 5 or more distinct top-level directories (or well-known project sub-areas) with no discernible semantic relationship. Count using the first two path components of each changed file. | + +### Soft signals + +| ID | Signal | How to detect | +|---|---|---| +| S1 | **Ticket-style PR title** | Title matches patterns like `[Ticket #N]`, `ts/ticket-\d+`, `sprint-N`, `task-\d+`, or contains a student name followed by a ticket reference. | +| S2 | **Template-only PR body** | Body contains no prose beyond the PR template boilerplate (checked: no description above the first `---`, no non-template `closes:` / `related:` references to the upstream repo). | +| S3 | **No real CI** | `statusCheckRollup` contains only external bots (e.g. Mergeable, WIP, boring-cyborg) and zero entries from the project's own CI workflows. | +| S4 | **Label sprawl** | PR carries 3+ `area:` labels spanning unrelated subsystems, suggesting the author ran an automated labeller or copied labels from multiple separate changes. | +| S5 | **Commit messages reference internal sprint/ticket tooling** | 2+ commit messages contain phrases like `sprint`, `kanban`, `jira`, `ticket #`, `story #`, or `[CSS \d+` (course codes). | Review Comment: Soft signal S5’s example pattern for course codes is missing a closing bracket and reads like an incomplete regex, which makes the detection guidance ambiguous. ########## skills/pr-management-code-review/slop-detection.md: ########## @@ -0,0 +1,242 @@ +<!-- SPDX-License-Identifier: Apache-2.0 + https://www.apache.org/licenses/LICENSE-2.0 --> + +# Slop detection — structural scan + +This step runs immediately after Step 2 (diff and metadata fetched), +before the full line-by-line review in Step 3. It is cheap — purely +structural analysis, no LLM token budget — and it short-circuits the +review when a PR is clearly not a genuine upstream contribution. + +"Slop" here means a PR whose structure demonstrates it is a **class +project, personal experiment, or low-effort AI-generated submission** +being pushed into the upstream repository. The goal is to catch +crystal-clear cases early, not to flag every imperfect PR. When in +doubt, proceed with the normal review. + +--- + +## Signals + +Signals are split into **hard** (individually strong) and **soft** +(individually weak; accumulate). All checks use data already in the +Step 2 payload — no extra `gh` calls. + +### Hard signals + +Each hard signal alone has a moderate probability of indicating slop; +two or more together are nearly conclusive. + +| ID | Signal | How to detect | +|---|---|---| +| H1 | **New top-level directory, unrecognised** | A file path in `files[].path` starts with a directory that (a) is not present in the base ref's tree and (b) has no plausible relationship to the rest of the PR's changes. Strongest form: a new directory whose `README` or docs reference a university course, team name, instructor, or academic project. | +| H2 | **Private-fork issue reference in PR body** | The body contains an issue URL or bare `#N` reference that resolves to a fork rather than the upstream repo (pattern: `github.com/<author>/<repo-name>#\d+` where `<repo-name>` is not the upstream repo). | Review Comment: Hard signals H1/H2 currently describe checks that can't be performed from the Step 2 payload alone (base-ref tree presence; resolving whether a bare `#N` points at a fork). Tighten the detection text to match what can be checked locally without extra API resolution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
