This is an automated email from the ASF dual-hosted git repository.
paulk-asert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/groovy.git
The following commit(s) were added to refs/heads/master by this push:
new efdfc3e0cc AI readiness: add draft run reproducer and reassess skills
efdfc3e0cc is described below
commit efdfc3e0cc3de041018b1469644c09dbea344ac3
Author: Paul King <[email protected]>
AuthorDate: Tue May 12 16:48:38 2026 +1000
AI readiness: add draft run reproducer and reassess skills
---
.agents/skills/groovy-fix-workflow/SKILL.md | 2 +-
.agents/skills/groovy-reassess/SKILL.md | 348 ++++++++++++++++++++++++++++
.agents/skills/groovy-reproducer/SKILL.md | 346 +++++++++++++++++++++++++++
AGENTS.md | 2 +
4 files changed, 697 insertions(+), 1 deletion(-)
diff --git a/.agents/skills/groovy-fix-workflow/SKILL.md b/.agents/skills/groovy-fix-workflow/SKILL.md
index 24a519bcdd..169ce7d802 100644
--- a/.agents/skills/groovy-fix-workflow/SKILL.md
+++ b/.agents/skills/groovy-fix-workflow/SKILL.md
@@ -1,4 +1,4 @@
-<!--
+gro<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
diff --git a/.agents/skills/groovy-reassess/SKILL.md b/.agents/skills/groovy-reassess/SKILL.md
new file mode 100644
index 0000000000..3f12ff1c4f
--- /dev/null
+++ b/.agents/skills/groovy-reassess/SKILL.md
@@ -0,0 +1,348 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+---
+name: groovy-reassess
+description: Running a bulk reassessment campaign over old GROOVY JIRA issues — narrow JQL selection, per-issue reproducer extraction and execution via `groovy-reproducer`, classification (`fixed-on-master` / `still-fails-same` / `still-fails-different` / `cannot-run-*` / `intended-behaviour` / `duplicate-of-resolved` / `timeout`), structured report and per-issue evidence package, and a strict hand-back contract — no JIRA comments, no transitions, no closures posted on behalf of the proj [...]
+license: Apache-2.0
+compatibility: claude, codex, copilot, cursor, gemini, aider
+metadata:
+ audience: contributors to apache/groovy
+ scope: bulk-reassessment-of-old-jira-issues
+---
+
+# Groovy reassess
+
+Use this skill when the task is a **campaign** over old GROOVY JIRA
+issues: pick a bounded candidate set, run each reporter's reproducer
+against the current `master`, classify the outcome, and produce a
+report a committer can scan and act on. The campaign is the safest
+non-trivial automation against ASF Groovy — read-only against JIRA,
+side-effect-free against the project, advisory output. But "safe"
+does not mean "no rules": the hand-back contract below is firm.
+
+This skill is the **campaign layer**. Per-issue mechanics live
+elsewhere:
+
+- [`groovy-jira`](../groovy-jira/SKILL.md) — the JQL recipes that
+ select the candidate set; the field-ownership rules; the
+ "comment, don't transition" rule that applies at scale here.
+- [`groovy-triage`](../groovy-triage/SKILL.md) — the single-issue
+ triage workflow; pieces of it apply to each candidate.
+- [`groovy-reproducer`](../groovy-reproducer/SKILL.md) — the
+ load-bearing per-issue piece: locate the reproducer, classify the
+ shape, adapt, run, record evidence.
+- [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md) — where
+ the `still-fails-*` tail goes once the campaign is done; the
+ campaign produces ready-made reproducers for the fix workflow.
+
+## When to use this skill
+
+**Use it for:**
+
+- A bounded sweep of old GROOVY issues (e.g. 10–50 at a time) to
+ identify silent fixes, still-failing bugs, intended-behaviour
+ misclassifications, or duplicates.
+- Producing a structured report a committer can review in one
+ sitting and decide what to act on.
+- Generating ready-made reproducers and evidence packages that feed
+ [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md) for the
+ still-failing tail.
+
+**Don't use it for:**
+
+- Single-issue triage — that's [`groovy-triage`](../groovy-triage/SKILL.md)
+ on its own.
+- Mass JIRA mutation (comments, transitions, closures, label
+ edits). The campaign is **read-only against JIRA**.
+- "Reassessing all 800 open issues" in one go — the campaign needs
+ bounding. Many small passes beat one giant one.
+- Anything that needs project-side authorisation (a bot identity,
+ a dev@ mandate). The campaign is contributor work; the report is
+ the output a contributor brings to dev@ or to a committer, not a
+ project-mandated activity.
+
+## Read first
+
+- [`GOVERNANCE.md`](../../../GOVERNANCE.md) — JIRA's role as the
+ canonical record; the campaign produces recommendations, not
+ decisions.
+- [`AGENTS.md`](../../../AGENTS.md) — the no-fabrication,
+ no-comment-on-behalf-of-the-project posture that scales up here.
+- [`groovy-reproducer`](../groovy-reproducer/SKILL.md) — fully.
+ The campaign is largely a loop over this skill.
+
+## Top failure modes
+
+These are the recurring mistakes at the campaign level:
+
+1. **Bulk-posting findings to JIRA.** Even when 30 of 30 findings
+ say `fixed-on-master` with strong evidence, the campaign does
+ *not* post 30 JIRA comments, transition 30 issues, or close
+ anything. The output is a report; a committer decides whether
+ and how to publish it. See failure mode 1 in
+ [`groovy-jira`](../groovy-jira/SKILL.md) for the underlying rule.
+2. **Optimistic classification on weak evidence.** A reproducer
+ that didn't compile is `cannot-run-extraction`, not
+ `fixed-on-master`. The taxonomy has cells for these for a
+ reason; reach for the precise one. "Looks fine to me" is not a
+ classification.
+3. **Confusing "passes on this JDK" with "fixed."** The classic
+ over-claim from [`groovy-reproducer`](../groovy-reproducer/SKILL.md)
+ failure mode 10, multiplied by the campaign size. Where the
+ verdict matters and a JDK retry is feasible, do it before
+ landing `fixed-on-master`.
+4. **Unbounded scope.** Trying to sweep 200 issues in one session
+ blows context, produces low-quality bulk output, and means a
+ crash at issue 150 is a 150-issue loss. Bound the candidate set
+ *before* the loop starts: a query capped with the REST search
+ `maxResults` parameter (JQL itself has no `LIMIT`), an age bucket,
+ a component slice.
+5. **No resumability.** A 50-issue run that crashes at issue 30
+ must be resumable from issue 31. Per-issue evidence files on disk
+ (per [`groovy-reproducer`](../groovy-reproducer/SKILL.md)'s
+ evidence package) are the resumption point — in-memory campaign
+ state is not.
+6. **Burying the headlines.** "30 fixed-on-master, 5 still-fail,
+ 15 cannot-run." The 5 still-fails are usually the most
+ important rows — they are tickets where work might actually be
+ done. Surface them at the top of the report, not buried by the
+ `fixed-on-master` majority.
+7. **Recommending workflow transitions in the report.** "Close
+ GROOVY-1234 as Cannot Reproduce" frames the agent as the
+ decider. Phrase as recommendation: "`fixed-on-master`; a
+ committer may want to consider closing as Cannot Reproduce
+ after a second pair of eyes." See
+ [`groovy-jira`](../groovy-jira/SKILL.md) on
+ comment-not-transition.
+8. **Fabricating evidence for `cannot-run-*`.** "Probably passes on
+ master." That's a guess in a verdict slot. If it can't be run,
+ the verdict is the cannot-run category — no further claim.
+9. **Drifting from the original report.** Heavy adaptation of a
+ reproducer can exercise different code paths from the
+ reporter's original. The verdict has to be against the
+ *original* behaviour; if the adaptation is so heavy that it's
+ really a different test, the issue is `cannot-run-extraction`,
+ not `fixed-on-master`.
+10. **Skipping the read-first pass on the issue.** A surprising
+ fraction of old issues have a closing comment like "fixed in
+ 4.0.x, issue left open by mistake" or "won't fix, see
+ GROOVY-XXXX". Skim the comments before reproducing — saves the
+ reproduction budget and produces better classifications.
+11. **Hammering JIRA's REST API.** ASF JIRA is anonymous-readable
+ but shared. Cache aggressively (per-issue evidence retains the
+ description and comments), throttle requests, and never run
+ the campaign in a tight loop that re-fetches the same issue.
+12. **Treating duplicates as findings.** If the reassessment
+ discovers issue A is a duplicate of resolved issue B, the
+ classification is `duplicate-of-resolved` with a citation —
+ not "closing A." Same hand-back posture as everything else.
+13. **Posting a wall-of-text to dev@.** A 50-row table without a
+ summary is noise. The report opens with a short stanza — N
+ issues swept, M classified each way, K headline findings — and
+ *then* the table. The committer's first 30 seconds with the
+ report should produce a decision about whether to read on.
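
Failure mode 11 can be handled with a small read-through cache. The sketch below is a hypothetical helper, not part of any Groovy tooling: `fetch` stands in for whatever read-only call you make against JIRA, and `issue.json` is an assumed cache filename kept next to the per-issue evidence.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class CachedIssueFetcher:
    """Read-through cache: each issue is fetched at most once per campaign.

    `fetch` is caller-supplied (hypothetical here): it takes an issue key
    such as "GROOVY-1234" and returns the issue as a JSON-compatible dict.
    """

    def __init__(self, cache_dir: Path, fetch: Callable[[str], dict],
                 min_interval_s: float = 2.0):
        self.cache_dir = cache_dir
        self.fetch = fetch
        self.min_interval_s = min_interval_s
        self._last_call: Optional[float] = None  # monotonic time of last fetch

    def get(self, key: str) -> dict:
        cached = self.cache_dir / key / "issue.json"
        if cached.exists():
            # Cache hit: no network traffic at all.
            return json.loads(cached.read_text(encoding="utf-8"))
        if self._last_call is not None:
            # Throttle: keep at least min_interval_s between real requests.
            wait = self.min_interval_s - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        data = self.fetch(key)
        self._last_call = time.monotonic()
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_text(json.dumps(data), encoding="utf-8")
        return data
```

Persisting to disk rather than memory means a restarted session reuses the same cache, which also serves the resumability rule in failure mode 5.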
+
+## Selecting the candidate set
+
+Selection drives the campaign's quality. Pick a *narrow*, *bounded*
+slice; do not boil the ocean.
+
+JQL building blocks come from
+[`groovy-jira`](../groovy-jira/SKILL.md). Useful slices for
+reassessment:
+
+- **Age bucket × open:** `project = GROOVY AND statusCategory != Done
+ AND created < "2020/01/01" AND created >= "2018/01/01"
+ ORDER BY created ASC`. Buckets of two years are scannable; ten
+ years is not.
+- **Stale within a component:** `project = GROOVY AND statusCategory
+ != Done AND component = "<X>" AND updated < -730d`. Pairs well
+ with an area you know — your fix-side strength shapes the
+ candidate set.
+- **Affected version end-of-life:** `project = GROOVY AND
+ statusCategory != Done AND affectedVersion = "2.4.x"` — versions
+ long out of support are high-yield for `fixed-on-master`.
+- **No component (triage-then-reassess):** `project = GROOVY AND
+ statusCategory != Done AND component is EMPTY`. The reassessment
+ can also produce a `Component/s` suggestion (see
+ [`groovy-jira`](../groovy-jira/SKILL.md)).
+
+Cap the per-session set. A practical first pilot is 5–10 issues
+spanning *different reproducer shapes* (one runnable script, one
+attachment, one prose-only, one `@Grab`, one comment-with-snippet)
+so the pipeline meets each shape early. Pilots beat the first
+hundred issues you'd naturally pick.
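
A minimal sketch of turning the age-bucket slice into a bounded, read-only query, assuming ASF JIRA's standard REST v2 search endpoint. JQL has no `LIMIT`, so the cap travels as the `maxResults` request parameter. The 50-issue ceiling mirrors the 10–50 guidance above and, like the helper names, is an assumption.

```python
import json
from pathlib import Path
from urllib.parse import urlencode

# Assumption: ASF JIRA's standard REST v2 search endpoint (read-only use).
JIRA_SEARCH = "https://issues.apache.org/jira/rest/api/2/search"
MAX_PER_SESSION = 50  # assumed ceiling, mirroring the 10-50 guidance


def age_bucket_jql(start: str, end: str) -> str:
    """Open GROOVY issues created in [start, end), oldest first."""
    return ("project = GROOVY AND statusCategory != Done "
            f'AND created >= "{start}" AND created < "{end}" '
            "ORDER BY created ASC")


def search_url(jql: str, cap: int) -> str:
    """JQL has no LIMIT, so the bound goes in the maxResults parameter."""
    if not 1 <= cap <= MAX_PER_SESSION:
        raise ValueError(f"cap must be 1..{MAX_PER_SESSION}, got {cap}")
    query = urlencode({"jql": jql, "maxResults": cap, "fields": "summary"})
    return f"{JIRA_SEARCH}?{query}"


def persist_candidates(path: Path, jql: str, keys: list) -> None:
    """Write the plan (query, bound, keys) to disk before the loop starts."""
    path.parent.mkdir(parents=True, exist_ok=True)
    plan = {"jql": jql, "bound": len(keys), "keys": keys}
    path.write_text(json.dumps(plan, indent=2), encoding="utf-8")
```

Recording the query and bound in the persisted plan also gives the report its "bound was recorded" line for free.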
+
+## Procedure
+
+For each campaign session:
+
+1. **Pick the candidate set.** JQL from
+ [`groovy-jira`](../groovy-jira/SKILL.md), capped (see above).
+ Persist the candidate list to disk before the loop starts so a
+ crash leaves a recoverable plan.
+2. **Set up the scratch corpus.** A directory hierarchy under
+ `~/work/groovy-reassess/<campaign-id>/`, one subdirectory per
+ issue. Per [`groovy-reproducer`](../groovy-reproducer/SKILL.md),
+ each subdirectory holds `description.md`, `reproducer.<ext>`,
+ `original.<ext>`, `run.log`, `verdict.json`. This is *not* in
+ the Groovy checkout.
+3. **For each issue, in order:**
+ - Skip if its `verdict.json` already exists and is well-formed
+ (resumability).
+ - Read the JIRA issue and skim comments for an obvious
+ "already fixed" / "won't fix" / "see GROOVY-XXXX" — early
+ classifications save time.
+ - Hand off to [`groovy-reproducer`](../groovy-reproducer/SKILL.md)
+ for extraction, adaptation, running, and evidence capture.
+ - Read the classification from `verdict.json`.
+ - Reset the working tree before the next issue.
+4. **After the loop, build the report** (see below). Do *not*
+ post anything.
+5. **Hand back** to the human — branch (if any local Groovy
+ commits, e.g. adapted `@Test` files kept for follow-up),
+ scratch corpus path, report path.
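
The resumability check in step 3 can be sketched as follows. It assumes `verdict.json` counts as well-formed when it parses to a JSON object with a non-empty `classification`; the skill does not fix the schema, so adjust the rule to whatever fields your campaign records.

```python
import json
from pathlib import Path


def has_valid_verdict(issue_dir: Path) -> bool:
    """Assumed rule: verdict.json parses to an object with a classification."""
    try:
        data = json.loads((issue_dir / "verdict.json").read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing, unreadable, or malformed JSON
        return False
    return isinstance(data, dict) and bool(data.get("classification"))


def pending(corpus: Path, candidates: list) -> list:
    """Issues still to run, in the persisted candidate order."""
    return [key for key in candidates if not has_valid_verdict(corpus / key)]
```

On restart, the loop iterates `pending(...)` instead of the full candidate list, so a crash at issue 30 resumes at the first issue without a usable verdict.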
+
+## Classification taxonomy
+
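+The campaign maps the per-issue outcomes produced by
+[`groovy-reproducer`](../groovy-reproducer/SKILL.md) (`passes`,
+`same-failure`, `different-failure`, `timeout`, `cannot-run-*`,
+`needs-separate-workspace`) onto the labels below, and adds two
+classifications of its own that don't fall out of a single run.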
+
+Per-issue (from `groovy-reproducer`):
+
+| Classification | Meaning | Recommendation a committer might consider |
+|---|---|---|
+| `fixed-on-master` | Reproducer passes on current `master`, run on the originally-affected JDK where one is available, and the original failure pattern is gone. | Close as Cannot Reproduce after a second pair of eyes. |
+| `still-fails-same` | Reproducer fails on `master` with the same failure pattern as the report. | Hand off to [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md). |
+| `still-fails-different` | Reproducer fails on `master` but with a different signature (different exception, different message). | Committer judgement — could be a regression-of-a-regression, a related bug, or environmental. |
+| `cannot-run-extraction` | No usable reproducer (prose-only, missing attachment, fragment too incomplete to adapt without speculation). | Needs-info: ask the reporter, or close as Incomplete if old enough and reporter is unreachable (committer call). |
+| `cannot-run-environment` | Reproducer needs a JDK/OS/dep we don't have locally. | Note the gap; may be a candidate for a richer environment matrix. |
+| `cannot-run-dependency` | `@Grab` resolution failed (dep gone from configured repos). | Often correlates with very old issues; needs-info or close as Incomplete. |
+| `timeout` | Hung past the configured bound. | Human review — might be the bug, might be a hung adaptation. |
+| `needs-separate-workspace` | Reproducer is a multi-file project. | Spin up an isolated workspace if it matters; otherwise leave open. |
+
+Campaign-level adds:
+
+| Classification | Meaning | Recommendation |
+|---|---|---|
+| `intended-behaviour` | Reproduction runs and behaves as the spec/docs describe; the "bug" is by design. | Doc clarification or close as Not A Bug. |
+| `duplicate-of-resolved` | Reassessment discovered the issue duplicates a resolved one (citation required). | Close as Duplicate, citing the linked issue. |
+
+Each per-issue verdict in the report carries the classification
+plus the evidence-package path. Each campaign-level classification
+carries the citation that justifies it.
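
One way to keep every row inside the taxonomy is to route reproducer outcomes through a fixed mapping and reject anything else. This is a sketch: the outcome names follow `groovy-reproducer`'s comparison step, and mapping `passes` to `fixed-on-master` still assumes the JDK condition in the per-issue table has been met before the row is written.

```python
# Reproducer outcome -> campaign label (assumed mapping; see lead-in).
OUTCOME_TO_CAMPAIGN = {
    "passes": "fixed-on-master",
    "same-failure": "still-fails-same",
    "different-failure": "still-fails-different",
}

PER_ISSUE = {
    "fixed-on-master", "still-fails-same", "still-fails-different",
    "cannot-run-extraction", "cannot-run-environment",
    "cannot-run-dependency", "timeout", "needs-separate-workspace",
}
CAMPAIGN_LEVEL = {"intended-behaviour", "duplicate-of-resolved"}
TAXONOMY = PER_ISSUE | CAMPAIGN_LEVEL


def to_campaign_label(outcome: str) -> str:
    """Map an outcome; cannot-run-*, timeout, etc. pass through unchanged."""
    label = OUTCOME_TO_CAMPAIGN.get(outcome, outcome)
    if label not in PER_ISSUE:
        raise ValueError(f"not a taxonomy entry: {label!r}")
    return label


def check_row(label: str, citation: str = "") -> None:
    """Reject free-form labels and uncited campaign-level classifications."""
    if label not in TAXONOMY:
        raise ValueError(f"not a taxonomy entry: {label!r}")
    if label in CAMPAIGN_LEVEL and not citation.strip():
        raise ValueError(f"{label} requires a citation")
```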
+
+## Report shape
+
+A reassessment report has three layers:
+
+1. **One-paragraph summary at the top.** N issues swept, breakdown
+ per classification, K headline findings worth a committer's
+ attention (typically the `still-fails-*` rows and any
+ `duplicate-of-resolved` with strong citation).
+2. **Per-classification sections, headlines first.** Order:
+ `still-fails-same`, `still-fails-different`, `duplicate-of-resolved`,
+ `fixed-on-master`, `intended-behaviour`, then the cannot-runs and
+ `needs-separate-workspace`. Within each section, a table:
+ JIRA-ID, one-line summary, evidence-package path, recommended
+ next action.
+3. **Per-issue evidence packages** (filesystem; not inlined in the
+ report). The report links into them.
+
+Suggested filename for the report: `report.md` inside the campaign
+scratch directory. A committer reading it should be able to decide
+"yes I'll act on these N items" in a single scan.
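
A sketch of the report skeleton: summary stanza first, then one table per classification in the headline-first order listed above. The row keys (`key`, `summary`, `classification`, `evidence`, `next_action`) are assumed names, and placing `timeout` after the cannot-runs is also an assumption, since the ordering above does not mention it. Only `still-fails-*` rows count as headlines here; strongly cited `duplicate-of-resolved` rows would be added by hand.

```python
from collections import Counter

# Headline-first order from the report shape; timeout placement is assumed.
SECTION_ORDER = [
    "still-fails-same", "still-fails-different", "duplicate-of-resolved",
    "fixed-on-master", "intended-behaviour",
    "cannot-run-extraction", "cannot-run-environment",
    "cannot-run-dependency", "timeout", "needs-separate-workspace",
]
HEADLINES = {"still-fails-same", "still-fails-different"}


def render_report(rows: list) -> str:
    """Build report.md text from per-issue rows (assumed dict keys)."""
    counts = Counter(r["classification"] for r in rows)
    breakdown = ", ".join(f"{counts[c]} `{c}`" for c in SECTION_ORDER if counts[c])
    headlines = [r["key"] for r in rows if r["classification"] in HEADLINES]
    lines = [f"Swept {len(rows)} issues: {breakdown}.",
             f"Headlines: {', '.join(headlines) or 'none'}.", ""]
    for cls in SECTION_ORDER:
        section = [r for r in rows if r["classification"] == cls]
        if not section:
            continue
        lines += [f"## `{cls}`", "",
                  "| JIRA | Summary | Evidence | Next action |",
                  "|---|---|---|---|"]
        lines += [f"| {r['key']} | {r['summary']} | {r['evidence']} | {r['next_action']} |"
                  for r in section]
        lines.append("")
    return "\n".join(lines)
```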
+
+## Hand-back to a human
+
+The campaign produces:
+
+- The scratch corpus path (with per-issue evidence packages).
+- The report path.
+- Any local Groovy-checkout changes worth keeping (e.g. adapted
+ `@Test` files that the committer might want as a starting point
+ for a real regression test).
+- A short verbal summary: "Swept N issues. M `fixed-on-master`, K
+ `still-fails-same`. Headlines: GROOVY-A, GROOVY-B, …."
+
+The campaign does **not**:
+
+- Post comments to JIRA.
+- Transition any issue.
+- Open any PR (draft or otherwise).
+- Email dev@ or anyone else on behalf of the project.
+- Self-assign or otherwise touch JIRA fields.
+
+With explicit instruction (per [`groovy-jira`](../groovy-jira/SKILL.md)
+and [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md)), the
+output may then be used as:
+
+- The basis for a single dev@ post summarising the campaign
+ ("Reassessed N issues from 2018–2019; here is the report and
+ evidence — feedback welcome before any committer acts on the
+ recommendations").
+- Per-issue JIRA comment drafts the committer reviews and posts.
+- A starting point for [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md)
+ on each `still-fails-same` row.
+
+The single dev@-thread shape is the recommended publication path —
+N separate JIRA comments at once is noisy and out of step with how
+the project communicates.
+
+## Validation checklist
+
+Before declaring a campaign session complete:
+
+- [ ] Candidate set was bounded *before* the loop started; the
+ bound is recorded in the report.
+- [ ] Per-issue evidence package on disk for every candidate, with
+ `verdict.json` and a non-empty `description.md`.
+- [ ] Every classification used is one of the taxonomy entries; no
+ free-form labels.
+- [ ] `fixed-on-master` rows include the rev and JDK in the
+ evidence; the verdict isn't over-claimed.
+- [ ] `still-fails-same` rows are surfaced at the top of the
+ report, not buried.
+- [ ] `cannot-run-*` rows have a concrete reason in the evidence
+ (which dep failed, what was missing) — not "could not run."
+- [ ] No JIRA mutation occurred. No PR was opened. No dev@ post
+ was sent.
+- [ ] The report opens with a summary stanza a committer can scan
+ in 30 seconds.
+- [ ] Working tree was clean at the end of the session.
+- [ ] Hand-back artefact lists the scratch corpus path, the report
+ path, any local commits worth keeping, and the recommended
+ publication path (typically a single dev@ thread).
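
Several checklist items can be checked mechanically for each evidence package. The verdict keys below (`classification`, `rev`, `jdk`, `reason`) are assumptions about what your `verdict.json` records; the checklist requires the information, not these particular names.

```python
import json
from pathlib import Path


def audit_package(issue_dir: Path, taxonomy: set) -> list:
    """Return checklist problems for one evidence package ([] means clean).

    Assumed verdict.json keys: classification, rev, jdk, reason.
    """
    problems = []
    desc = issue_dir / "description.md"
    if not desc.is_file() or not desc.read_text(encoding="utf-8").strip():
        problems.append("description.md missing or empty")
    try:
        verdict = json.loads((issue_dir / "verdict.json").read_text(encoding="utf-8"))
    except (OSError, ValueError):
        verdict = None
    if not isinstance(verdict, dict):
        return problems + ["verdict.json missing or malformed"]
    cls = verdict.get("classification")
    if not isinstance(cls, str) or cls not in taxonomy:
        problems.append(f"free-form classification: {cls!r}")
    if cls == "fixed-on-master" and not (verdict.get("rev") and verdict.get("jdk")):
        problems.append("fixed-on-master without rev and JDK")
    if str(cls).startswith("cannot-run-") and not verdict.get("reason"):
        problems.append("cannot-run without a concrete reason")
    return problems
```

The remaining items (no JIRA mutation, clean working tree, a readable summary) still need a human check.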
+
+## References
+
+- [`GOVERNANCE.md`](../../../GOVERNANCE.md) — JIRA as canonical
+ record; dev@ as the project's primary discussion channel.
+- [`AGENTS.md`](../../../AGENTS.md) — the licensing, provenance,
+ and "what *not* to do" rules the campaign inherits.
+- `.agents/skills/groovy-jira/SKILL.md` — JQL recipes for selection;
+ the comment-not-transition rule applied campaign-wide.
+- `.agents/skills/groovy-triage/SKILL.md` — single-issue workflow;
+ pieces apply to each candidate.
+- `.agents/skills/groovy-reproducer/SKILL.md` — the per-issue
+ load-bearing skill; the campaign is a loop over it.
+- `.agents/skills/groovy-fix-workflow/SKILL.md` — destination for
+ the `still-fails-same` tail.
+- `.agents/skills/groovy-tests/SKILL.md` — when an adapted `@Test`
+ is kept as the starting point for a real regression test.
diff --git a/.agents/skills/groovy-reproducer/SKILL.md b/.agents/skills/groovy-reproducer/SKILL.md
new file mode 100644
index 0000000000..dfc6456ef7
--- /dev/null
+++ b/.agents/skills/groovy-reproducer/SKILL.md
@@ -0,0 +1,346 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+---
+name: groovy-reproducer
+description: Extracting and running reproducer code from a GROOVY JIRA report — locating the reproducer (description, comments, attachments), classifying its shape (runnable script, `@Test` snippet, inline fragment, prose-only, attachment, `@Grab`-using, multi-file project), adapting it to a runnable form *without fabrication*, running it with a bounded timeout, and recording deterministic evidence (revision, JDK, command, output, exit code) plus a classification (same-failure / differen [...]
+license: Apache-2.0
+compatibility: claude, codex, copilot, cursor, gemini, aider
+metadata:
+ audience: contributors to apache/groovy
+ scope: jira-reproducer-extraction-and-execution
+---
+
+# Groovy reproducer
+
+Use this skill when the job is to **take a JIRA-described problem and
+actually run it**: find the reproducer code, work out what shape it's
+in, adapt it to a runnable form, and execute it against a known
+revision and JDK with enough evidence captured that a committer can
+trust the verdict without redoing the work.
+
+This skill is the load-bearing piece for both single-issue triage
+(when a stronger-than-eyeballed reproduction is wanted) and the
+bulk-reassessment campaign — it doesn't cover workflow, batch
+processing, or hand-back. For those, defer:
+
+- [`groovy-triage`](../groovy-triage/SKILL.md) — the surrounding
+ triage workflow for a single issue; it calls into this skill at
+ the "attempt reproduction on `master`" step.
+- [`groovy-reassess`](../groovy-reassess/SKILL.md) — the bulk
+ reassessment campaign; it calls into this skill for every issue in
+ the candidate set.
+- [`groovy-tests`](../groovy-tests/SKILL.md) — when the reproducer
+ is adapted into a `@Test`, this skill owns *running* it; that one
+ owns *placement and naming*.
+- [`groovy-jira`](../groovy-jira/SKILL.md) — the JQL and field
+ conventions for the issue being reproduced.
+
+## When to use this skill
+
+**Use it for:**
+
+- Extracting a reproducer from a JIRA description, comment thread, or
+ attachment and running it against the current checkout.
+- Producing an evidence package (rev, JDK, command, output, exit
+ code, classification) that supports a triage finding or a
+ reassessment verdict.
+- Deciding whether a reporter's snippet can be reasonably adapted to
+ a runnable form *without* fabrication.
+
+**Don't use it for:**
+
+- Writing a fresh regression test where no reporter reproducer
+ exists — that's [`groovy-tests`](../groovy-tests/SKILL.md), and
+ the regression test is the contributor's design call, not a
+ reproduction of someone else's report.
+- Single-issue triage workflow as a whole —
+ [`groovy-triage`](../groovy-triage/SKILL.md)
+ is the workflow; this skill is one of its steps.
+- Bulk processing — [`groovy-reassess`](../groovy-reassess/SKILL.md)
+ is the campaign layer; this skill handles one reproducer at a time.
+- Reporters' projects requiring a separate Gradle / Maven build (a
+ zip with its own `build.gradle`). Run those outside the Groovy
+ checkout if it matters; this skill focuses on reproducers that
+ run against the current Groovy build.
+
+## Read first
+
+- [`CONTRIBUTING.md`](../../../CONTRIBUTING.md) — the regression-test
+ shape that an adapted reproducer often takes.
+- [`AGENTS.md`](../../../AGENTS.md) — the "what *not* to do" list
+ applies here too: no fabricated reproducers, no hallucinated
+ identifiers, no scratch files left behind.
+- `build-logic/src/main/groovy/org.apache.groovy-tested.gradle` —
+ the source of the `groovy/grape/` `junit.network` exclusion;
+ reproducers under `groovy/grape/` need the same gate. See failure
+ mode 12 in [`groovy-tests`](../groovy-tests/SKILL.md).
+
+## Top failure modes
+
+These are the recurring mistakes when extracting and running JIRA
+reproducers:
+
+1. **Fabricating a reproducer when none exists.** "The reporter
+ described X happening; I'll write code that does X." That is the
+ agent doing the reporter's job. If the description is prose-only
+ and no attachment helps, classify `cannot-run-extraction` and
+ stop. The reporter's specific code is what makes a reproduction
+ trustworthy; an agent-written stand-in is a different exercise
+ (and a different verdict).
+2. **Skipping the comment thread.** Reporters frequently post a
+ simplified reproducer in a comment after the initial description.
+ Reading only the description misses it. Inventory every code
+ block in the description *and* every comment *and* every
+ attachment before picking a candidate.
+3. **Treating `@Test` adaptation as equivalent to a script run.**
+ Groovy scripts and class methods have different scoping (script
+ bindings vs. fields, implicit `main`, `def` vs. typed locals).
+ A reproducer that fails as a script may pass inside a `@Test` and
+ vice versa. If the original was a script, *also* run it as a
+ script before claiming the bug is gone — the `@Test` adaptation
+ exercises different paths.
+4. **Wrapping a JIRA snippet in `assert false` to "force a failure".**
+ That signals nothing about the underlying behaviour. The
+ adaptation should exercise the same path the reporter described
+ and let it fail (or not) naturally. `assert <reporter-expected>`
+ is fine; `assert false // bug` is theatre.
+5. **Skipping the original JDK.** A reproducer that passes on
+ JDK 21 may still fail on the JDK the reporter used (JDK 8, 11,
+ 17). The strict claim is "passes on `master` with JDK X." Without JDK
+ awareness, the verdict is incomplete — note the JDK in evidence,
+ and where the verdict matters, retry on the originally-affected
+ JDK if it is reasonably available locally.
+6. **Running `@Grab` and silently absorbing the dependency failure.**
+ Grape resolution that 404s or hits a dead repo should be a
+ recognised classification (`cannot-run-dependency`), not "test
+ passes" because the resolution exception was swallowed and the
+ body never ran. Check exit code *and* output for resolution
+ errors before classifying.
+7. **No timeout.** A buggy reproducer can hang (infinite loop,
+ deadlock, awaiting input). Without a timeout, one bad issue burns
+ hours. Default to a bounded run (60s is a reasonable starting
+ point; raise per-issue if the reporter notes long-running
+ behaviour) and classify as `timeout` if hit.
+8. **Capturing stdout but not stderr.** Many reproducers print the
+ bug indicator (stack traces, MOP errors, "expected X got Y") to
+ stderr. Capture both streams and surface both in the evidence.
+9. **Comparing run output without normalising line endings or
+ locale.** Same trap as [`groovy-tests`](../groovy-tests/SKILL.md)
+ failure mode 11. Output captured on Windows uses `\r\n`; locale
+ shifts number/date formatting. Normalise before string-compare,
+ or compare on parsed values.
+10. **Over-claiming "fixed" from a single-environment pass.** A
+ clean run on your laptop may be environment-luck, not a real
+ fix — locale, charset, default JDK, file-encoding defaults all
+ bite. Where the verdict is `passes`, qualify it with the
+ environment that produced the pass; don't generalise.
+11. **Discarding the working tree before recording the run.**
+ Capture `rev`, `jdk`, command, output, runtime, exit code
+ *before* reverting any test additions or cleaning the scratch
+ directory. The evidence is what gives the verdict its weight;
+ losing it means the work has to be redone.
+12. **Letting working-tree state leak between reproducers.** When
+ running many in sequence (the reassessment case), reset between
+ issues — `git stash --include-untracked && git clean -fd` on
+ a sacrificial branch, or a separate scratch directory per
+ issue. Crosstalk between reproducers (a file written by issue
+ A's reproducer that issue B picks up) corrupts verdicts in ways
+ that are hard to spot.
+13. **Polluting the local Grape cache across many issues.** A sweep
+ that runs many `@Grab`-using scripts can degrade
+ `~/.groovy/grapes/` (see GROOVY-12005). For a campaign,
+ consider a per-sweep Grape root via `-Dgrape.root=<scratch>` so
+ the user's everyday cache stays clean.
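
Failure modes 6 and 9 both come down to inspecting output carefully. A minimal sketch, assuming a failed `@Grab` surfaces text containing `Error grabbing Grapes` or `unresolved dependency` (confirm against a real failing run on the Groovy version under test). Locale differences are not handled here and still call for comparing parsed values.

```python
import re

# Assumed markers of a failed @Grab; confirm against a real failing run.
GRAPE_FAILURE = re.compile(r"Error grabbing Grapes|unresolved dependency")


def normalise(text: str) -> str:
    """CRLF -> LF and trailing whitespace trimmed, before any string compare."""
    return "\n".join(line.rstrip() for line in text.replace("\r\n", "\n").split("\n"))


def grab_failed(stdout: str, stderr: str) -> bool:
    """Check both streams; an exit code alone can hide a swallowed failure."""
    return any(GRAPE_FAILURE.search(stream) for stream in (stdout, stderr))
```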
+
+## Reproducer shape taxonomy
+
+Most JIRA reproducers fall into one of these shapes. The handling
+recipe for each is the meat of this skill.
+
+**A. Complete runnable Groovy script** — a `.groovy` file body in
+the description or a comment, with imports and a top-level expression
+or `main`. Recipe: save to a scratch `.groovy`, build a current
+distribution (`./gradlew :installDist` on the relevant subproject),
+and run with the built `groovy` binary. Or, for many cases, adapt as
+a `@Test` per [`groovy-tests`](../groovy-tests/SKILL.md) and run
+targeted — but be aware of failure mode 3 (script vs `@Test`
+semantics).
+
+**B. `@Test`-shaped snippet** — already class-and-annotation shaped.
+Recipe: place under `src/test/groovy/bugs/Groovy<NNNN>.groovy` (or
+the subproject's `src/test/`) per [`groovy-tests`](../groovy-tests/SKILL.md);
+run targeted (`./gradlew :test --tests <FQN>`).
+
+**C. Inline fragment** — a few lines, not a complete script ("when I
+write `foo.bar()` I get NPE"). Recipe: wrap minimally — a `@Test`
+with the fragment as the body, or a script with `def` declarations
+for any referenced names. Where the wrap requires guessing types or
+context, classify `cannot-run-extraction` (in effect, needs-info)
+rather than speculating.
+
+**D. Stack-trace-only** — a `.printStackTrace()` output, no code.
+Recipe: classify `cannot-run-extraction`. The stack trace is a *hint
+about the area*, not a reproducer. Don't construct code to "make
+that stack trace appear"; that's fabrication.
+
+**E. Prose-only** — natural-language description, no code. Recipe:
+`cannot-run-extraction`. Don't write code from prose.
+
+**F. Attachment** — `.groovy`, `.java`, `.zip` (project), `.txt`
+(log), `.gz` (heap dump, etc.). For `.groovy` / `.java` files,
+handle per shape A/B. For project zips, this is out of scope for
+this skill — flag as `needs-separate-workspace` and let the
+campaign / triage handle it. For logs and heap dumps, treat as
+hints (shape D).
+
+**G. `@Grab`-using script** — depends on resolution against Maven
+Central or other repos. Recipe: run as in A, but with network
+available; verify the resolution succeeded before interpreting the
+result. A pinned old version may no longer be in the configured
+repos (`cannot-run-dependency`).
+
+**H. Multi-file project** — a tarball or zip with its own
+`build.gradle` / `pom.xml`. Recipe: this skill's posture is "run
+against the Groovy checkout"; project-style reproducers run *with*
+their own build. Classify as `needs-separate-workspace` and surface;
+the user / campaign can spin up an isolated workspace if it matters.
+
+## Procedure
+
+For each reproducer:
+
+1. **Inventory.** Read the JIRA description, every comment, every
+ attachment. Note all code blocks (verbatim, with their location
+ — "description", "comment 3 by …", "attachment foo.groovy").
+ Note the reporter's claimed environment: Groovy version, JDK, OS.
+2. **Pick the candidate.** When multiple reproducers exist, prefer
+ the simplest *complete* one. Note the fallback chain — if the
+ simplest fails to adapt, the next one in line is the
+ reporter's original.
+3. **Classify the shape** per the taxonomy above. Output the shape
+ category as part of the evidence package.
+4. **Adapt without fabrication.**
+ - Shape A: copy verbatim to a scratch file.
+ - Shape B: place per [`groovy-tests`](../groovy-tests/SKILL.md).
+ - Shape C: wrap minimally; if the wrap requires speculation
+ (guessing a type, inventing a missing variable), stop and
+ classify `cannot-run-extraction` with a note about what was
+ missing.
+ - Shapes D, E: classify `cannot-run-extraction`; don't adapt.
+ - Shape F: per inner shape; for project zips, classify
+ `needs-separate-workspace`.
+ - Shape G: copy verbatim; flag for Grape-aware running.
+ - Shape H: classify `needs-separate-workspace`.
+5. **Build the current Groovy distribution** if the reproducer is a
+ script that needs the produced `groovy` binary. For `@Test`-shape
+ reproducers, the Gradle test invocation handles the build.
+6. **Run with bounded resources.** Timeout (60s default), capture
+ stdout + stderr + exit code + runtime. Record the command
+ verbatim.
+7. **Compare to the original failure pattern.** "Fails with the
+ reported exception type and a message containing the reported
+ substring" is `same-failure`. "Fails with something different" is
+ `different-failure`. "Doesn't fail" is `passes`. "Hangs past
+ timeout" is `timeout`. "Errors before reaching the reported code
+ path" is `cannot-run-*`.
+8. **Record the evidence package** before doing anything else.
+9. **Reset the working tree** if you adapted as a `@Test` (the
+ added file must not leak to the next issue).
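
A minimal sketch of the comparison in step 7, written in Java as a JVM-neutral illustration. `FailureComparison` and `RunResult` are hypothetical names. Two assumptions are made. First, a failing reproducer exits non-zero. Second, the exception type and message substring can be matched as plain substrings of the captured output. That is only a heuristic: the type might appear in a `Caused by:` line rather than as the top-level failure. The `cannot-run-*` outcome is left out because deciding whether the reported path was reached needs a human reading of the log.

```java
// Hypothetical helper; not part of Groovy or its build.
class FailureComparison {

    /** Summary of one bounded run. */
    static final class RunResult {
        final boolean timedOut;
        final int exitCode;
        final String output; // combined stdout + stderr

        RunResult(boolean timedOut, int exitCode, String output) {
            this.timedOut = timedOut;
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Maps a run to same-failure / different-failure / passes / timeout. */
    static String classify(RunResult run, String reportedExceptionType, String reportedSubstring) {
        if (run.timedOut) {
            return "timeout";
        }
        if (run.exitCode == 0) {
            return "passes"; // assumes a failing reproducer exits non-zero
        }
        // Heuristic substring match; the type could also appear in a nested cause.
        boolean sameType = run.output.contains(reportedExceptionType);
        boolean sameMessage = run.output.contains(reportedSubstring);
        return (sameType && sameMessage) ? "same-failure" : "different-failure";
    }
}
```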
+
+## Run posture
+
+- **Timeout:** 60s default. Bump only when the reporter notes
+ longer-running behaviour, and record the bump in evidence.
+- **Network:** Grape needs it for shape G. Leave it on. Dependency
+ failures get the `cannot-run-dependency` classification; they are
+ not "fixed."
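+- **Filesystem:** scratch directory per issue, under
+ `~/work/groovy-reassessment/<KEY>/` (or wherever the campaign
+ layout puts it; see [`groovy-reassess`](../groovy-reassess/SKILL.md)).
+ Don't write scratch files under the Groovy checkout; the one
+ exception is a `@Test`-adapted reproducer, which is removed when the
+ working tree is reset.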
+- **Working tree:** clean between reproducers. The
+ added-and-then-removed `@Test` is the most common leak source.
+- **Grape cache:** for a campaign with many `@Grab` reproducers,
+ consider `-Dgrape.root=<scratch>` so the user's
+ `~/.groovy/grapes/` stays clean.
+- **JDK selection:** record the JDK used. For verdicts where it
+ matters (`passes`, `fixed-on-master`), retry on the originally
+ affected JDK via Gradle toolchains where reasonable.
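
A minimal sketch of a bounded run using only `java.lang.ProcessBuilder`. It writes the command as the first line of `run.log` and merges stderr into the same log. On timeout it kills the process and its descendants, then records exit code and runtime. Java is used as a JVM-neutral illustration. `BoundedRun`, `Outcome` and the `-1` exit code for a killed process are hypothetical conventions, not an existing tool. The command line is joined with single spaces, so arguments containing spaces would need quoting to be truly verbatim.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of Groovy or its build.
class BoundedRun {

    static final long DEFAULT_TIMEOUT_SECONDS = 60;

    /** Result of one run; exitCode is -1 when the process was killed on timeout. */
    static final class Outcome {
        final boolean timedOut;
        final int exitCode;
        final long runtimeMs;

        Outcome(boolean timedOut, int exitCode, long runtimeMs) {
            this.timedOut = timedOut;
            this.exitCode = exitCode;
            this.runtimeMs = runtimeMs;
        }
    }

    static Outcome run(List<String> command, Path workDir, Path runLog, long timeoutSeconds)
            throws IOException, InterruptedException {
        // First line of run.log: the command (naively space-joined).
        Files.writeString(runLog, String.join(" ", command) + System.lineSeparator(),
                StandardCharsets.UTF_8);
        ProcessBuilder pb = new ProcessBuilder(command)
                .directory(workDir.toFile())
                .redirectErrorStream(true) // stderr is interleaved into the same log
                .redirectOutput(ProcessBuilder.Redirect.appendTo(runLog.toFile()));
        long start = System.nanoTime();
        Process p = pb.start();
        boolean finished = p.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            // Kill children first (e.g. a forked JVM), then the process itself.
            p.descendants().forEach(ProcessHandle::destroyForcibly);
            p.destroyForcibly();
            p.waitFor();
        }
        long runtimeMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Outcome(!finished, finished ? p.exitValue() : -1, runtimeMs);
    }
}
```

Redirecting to a file rather than reading the pipes avoids the classic deadlock where a child blocks on a full stdout or stderr buffer.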
+
+## Evidence package
+
+For each reproducer run, persist:
+
+- `description.md` — the JIRA description and the comments quoted
+ verbatim (so the verdict is auditable even if JIRA changes).
+- `reproducer.<ext>` — the adapted runnable form, or
+ `extraction-failed.md` if the shape didn't support adaptation.
+- `original.<ext>` — the literal source from JIRA, untouched, when
+ extracted.
+- `run.log` — stdout + stderr from the run, with the exact command
+ on the first line.
+- `verdict.json` — `{ "key": "GROOVY-NNNNN", "shape": "<A|B|…>",
+ "classification": "<one of: same-failure | different-failure |
+ passes | cannot-run-extraction | cannot-run-environment |
+ cannot-run-dependency | timeout | needs-separate-workspace>",
+ "rev": "<short-sha>", "jdk": "<vendor+version>",
+ "command": "<verbatim>", "runtime-ms": <int>,
+ "exit-code": <int>, "matched-original-failure": <bool>,
+ "notes": "<short>" }`.
+
+This package is what a committer needs to trust the verdict, and it
+is what [`groovy-reassess`](../groovy-reassess/SKILL.md) feeds into
+its report.
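
A minimal sketch of writing `verdict.json` without a JSON library. It rejects any classification outside the eight listed above and escapes string values so that a command containing quotes stays valid JSON. Java is used as a JVM-neutral illustration. `VerdictJson` and `toJson` are hypothetical names. Key order follows the field list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of Groovy or its build.
class VerdictJson {

    // The classifications allowed in verdict.json.
    static final Set<String> CLASSIFICATIONS = Set.of(
            "same-failure", "different-failure", "passes",
            "cannot-run-extraction", "cannot-run-environment",
            "cannot-run-dependency", "timeout", "needs-separate-workspace");

    static String toJson(String key, String shape, String classification, String rev,
                         String jdk, String command, long runtimeMs, int exitCode,
                         boolean matchedOriginalFailure, String notes) {
        if (!CLASSIFICATIONS.contains(classification)) {
            throw new IllegalArgumentException("unknown classification: " + classification);
        }
        Map<String, Object> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("key", key);
        fields.put("shape", shape);
        fields.put("classification", classification);
        fields.put("rev", rev);
        fields.put("jdk", jdk);
        fields.put("command", command);
        fields.put("runtime-ms", runtimeMs);
        fields.put("exit-code", exitCode);
        fields.put("matched-original-failure", matchedOriginalFailure);
        fields.put("notes", notes);
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            Object v = e.getValue();
            sb.append(quote(e.getKey())).append(": ")
              .append(v instanceof String ? quote((String) v) : String.valueOf(v));
        }
        return sb.append('}').toString();
    }

    /** Minimal JSON string escaping. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```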
+
+## Validation checklist
+
+Before recording a verdict:
+
+- [ ] Every code block in the description, comments, and attachments
+ was inventoried — not just the description.
+- [ ] The chosen reproducer is verbatim from the report (or the
+ adaptation is mechanical, not speculative).
+- [ ] The shape is classified; if `cannot-run-*` or
+ `needs-separate-workspace`, no execution attempt was claimed.
+- [ ] Run was bounded by a timeout; the bound is recorded.
+- [ ] Both stdout and stderr were captured.
+- [ ] The exact command, revision, and JDK were recorded.
+- [ ] The classification distinguishes `same-failure` from
+ `different-failure` — the original failure pattern was
+ consulted, not just "did it throw."
+- [ ] For `@Grab` reproducers: dependency resolution succeeded
+ before any `passes` verdict was claimed.
+- [ ] For `passes`: the run environment is qualified; the verdict
+ doesn't over-claim "fixed" from a single environment.
+- [ ] Working tree was reset (no leftover scratch test class).
+- [ ] Evidence package was written before the next issue started.
+
+## References
+
+- [`CONTRIBUTING.md`](../../../CONTRIBUTING.md) — regression-test
+ shape that `@Test`-adapted reproducers fit into.
+- [`AGENTS.md`](../../../AGENTS.md) — the no-fabrication, no
+ drive-by, no scratch-files-in-tree principles applied here.
+- [`groovy-triage`](../groovy-triage/SKILL.md): single-issue caller
+ of this skill.
+- [`groovy-reassess`](../groovy-reassess/SKILL.md): campaign-level
+ caller of this skill.
+- [`groovy-tests`](../groovy-tests/SKILL.md): placement and naming
+ when a reproducer is adapted as a `@Test`; the
+ `junit.network`-gating story for `groovy/grape/` reproducers.
+- [`groovy-jira`](../groovy-jira/SKILL.md): JIRA mechanics for the
+ issue around the reproducer.
diff --git a/AGENTS.md b/AGENTS.md
index 99c0144a94..362deca689 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -131,6 +131,8 @@ into the human-facing docs above.
| [`groovy-fix-workflow`](.agents/skills/groovy-fix-workflow/SKILL.md) | Implementing a JIRA-tracked fix after triage — failing-test-first ordering, scope discipline, hand-back to a committer (no autonomous PR / JIRA comment / merge) |
| [`groovy-internals`](.agents/skills/groovy-internals/SKILL.md) | Compiler and runtime work — parser, AST, type checker, transforms, class generation |
| [`groovy-jira`](.agents/skills/groovy-jira/SKILL.md) | JIRA mechanics — JQL recipes, `Component/s` taxonomy, field ownership, and the "don't transition workflow states" rule for AI tooling |
+| [`groovy-reassess`](.agents/skills/groovy-reassess/SKILL.md) | Bulk reassessment of old JIRA issues — selection, per-issue reproduction, classification (`fixed-on-master` / `still-fails-*` / `cannot-run-*` / …), report and evidence-package hand-back; read-only against JIRA |
+| [`groovy-reproducer`](.agents/skills/groovy-reproducer/SKILL.md) | Extracting and running a JIRA-reported reproducer — shape classification, adaptation without fabrication, bounded run, deterministic evidence (rev/JDK/command/output) and an outcome classification |
| [`groovy-tests`](.agents/skills/groovy-tests/SKILL.md) | Adding or modifying tests, including JIRA regression tests and executable AsciiDoc examples |
| [`groovy-triage`](.agents/skills/groovy-triage/SKILL.md) | First-pass triage of JIRA issues and GitHub PRs — reproducing reports against `master`, finding duplicates, surfacing PR-readiness signals; advisory only, no committer actions |
| [`groovysh`](.agents/skills/groovysh/SKILL.md) | Work in `subprojects/groovy-groovysh/` — REPL commands, JLine integration, vendored forks, terminal-aware test stack |