(groovy) branch master updated: AI readiness: add draft run reproducer and reassess skills

paulk Mon, 11 May 2026 23:48:56 -0700

This is an automated email from the ASF dual-hosted git repository.

paulk-asert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/groovy.git



The following commit(s) were added to refs/heads/master by this push:
     new efdfc3e0cc AI readiness: add draft run reproducer and reassess skills
efdfc3e0cc is described below

commit efdfc3e0cc3de041018b1469644c09dbea344ac3
Author: Paul King <[email protected]>
AuthorDate: Tue May 12 16:48:38 2026 +1000

    AI readiness: add draft run reproducer and reassess skills
---
 .agents/skills/groovy-fix-workflow/SKILL.md |   2 +-
 .agents/skills/groovy-reassess/SKILL.md     | 348 ++++++++++++++++++++++++++++
 .agents/skills/groovy-reproducer/SKILL.md   | 346 +++++++++++++++++++++++++++
 AGENTS.md                                   |   2 +
 4 files changed, 697 insertions(+), 1 deletion(-)

diff --git a/.agents/skills/groovy-fix-workflow/SKILL.md b/.agents/skills/groovy-fix-workflow/SKILL.md
index 24a519bcdd..169ce7d802 100644
--- a/.agents/skills/groovy-fix-workflow/SKILL.md
+++ b/.agents/skills/groovy-fix-workflow/SKILL.md
@@ -1,4 +1,4 @@
-<!--
+gro<!--
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
diff --git a/.agents/skills/groovy-reassess/SKILL.md b/.agents/skills/groovy-reassess/SKILL.md
new file mode 100644
index 0000000000..3f12ff1c4f
--- /dev/null
+++ b/.agents/skills/groovy-reassess/SKILL.md
@@ -0,0 +1,348 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+---
+name: groovy-reassess
+description: Running a bulk reassessment campaign over old GROOVY JIRA issues — narrow JQL selection, per-issue reproducer extraction and execution via `groovy-reproducer`, classification (`fixed-on-master` / `still-fails-same` / `still-fails-different` / `cannot-run-*` / `intended-behaviour` / `duplicate-of-resolved` / `timeout`), structured report and per-issue evidence package, and a strict hand-back contract — no JIRA comments, no transitions, no closures posted on behalf of the proj [...]
+license: Apache-2.0
+compatibility: claude, codex, copilot, cursor, gemini, aider
+metadata:
+  audience: contributors to apache/groovy
+  scope: bulk-reassessment-of-old-jira-issues
+---
+
+# Groovy reassess
+
+Use this skill when the task is a **campaign** over old GROOVY JIRA
+issues: pick a bounded candidate set, run each reporter's reproducer
+against the current `master`, classify the outcome, and produce a
+report a committer can scan and act on. The campaign is the safest
+non-trivial automation against ASF Groovy — read-only against JIRA,
+side-effect-free against the project, advisory output. But "safe"
+does not mean "no rules": the hand-back contract below is firm.
+
+This skill is the **campaign layer**. Per-issue mechanics live
+elsewhere:
+
+- [`groovy-jira`](../groovy-jira/SKILL.md) — the JQL recipes that
+  select the candidate set; the field-ownership rules; the
+  "comment, don't transition" rule that applies at scale here.
+- [`groovy-triage`](../groovy-triage/SKILL.md) — the single-issue
+  triage workflow; pieces of it apply to each candidate.
+- [`groovy-reproducer`](../groovy-reproducer/SKILL.md) — the
+  load-bearing per-issue piece: locate the reproducer, classify the
+  shape, adapt, run, record evidence.
+- [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md) — where
+  the `still-fails-*` tail goes once the campaign is done; the
+  campaign produces ready-made reproducers for the fix workflow.
+
+## When to use this skill
+
+**Use it for:**
+
+- A bounded sweep of old GROOVY issues (e.g. 10–50 at a time) to
+  identify silent fixes, still-failing bugs, intended-behaviour
+  misclassifications, or duplicates.
+- Producing a structured report a committer can review in one
+  sitting and decide what to act on.
+- Generating ready-made reproducers and evidence packages that feed
+  [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md) for the
+  still-failing tail.
+
+**Don't use it for:**
+
+- Single-issue triage — that's [`groovy-triage`](../groovy-triage/SKILL.md)
+  on its own.
+- Mass JIRA mutation (comments, transitions, closures, label
+  edits). The campaign is **read-only against JIRA**.
+- "Reassessing all 800 open issues" in one go — the campaign needs
+  bounding. Many small passes beat one giant one.
+- Anything that needs project-side authorisation (a bot identity,
+  a dev@ mandate). The campaign is contributor work; the report is
+  the output a contributor brings to dev@ or to a committer, not a
+  project-mandated activity.
+
+## Read first
+
+- [`GOVERNANCE.md`](../../../GOVERNANCE.md) — JIRA's role as the
+  canonical record; the campaign produces recommendations, not
+  decisions.
+- [`AGENTS.md`](../../../AGENTS.md) — the no-fabrication,
+  no-comment-on-behalf-of-the-project posture that scales up here.
+- [`groovy-reproducer`](../groovy-reproducer/SKILL.md) — fully.
+  The campaign is largely a loop over this skill.
+
+## Top failure modes
+
+These are the recurring mistakes at the campaign level:
+
+1. **Bulk-posting findings to JIRA.** Even when 30 of 30 findings
+   say `fixed-on-master` with strong evidence, the campaign does
+   *not* post 30 JIRA comments, transition 30 issues, or close
+   anything. The output is a report; a committer decides whether
+   and how to publish it. See failure mode 1 in
+   [`groovy-jira`](../groovy-jira/SKILL.md) for the underlying rule.
+2. **Optimistic classification on weak evidence.** A reproducer
+   that didn't compile is `cannot-run-extraction`, not
+   `fixed-on-master`. The taxonomy has cells for these for a
+   reason; reach for the precise one. "Looks fine to me" is not a
+   classification.
+3. **Confusing "passes on this JDK" with "fixed."** The classic
+   over-claim from [`groovy-reproducer`](../groovy-reproducer/SKILL.md)
+   failure mode 10, multiplied by the campaign size. Where the
+   verdict matters and a JDK retry is feasible, do it before
+   landing `fixed-on-master`.
+4. **Unbounded scope.** Trying to sweep 200 issues in one session
+   blows context, produces low-quality bulk output, and means a
+   crash at issue 150 is a 150-issue loss. Bound the candidate set
+   *before* the loop starts: a JQL with a `LIMIT`-equivalent, an
+   age bucket, a component slice.
+5. **No resumability.** A 50-issue run that crashes at issue 30
+   must be resumable from issue 31. Per-issue evidence files on disk
+   (per [`groovy-reproducer`](../groovy-reproducer/SKILL.md)'s
+   evidence package) are the resumption point — in-memory campaign
+   state is not.
+6. **Burying the headlines.** "30 fixed-on-master, 5 still-fail,
+   15 cannot-run." The 5 still-fails are usually the most
+   important rows — they are tickets where work might actually be
+   done. Surface them at the top of the report, not buried by the
+   `fixed-on-master` majority.
+7. **Recommending workflow transitions in the report.** "Close
+   GROOVY-1234 as Cannot Reproduce" frames the agent as the
+   decider. Phrase as recommendation: "`fixed-on-master`; a
+   committer may want to consider closing as Cannot Reproduce
+   after a second pair of eyes." See
+   [`groovy-jira`](../groovy-jira/SKILL.md) on
+   comment-not-transition.
+8. **Fabricating evidence for `cannot-run-*`.** "Probably passes on
+   master." That's a guess in a verdict slot. If it can't be run,
+   the verdict is the cannot-run category — no further claim.
+9. **Drifting from the original report.** Heavy adaptation of a
+   reproducer can exercise different code paths from the
+   reporter's original. The verdict has to be against the
+   *original* behaviour; if the adaptation is so heavy that it's
+   really a different test, the issue is `cannot-run-extraction`,
+   not `passes`.
+10. **Skipping the read-first pass on the issue.** A surprising
+    fraction of old issues have a closing comment like "fixed in
+    4.0.x, issue left open by mistake" or "won't fix, see
+    GROOVY-XXXX". Skim the comments before reproducing — saves the
+    reproduction budget and produces better classifications.
+11. **Hammering JIRA's REST API.** ASF JIRA is anonymous-readable
+    but shared. Cache aggressively (per-issue evidence retains the
+    description and comments), throttle requests, and never run
+    the campaign in a tight loop that re-fetches the same issue.
+12. **Treating duplicates as findings.** If the reassessment
+    discovers issue A is a duplicate of resolved issue B, the
+    classification is `duplicate-of-resolved` with a citation —
+    not "closing A." Same hand-back posture as everything else.
+13. **Posting a wall-of-text to dev@.** A 50-row table without a
+    summary is noise. The report opens with a short stanza — N
+    issues swept, M classified each way, K headline findings — and
+    *then* the table. The committer's first 30 seconds with the
+    report should produce a decision about whether to read on.
+
+## Selecting the candidate set
+
+Selection drives the campaign's quality. Pick a *narrow*, *bounded*
+slice; do not boil the ocean.
+
+JQL building blocks come from
+[`groovy-jira`](../groovy-jira/SKILL.md). Useful slices for
+reassessment:
+
+- **Age bucket × open:** `project = GROOVY AND statusCategory != Done
+  AND created < "2020/01/01" AND created >= "2018/01/01"
+  ORDER BY created ASC`. Buckets of two years are scannable; ten
+  years is not.
+- **Stale within a component:** `project = GROOVY AND statusCategory
+  != Done AND component = "<X>" AND updated < -730d`. Pairs well
+  with an area you know — your fix-side strength shapes the
+  candidate set.
+- **Affected version end-of-life:** `project = GROOVY AND
+  statusCategory != Done AND affectedVersion = "2.4.x"` — versions
+  long out of support are high-yield for `fixed-on-master`.
+- **No component (triage-then-reassess):** `project = GROOVY AND
+  statusCategory != Done AND component is EMPTY`. The reassessment
+  can also produce a `Component/s` suggestion (see
+  [`groovy-jira`](../groovy-jira/SKILL.md)).
+
+Cap the per-session set. A practical first pilot is 5–10 issues
+spanning *different reproducer shapes* (one runnable script, one
+attachment, one prose-only, one `@Grab`, one comment-with-snippet)
+so the pipeline meets each shape early. Pilots beat the first
+hundred issues you'd naturally pick.
+
+## Procedure
+
+For each campaign session:
+
+1. **Pick the candidate set.** JQL from
+   [`groovy-jira`](../groovy-jira/SKILL.md), capped (see above).
+   Persist the candidate list to disk before the loop starts so a
+   crash leaves a recoverable plan.
+2. **Set up the scratch corpus.** A directory hierarchy under
+   `~/work/groovy-reassess/<campaign-id>/`, one subdirectory per
+   issue. Per [`groovy-reproducer`](../groovy-reproducer/SKILL.md),
+   each subdirectory holds `description.md`, `reproducer.<ext>`,
+   `original.<ext>`, `run.log`, `verdict.json`. This is *not* in
+   the Groovy checkout.
+3. **For each issue, in order:**
+   - Skip if its `verdict.json` already exists and is well-formed
+     (resumability).
+   - Read the JIRA issue and skim comments for an obvious
+     "already fixed" / "won't fix" / "see GROOVY-XXXX" — early
+     classifications save time.
+   - Hand off to [`groovy-reproducer`](../groovy-reproducer/SKILL.md)
+     for extraction, adaptation, running, and evidence capture.
+   - Read the classification from `verdict.json`.
+   - Reset the working tree before the next issue.
+4. **After the loop, build the report** (see below). Do *not*
+   post anything.
+5. **Hand back** to the human — branch (if any local Groovy
+   commits, e.g. adapted `@Test` files kept for follow-up),
+   scratch corpus path, report path.
+
+## Classification taxonomy
+
+The campaign uses the per-issue classifications produced by
+[`groovy-reproducer`](../groovy-reproducer/SKILL.md), and adds two
+of its own that don't fall out of a single run.
+
+Per-issue (from `groovy-reproducer`):
+
+| Classification | Meaning | Recommendation a committer might consider |
+|---|---|---|
+| `fixed-on-master` | Reproducer passes on current `master` with the originally-affected JDK reachable, and the original failure pattern is gone. | Close as Cannot Reproduce after a second pair of eyes. |
+| `still-fails-same` | Reproducer fails on `master` with the same failure pattern as the report. | Hand off to [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md). |
+| `still-fails-different` | Reproducer fails on `master` but with a different signature (different exception, different message). | Committer judgement — could be a regression-of-a-regression, a related bug, or environmental. |
+| `cannot-run-extraction` | No usable reproducer (prose-only, missing attachment, fragment too incomplete to adapt without speculation). | Needs-info: ask the reporter, or close as Incomplete if old enough and reporter is unreachable (committer call). |
+| `cannot-run-environment` | Reproducer needs a JDK/OS/dep we don't have locally. | Note the gap; may be a candidate for a richer environment matrix. |
+| `cannot-run-dependency` | `@Grab` resolution failed (dep gone from configured repos). | Often correlates with very old issues; needs-info or close as Incomplete. |
+| `timeout` | Hung past the configured bound. | Human review — might be the bug, might be a hung adaptation. |
+| `needs-separate-workspace` | Reproducer is a multi-file project. | Spin up an isolated workspace if it matters; otherwise leave open. |
+
+Campaign-level adds:
+
+| Classification | Meaning | Recommendation |
+|---|---|---|
+| `intended-behaviour` | Reproduction runs and behaves as the spec/docs describe; the "bug" is by design. | Doc clarification or close as Not A Bug. |
+| `duplicate-of-resolved` | Reassessment discovered the issue duplicates a resolved one (citation required). | Close as Duplicate, citing the linked issue. |
+
+Each per-issue verdict in the report carries the classification
+plus the evidence-package path. Each campaign-level classification
+carries the citation that justifies it.
+
+## Report shape
+
+A reassessment report has three layers:
+
+1. **One-paragraph summary at the top.** N issues swept, breakdown
+   per classification, K headline findings worth a committer's
+   attention (typically the `still-fails-*` rows and any
+   `duplicate-of-resolved` with strong citation).
+2. **Per-classification sections, headlines first.** Order:
+   `still-fails-same`, `still-fails-different`, `duplicate-of-resolved`,
+   `fixed-on-master`, `intended-behaviour`, then the cannot-runs and
+   `needs-separate-workspace`. Within each section, a table:
+   JIRA-ID, one-line summary, evidence-package path, recommended
+   next action.
+3. **Per-issue evidence packages** (filesystem; not inlined in the
+   report). The report links into them.
+
+Suggested filename for the report: `report.md` inside the campaign
+scratch directory. A committer reading it should be able to decide
+"yes I'll act on these N items" in a single scan.
+
+## Hand-back to a human
+
+The campaign produces:
+
+- The scratch corpus path (with per-issue evidence packages).
+- The report path.
+- Any local Groovy-checkout changes worth keeping (e.g. adapted
+  `@Test` files that the committer might want as a starting point
+  for a real regression test).
+- A short verbal summary: "Swept N issues. M `fixed-on-master`, K
+  `still-fails-same`. Headlines: GROOVY-A, GROOVY-B, …."
+
+The campaign does **not**:
+
+- Post comments to JIRA.
+- Transition any issue.
+- Open any PR (draft or otherwise).
+- Email dev@ or anyone else on behalf of the project.
+- Self-assign or otherwise touch JIRA fields.
+
+With explicit instruction (per [`groovy-jira`](../groovy-jira/SKILL.md)
+and [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md)), the
+output may then be used as:
+
+- The basis for a single dev@ post summarising the campaign
+  ("Reassessed N issues from 2018–2019; here is the report and
+  evidence — feedback welcome before any committer acts on the
+  recommendations").
+- Per-issue JIRA comment drafts the committer reviews and posts.
+- A starting point for [`groovy-fix-workflow`](../groovy-fix-workflow/SKILL.md)
+  on each `still-fails-same` row.
+
+The single dev@-thread shape is the recommended publication path —
+N separate JIRA comments at once is noisy and out of step with how
+the project communicates.
+
+## Validation checklist
+
+Before declaring a campaign session complete:
+
+- [ ] Candidate set was bounded *before* the loop started; the
+      bound is recorded in the report.
+- [ ] Per-issue evidence package on disk for every candidate, with
+      `verdict.json` and a non-empty `description.md`.
+- [ ] Every classification used is one of the taxonomy entries; no
+      free-form labels.
+- [ ] `fixed-on-master` rows include the rev and JDK in the
+      evidence; the verdict isn't over-claimed.
+- [ ] `still-fails-same` rows are surfaced at the top of the
+      report, not buried.
+- [ ] `cannot-run-*` rows have a concrete reason in the evidence
+      (which dep failed, what was missing) — not "could not run."
+- [ ] No JIRA mutation occurred. No PR was opened. No dev@ post
+      was sent.
+- [ ] The report opens with a summary stanza a committer can scan
+      in 30 seconds.
+- [ ] Working tree was clean at the end of the session.
+- [ ] Hand-back artefact lists the scratch corpus path, the report
+      path, any local commits worth keeping, and the recommended
+      publication path (typically a single dev@ thread).
+
+## References
+
+- [`GOVERNANCE.md`](../../../GOVERNANCE.md) — JIRA as canonical
+  record; dev@ as the project's primary discussion channel.
+- [`AGENTS.md`](../../../AGENTS.md) — the licensing, provenance,
+  and "what *not* to do" rules the campaign inherits.
+- `.agents/skills/groovy-jira/SKILL.md` — JQL recipes for selection;
+  the comment-not-transition rule applied campaign-wide.
+- `.agents/skills/groovy-triage/SKILL.md` — single-issue workflow;
+  pieces apply to each candidate.
+- `.agents/skills/groovy-reproducer/SKILL.md` — the per-issue
+  load-bearing skill; the campaign is a loop over it.
+- `.agents/skills/groovy-fix-workflow/SKILL.md` — destination for
+  the `still-fails-same` tail.
+- `.agents/skills/groovy-tests/SKILL.md` — when an adapted `@Test`
+  is kept as the starting point for a real regression test.
diff --git a/.agents/skills/groovy-reproducer/SKILL.md b/.agents/skills/groovy-reproducer/SKILL.md
new file mode 100644
index 0000000000..dfc6456ef7
--- /dev/null
+++ b/.agents/skills/groovy-reproducer/SKILL.md
@@ -0,0 +1,346 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+---
+name: groovy-reproducer
+description: Extracting and running reproducer code from a GROOVY JIRA report — locating the reproducer (description, comments, attachments), classifying its shape (runnable script, `@Test` snippet, inline fragment, prose-only, attachment, `@Grab`-using, multi-file project), adapting it to a runnable form *without fabrication*, running it with a bounded timeout, and recording deterministic evidence (revision, JDK, command, output, exit code) plus a classification (same-failure / differen [...]
+license: Apache-2.0
+compatibility: claude, codex, copilot, cursor, gemini, aider
+metadata:
+  audience: contributors to apache/groovy
+  scope: jira-reproducer-extraction-and-execution
+---
+
+# Groovy reproducer
+
+Use this skill when the job is to **take a JIRA-described problem and
+actually run it**: find the reproducer code, work out what shape it's
+in, adapt it to a runnable form, and execute it against a known
+revision and JDK with enough evidence captured that a committer can
+trust the verdict without redoing the work.
+
+This skill is the load-bearing piece for both single-issue triage
+(when a stronger-than-eyeballed reproduction is wanted) and the
+bulk-reassessment campaign — it doesn't speak about workflow, batch
+processing, or hand-back. For those, defer:
+
+- [`groovy-triage`](../groovy-triage/SKILL.md) — the surrounding
+  triage workflow for a single issue; it calls into this skill at
+  the "attempt reproduction on `master`" step.
+- [`groovy-reassess`](../groovy-reassess/SKILL.md) — the bulk
+  reassessment campaign; it calls into this skill for every issue in
+  the candidate set.
+- [`groovy-tests`](../groovy-tests/SKILL.md) — when the reproducer
+  is adapted into a `@Test`, this skill owns *running* it; that one
+  owns *placement and naming*.
+- [`groovy-jira`](../groovy-jira/SKILL.md) — the JQL and field
+  conventions for the issue being reproduced.
+
+## When to use this skill
+
+**Use it for:**
+
+- Extracting a reproducer from a JIRA description, comment thread, or
+  attachment and running it against the current checkout.
+- Producing an evidence package (rev, JDK, command, output, exit
+  code, classification) that supports a triage finding or a
+  reassessment verdict.
+- Deciding whether a reporter's snippet can be reasonably adapted to
+  a runnable form *without* fabrication.
+
+**Don't use it for:**
+
+- Writing a fresh regression test where no reporter reproducer
+  exists — that's [`groovy-tests`](../groovy-tests/SKILL.md), and
+  the regression test is the contributor's design call, not a
+  reproduction of someone else's report.
+- Single-issue triage workflow as a whole — [`groovy-triage`](../groovy-triage/SKILL.md)
+  is the workflow; this skill is one of its steps.
+- Bulk processing — [`groovy-reassess`](../groovy-reassess/SKILL.md)
+  is the campaign layer; this skill handles one reproducer at a time.
+- Reporters' projects requiring a separate Gradle / Maven build (a
+  zip with its own `build.gradle`). Run those out of the Groovy
+  checkout if it matters; this skill focuses on reproducers that
+  run against the current Groovy build.
+
+## Read first
+
+- [`CONTRIBUTING.md`](../../../CONTRIBUTING.md) — the regression-test
+  shape that an adapted reproducer often takes.
+- [`AGENTS.md`](../../../AGENTS.md) — the "what *not* to do" list
+  applies here too: no fabricated reproducers, no hallucinated
+  identifiers, no scratch files left behind.
+- `build-logic/src/main/groovy/org.apache.groovy-tested.gradle` —
+  the source of the `groovy/grape/` `junit.network` exclusion;
+  reproducers under `groovy/grape/` need the same gate. See failure
+  mode 12 in [`groovy-tests`](../groovy-tests/SKILL.md).
+
+## Top failure modes
+
+These are the recurring mistakes when extracting and running JIRA
+reproducers:
+
+1. **Fabricating a reproducer when none exists.** "The reporter
+   described X happening; I'll write code that does X." That is the
+   agent doing the reporter's job. If the description is prose-only
+   and no attachment helps, classify `cannot-run-extraction` and
+   stop. The reporter's specific code is what makes a reproduction
+   trustworthy; an agent-written stand-in is a different exercise
+   (and a different verdict).
+2. **Skipping the comment thread.** Reporters frequently post a
+   simplified reproducer in a comment after the initial description.
+   Reading only the description misses it. Inventory every code
+   block in the description *and* every comment *and* every
+   attachment before picking a candidate.
+3. **Treating `@Test` adaptation as equivalent to a script run.**
+   Groovy scripts and class methods have different scoping (script
+   bindings vs. fields, implicit `main`, `def` vs. typed locals).
+   A reproducer that fails as a script may pass inside a `@Test` and
+   vice versa. If the original was a script, *also* run it as a
+   script before claiming the bug is gone — the `@Test` adaptation
+   exercises different paths.
+4. **Wrapping a JIRA snippet in `assert false` to "force a failure".**
+   That signals nothing about the underlying behaviour. The
+   adaptation should exercise the same path the reporter described
+   and let it fail (or not) naturally. `assert <reporter-expected>`
+   is fine; `assert false // bug` is theatre.
+5. **Skipping the original JDK.** A reproducer that passes on
+   JDK 21 may still fail on the JDK the reporter used (JDK 8, 11,
+   17). The strict claim is "passes on JDK X master." Without JDK
+   awareness, the verdict is incomplete — note the JDK in evidence,
+   and where the verdict matters, retry on the originally-affected
+   JDK if it is reasonably available locally.
+6. **Running `@Grab` and silently absorbing the dependency failure.**
+   Grape resolution that 404s or hits a dead repo should be a
+   recognised classification (`cannot-run-dependency`), not "test
+   passes" because the resolution exception was swallowed and the
+   body never ran. Check exit code *and* output for resolution
+   errors before classifying.
+7. **No timeout.** A buggy reproducer can hang (infinite loop,
+   deadlock, awaiting input). Without a timeout, one bad issue burns
+   hours. Default to a bounded run (60s is a reasonable starting
+   point; raise per-issue if the reporter notes long-running
+   behaviour) and classify as `timeout` if hit.
+8. **Capturing stdout but not stderr.** Many reproducers print the
+   bug indicator (stack traces, MOP errors, "expected X got Y") to
+   stderr. Capture both streams and surface both in the evidence.
+9. **Comparing run output without normalising line endings or
+   locale.** Same trap as [`groovy-tests`](../groovy-tests/SKILL.md)
+   failure mode 11. Output captured on Windows uses `\r\n`; locale
+   shifts number/date formatting. Normalise before string-compare,
+   or compare on parsed values.
+10. **Over-claiming "fixed" from a single-environment pass.** A
+    clean run on your laptop may be environment-luck, not a real
+    fix — locale, charset, default JDK, file-encoding defaults all
+    bite. Where the verdict is `passes`, qualify it with the
+    environment that produced the pass; don't generalise.
+11. **Discarding the working tree before recording the run.**
+    Capture `rev`, `jdk`, command, output, runtime, exit code
+    *before* reverting any test additions or cleaning the scratch
+    directory. The evidence is what gives the verdict its weight;
+    losing it means the work has to be redone.
+12. **Letting working-tree state leak between reproducers.** When
+    running many in sequence (the reassessment case), reset between
+    issues — `git stash --include-untracked && git clean -fd` on
+    a sacrificial branch, or a separate scratch directory per
+    issue. Crosstalk between reproducers (a file written by issue
+    A's reproducer that issue B picks up) corrupts verdicts in ways
+    that are hard to spot.
+13. **Polluting the local Grape cache across many issues.** A sweep
+    that runs many `@Grab`-using scripts can degrade
+    `~/.groovy/grapes/` (see GROOVY-12005). For a campaign,
+    consider a per-sweep Grape root via `-Dgrape.root=<scratch>` so
+    the user's everyday cache stays clean.
+
+## Reproducer shape taxonomy
+
+Most JIRA reproducers fall into one of these shapes. The handling
+recipe for each is the meat of this skill.
+
+**A. Complete runnable Groovy script** — a `.groovy` file body in
+the description or a comment, with imports and a top-level expression
+or `main`. Recipe: save to a scratch `.groovy`, build a current
+distribution (`./gradlew :installDist` on the relevant subproject),
+and run with the built `groovy` binary. Or, for many cases, adapt as
+a `@Test` per [`groovy-tests`](../groovy-tests/SKILL.md) and run
+targeted — but be aware of failure mode 3 (script vs `@Test`
+semantics).
+
+**B. `@Test`-shaped snippet** — already class-and-annotation shaped.
+Recipe: place under `src/test/groovy/bugs/Groovy<NNNN>.groovy` (or
+the subproject's `src/test/`) per [`groovy-tests`](../groovy-tests/SKILL.md);
+run targeted (`./gradlew :test --tests <FQN>`).
+
+**C. Inline fragment** — a few lines, not a complete script ("when I
+write `foo.bar()` I get NPE"). Recipe: wrap minimally — a `@Test`
+with the fragment as the body, or a script with `def` declarations
+for any referenced names. Where the wrap requires guessing types or
+context, classify `cannot-run-extraction` rather than speculating.
+
+**D. Stack-trace-only** — a `.printStackTrace()` output, no code.
+Recipe: classify `cannot-run-extraction`. The stack trace is a *hint
+about the area*, not a reproducer. Don't construct code to "make
+that stack trace appear"; that's fabrication.
+
+**E. Prose-only** — natural-language description, no code. Recipe:
+`cannot-run-extraction`. Don't write code from prose.
+
+**F. Attachment** — `.groovy`, `.java`, `.zip` (project), `.txt`
+(log), `.gz` (heap dump, etc.). For `.groovy` / `.java` files,
+handle per shape A/B. For project zips, this is out of scope for
+this skill — flag as `needs-separate-workspace` and let the
+campaign / triage handle it. For logs and heap dumps, treat as
+hints (shape D).
+
+**G. `@Grab`-using script** — depends on resolution against Maven
+Central or other repos. Recipe: run as in A, but with network
+available; verify the resolution succeeded before interpreting the
+result. A pinned old version may no longer be in the configured
+repos (`cannot-run-dependency`).
+
+**H. Multi-file project** — a tarball or zip with its own
+`build.gradle` / `pom.xml`. Recipe: this skill's posture is "run
+against the Groovy checkout"; project-style reproducers run *with*
+their own build. Classify as `needs-separate-workspace` and surface;
+the user / campaign can spin up an isolated workspace if it matters.
+
+## Procedure
+
+For each reproducer:
+
+1. **Inventory.** Read the JIRA description, every comment, every
+   attachment. Note all code blocks (verbatim, with their location
+   — "description", "comment 3 by …", "attachment foo.groovy").
+   Note the reporter's claimed environment: Groovy version, JDK, OS.
+2. **Pick the candidate.** When multiple reproducers exist, prefer
+   the simplest *complete* one. Note the fallback chain — if the
+   simplest fails to adapt, the next one in line is the
+   reporter's original.
+3. **Classify the shape** per the taxonomy above. Output the shape
+   category as part of the evidence package.
+4. **Adapt without fabrication.**
+   - Shape A: copy verbatim to a scratch file.
+   - Shape B: place per [`groovy-tests`](../groovy-tests/SKILL.md).
+   - Shape C: wrap minimally; if the wrap requires speculation
+     (guessing a type, inventing a missing variable), stop and
+     classify `cannot-run-extraction` with a note about what was
+     missing.
+   - Shapes D, E: classify `cannot-run-extraction`; don't adapt.
+   - Shape F: per inner shape; for project zips, classify
+     `needs-separate-workspace`.
+   - Shape G: copy verbatim; flag for Grape-aware running.
+   - Shape H: classify `needs-separate-workspace`.
+5. **Build the current Groovy distribution** if the reproducer is a
+   script that needs the produced `groovy` binary. For `@Test`-shape
+   reproducers, the Gradle test invocation handles the build.
+6. **Run with bounded resources.** Apply a timeout (60s default) and
+   capture stdout, stderr, exit code, and runtime. Record the
+   command verbatim.
+7. **Compare to the original failure pattern.** "Fails with the
+   reported exception type and a message containing the reported
+   substring" is `same-failure`. "Fails with something different" is
+   `different-failure`. "Doesn't fail" is `passes`. "Hangs past
+   timeout" is `timeout`. "Errors before exercising the path" is
+   `cannot-run-*`.
+8. **Record the evidence package** before doing anything else.
+9. **Reset the working tree** if you adapted as a `@Test` (the
+   added file must not leak to the next issue).
+
+## Run posture
+
+- **Timeout:** 60s default. Bump only when the reporter notes
+  longer-running behaviour, and record the bump in evidence.
+- **Network:** Grape needs it for shape G. Leave it on. Dependency
+  failures get the `cannot-run-dependency` classification; they are
+  not "fixed."
+- **Filesystem:** scratch directory per issue, under
+  `~/work/groovy-reassessment/<KEY>/` (or wherever the campaign
+  layout puts it — see [`groovy-reassess`](../groovy-reassess/SKILL.md)).
+  Don't write under the Groovy checkout.
+- **Working tree:** clean between reproducers. The
+  added-and-then-removed `@Test` is the most common leak source.
+- **Grape cache:** for a campaign with many `@Grab` reproducers,
+  consider `-Dgrape.root=<scratch>` so the user's
+  `~/.groovy/grapes/` stays clean.
+- **JDK selection:** record the JDK used. For verdicts where it
+  matters (`passes`, `fixed-on-master`), retry on the
+  originally-affected JDK via Gradle toolchains where reasonable.
+
+## Evidence package
+
+For each reproducer run, persist:
+
+- `description.md` — the JIRA description and the comments quoted
+  verbatim (so the verdict is auditable even if JIRA changes).
+- `reproducer.<ext>` — the adapted runnable form, or
+  `extraction-failed.md` if the shape didn't support adaptation.
+- `original.<ext>` — the literal source from JIRA, untouched, when
+  extracted.
+- `run.log` — stdout + stderr from the run, with the exact command
+  on the first line.
+- `verdict.json` — `{ "key": "GROOVY-NNNNN", "shape": "<A|B|…>",
+  "classification": "<one of: same-failure | different-failure |
+  passes | cannot-run-extraction | cannot-run-environment |
+  cannot-run-dependency | timeout | needs-separate-workspace>",
+  "rev": "<short-sha>", "jdk": "<vendor+version>",
+  "command": "<verbatim>", "runtime-ms": <int>,
+  "exit-code": <int>, "matched-original-failure": <bool>,
+  "notes": "<short>" }`.
+
+This package is what a committer needs to trust the verdict, and it
+is what [`groovy-reassess`](../groovy-reassess/SKILL.md) feeds into
+its report.
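+
+As a hedged illustration of that shape, a filled-in `verdict.json`
+might look like the sketch below. The field names come from the list
+above; the values are hypothetical. The key, rev, JDK, and command
+stay as placeholders, and the shape, classification, runtime, and
+exit code are invented for the example.
+
+```json
+{
+  "key": "GROOVY-NNNNN",
+  "shape": "A",
+  "classification": "same-failure",
+  "rev": "<short-sha>",
+  "jdk": "<vendor+version>",
+  "command": "<verbatim>",
+  "runtime-ms": 1840,
+  "exit-code": 1,
+  "matched-original-failure": true,
+  "notes": "illustrative values only; not taken from a real run"
+}
+```
+
+In this sketch `matched-original-failure` is `true` because the
+classification is `same-failure`; for a `different-failure` run it
+would presumably be `false`.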
+
+## Validation checklist
+
+Before recording a verdict:
+
+- [ ] Every code block in the description, comments, and attachments
+      was inventoried — not just the description.
+- [ ] The chosen reproducer is verbatim from the report (or the
+      adaptation is mechanical, not speculative).
+- [ ] The shape is classified; if `cannot-run-*` or
+      `needs-separate-workspace`, no execution attempt was claimed.
+- [ ] Run was bounded by a timeout; the bound is recorded.
+- [ ] Both stdout and stderr were captured.
+- [ ] The exact command, revision, and JDK were recorded.
+- [ ] The classification distinguishes `same-failure` from
+      `different-failure` — the original failure pattern was
+      consulted, not just "did it throw."
+- [ ] For `@Grab` reproducers: dependency resolution succeeded
+      before any "passes" verdict was claimed.
+- [ ] For `passes`: the run environment is qualified; the verdict
+      doesn't over-claim "fixed" from a single environment.
+- [ ] Working tree was reset (no leftover scratch test class).
+- [ ] Evidence package was written before the next issue started.
+
+## References
+
+- [`CONTRIBUTING.md`](../../../CONTRIBUTING.md) — regression-test
+  shape that `@Test`-adapted reproducers fit into.
+- [`AGENTS.md`](../../../AGENTS.md) — the no-fabrication, no
+  drive-by, no scratch-files-in-tree principles applied here.
+- `.agents/skills/groovy-triage/SKILL.md` — single-issue caller of
+  this skill.
+- `.agents/skills/groovy-reassess/SKILL.md` — campaign-level caller
+  of this skill.
+- `.agents/skills/groovy-tests/SKILL.md` — placement and naming
+  when a reproducer is adapted as a `@Test`; the
+  `junit.network`-gating story for `groovy/grape/` reproducers.
+- `.agents/skills/groovy-jira/SKILL.md` — JIRA mechanics for the
+  issue around the reproducer.
diff --git a/AGENTS.md b/AGENTS.md
index 99c0144a94..362deca689 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -131,6 +131,8 @@ into the human-facing docs above.
 | [`groovy-fix-workflow`](.agents/skills/groovy-fix-workflow/SKILL.md) | Implementing a JIRA-tracked fix after triage — failing-test-first ordering, scope discipline, hand-back to a committer (no autonomous PR / JIRA comment / merge) |
 | [`groovy-internals`](.agents/skills/groovy-internals/SKILL.md) | Compiler and runtime work — parser, AST, type checker, transforms, class generation |
 | [`groovy-jira`](.agents/skills/groovy-jira/SKILL.md) | JIRA mechanics — JQL recipes, `Component/s` taxonomy, field ownership, and the "don't transition workflow states" rule for AI tooling |
+| [`groovy-reassess`](.agents/skills/groovy-reassess/SKILL.md) | Bulk reassessment of old JIRA issues — selection, per-issue reproduction, classification (`fixed-on-master` / `still-fails-*` / `cannot-run-*` / …), report and evidence-package hand-back; read-only against JIRA |
+| [`groovy-reproducer`](.agents/skills/groovy-reproducer/SKILL.md) | Extracting and running a JIRA-reported reproducer — shape classification, adaptation without fabrication, bounded run, deterministic evidence (rev/JDK/command/output) and an outcome classification |
 | [`groovy-tests`](.agents/skills/groovy-tests/SKILL.md) | Adding or modifying tests, including JIRA regression tests and executable AsciiDoc examples |
 | [`groovy-triage`](.agents/skills/groovy-triage/SKILL.md) | First-pass triage of JIRA issues and GitHub PRs — reproducing reports against `master`, finding duplicates, surfacing PR-readiness signals; advisory only, no committer actions |
 | [`groovysh`](.agents/skills/groovysh/SKILL.md) | Work in `subprojects/groovy-groovysh/` — REPL commands, JLine integration, vendored forks, terminal-aware test stack |