This is an automated email from the ASF dual-hosted git repository.

cloud-fan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a9e074906bf1 [SPARK-56865][INFRA] Suggest a minimal Fix Versions set 
when resolving JIRA
a9e074906bf1 is described below

commit a9e074906bf118417afad37c85068eed3b804d5c
Author: Wenchen Fan <[email protected]>
AuthorDate: Fri May 15 11:07:56 2026 +0800

    [SPARK-56865][INFRA] Suggest a minimal Fix Versions set when resolving JIRA
    
    ### What changes were proposed in this pull request?
    
    When `dev/merge_spark_pr.py` resolves a JIRA after merging a PR to multiple 
branches, it computes a default Fix Versions list from the merge targets. This 
PR rewrites that computation to produce the **minimal set of Fix Versions** 
that tells users which Spark releases will contain the commit, leveraging the 
Apache Spark Upstream-First backporting policy (cherry-picks flow `master -> 
branch-M.x -> branch-M.N`).
    
    Each merge target contributes a candidate version:
    - `master` -> the greatest unreleased `N.0.0`
    - `branch-M.x` -> that major's greatest unreleased `M.minor.0`
    - `branch-M.N` -> its greatest unreleased `M.N.patch`
    
    Two suppression rules then trim redundancy:
    1. **Drop master's `N.0.0`** when any `branch-M.x` is in the merge set: a 
cherry-pick to `branch-M.x` has already landed on master, so the next master 
release is implied.
    2. **Drop `branch-M.x`'s `minor.0`** when a sibling `branch-M.N` 
contributes `M.N.0`: an unreleased `M.N.0` means `branch-M.N` has not diverged 
from `branch-M.x` yet, so `M.N.0` alone covers both lineages.
    
    Sanity check against the current unreleased SPARK versions (`5.0.0, 4.3.0, 
4.2.0, 4.1.2, 4.0.3, 3.5.9`) plus the hypothetical "4.2.0 has released" 
scenarios raised in the review discussion:
    
    | Merge branches | Before | After |
    |---|---|---|
    | `master` | `[5.0.0]` | `[5.0.0]` |
    | `master`, `branch-4.x` | `[5.0.0, 4.3.0]` | `[4.3.0]` |
    | `master`, `branch-4.x`, `branch-4.2` (4.2.0 unreleased) | `[5.0.0, 
4.2.0]` | `[4.2.0]` |
    | `master`, `branch-4.x`, `branch-4.2` (4.2.0 released) | `[5.0.0, 4.2.1]` 
| `[4.3.0, 4.2.1]` |
    | `master`, `branch-4.2` (4.2.0 unreleased) | `[5.0.0, 4.2.0]` | `[5.0.0, 
4.2.0]` |
    | `master`, `branch-4.2` (4.2.0 released) | `[5.0.0, 4.2.1]` | `[5.0.0, 
4.2.1]` |
    
    The rules are parameterized on `M` and `N`, so they generalize to any 
major/minor pair as the project moves forward (e.g. post-5.0.0 `master + 
branch-5.x + branch-5.1` with `5.1.0` unreleased -> `[5.1.0]`; double-digit 
minors like `branch-4.11` work the same way).
    
    ### Why are the changes needed?
    
    The previous defaults forced committers to manually edit Fix Versions in 
multiple common cases:
    - `master + branch-M.x`: strip the extra `N.0.0`.
    - `master + branch-M.x + branch-M.N` (just-cut `branch-M.N`): the change 
becomes `[M.N.0]` rather than `[5.0.0, M.N.0]` or `[M.minor.0, M.N.0]`.
    
    Both are now produced as defaults that match the convention Spark 
committers apply by hand.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. The committer prompt still lets the user override the defaults, so any 
committer who prefers the old behavior can type it in. Only the suggested 
default changes.
    
    ### How was this patch tested?
    
    - `python3 -m doctest dev/merge_spark_pr.py -v` -- 34 doctests pass, 
including 16 cases in `compute_merge_default_fix_versions` covering all six 
rows in the table plus the existing scenarios.
    - Manually traced future scenarios (post-4.3.0, post-5.0.0, mixed 
cross-major, double-digit `branch-4.11`) to confirm the rules generalize across 
`M` and `N`.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Claude Opus 4.7)
    
    Closes #55878 from cloud-fan/cloud-fan/merge-script-fix-versions.
    
    Authored-by: Wenchen Fan <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 dev/merge_spark_pr.py | 163 +++++++++++++++++++++++++++++---------------------
 1 file changed, 95 insertions(+), 68 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 0263b25700d3..795fde6d176e 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -25,9 +25,14 @@
 # absorb work for the next minors on major M. Maintenance branches branch-M.N 
are cut from
 # branch-M.x. Cherry-pick prompts sort by semver via semver_branch_rank 
(branch-M.x before branch-M.N).
 #
-# Fix Version defaults when resolving JIRA: master contributes the largest 
unreleased N.0.0.
-# Master + branch-M.x also defaults per listed integration branch to that 
major's unreleased Spark
-# minor line versions (multiple branch-M.x entries each add one; e.g. 6.0.0 + 
5.2.0 + 4.3.0).
+# Fix Version defaults when resolving JIRA aim for the minimal set of versions 
that tells users
+# which Spark releases will contain the commit, leveraging the Upstream-First 
backporting policy
+# (cherry-picks flow master -> branch-M.x -> branch-M.N). master contributes 
the greatest
+# unreleased N.0.0, but is omitted when any branch-M.x is in the merge set (a 
cherry-pick to
+# branch-M.x has already landed on master). Each branch-M.x contributes that 
major's greatest
+# unreleased minor.0, but is omitted when a sibling branch-M.N contributes 
M.N.0 (M.N.0 still
+# unreleased means branch-M.N is still a subset of branch-M.x). Patch merges 
on branch-M.N use
+# unreleased versions starting with M.N.
 # This utility assumes you already have a local Spark git folder and that you
 # have added remotes corresponding to both (i) the github apache Spark
 # mirror and (ii) the apache git repo.
@@ -151,36 +156,6 @@ def _sort_version_names_lex_desc(names):
     return sorted(names, reverse=True)
 
 
-def _dedupe_fix_versions_adjacent_major_zero(default_fix_versions):
-    """Preserve order; drop duplicate names, then drop V when V is M.minor.0 
and
-    predecessor (minor-1).0.0 exists in the list.
-
-    >>> _dedupe_fix_versions_adjacent_major_zero(["6.0.0", "4.3.0", "4.3.0"])
-    ['6.0.0', '4.3.0']
-    >>> _dedupe_fix_versions_adjacent_major_zero(["2.0.0", "1.1.0", "1.0.0"])
-    ['2.0.0', '1.0.0']
-    >>> _dedupe_fix_versions_adjacent_major_zero(["1.0.0", "2.0.0"])
-    ['1.0.0', '2.0.0']
-    """
-    out = list(dict.fromkeys(default_fix_versions))
-    for v in list(out):
-        major, minor, patch = v.split(".")
-        if patch == "0":
-            previous = "%s.%s.%s" % (major, int(minor) - 1, 0)
-            if previous in out:
-                out = list(filter(lambda x: x != v, out))
-    return out
-
-
-def _append_max_fix_or_warn(defaults, warnings, candidates, warn_if_none):
-    """Append _semver_max_version(candidates) when defined; else optionally 
append warn_if_none."""
-    chosen = _semver_max_version(candidates)
-    if chosen:
-        defaults.append(chosen)
-    elif warn_if_none is not None:
-        warnings.append(warn_if_none)
-
-
 def compute_merge_default_fix_versions(merge_branches, 
unreleased_version_names):
     """
     Build suggested SPARK Fix Version names from merged git refs (no JIRA I/O).
@@ -189,22 +164,29 @@ def compute_merge_default_fix_versions(merge_branches, 
unreleased_version_names)
 
     Returns (default_version_names, warning_messages).
 
-    Master always suggests the greatest unreleased N.0.0. Each branch-M.x with 
master also adds the
-    greatest unreleased Spark M.*.* Fix Version for that major (multiple 
branch-M.x entries each
-    contribute). Patch merges on branch-M.N use unreleased versions starting 
with M.N.
+    The output is the minimal set of Fix Versions that tells users which Spark 
releases will
+    contain the commit, leveraging the Upstream-First backporting policy 
(cherry-picks flow
+    master -> branch-M.x -> branch-M.N):
+      - master contributes the greatest unreleased N.0.0;
+      - branch-M.x with master contributes that major's greatest unreleased 
minor.0;
+      - branch-M.N contributes its greatest unreleased M.N.patch.
+    Redundant entries are then suppressed: master's N.0.0 is dropped when any 
branch-M.x is in
+    the merge set (a cherry-pick to branch-M.x has already landed on master); 
branch-M.x's
+    minor.0 is dropped when a branch-M.N for the same major contributes M.N.0 
(M.N.0 still
+    unreleased means branch-M.N is still a subset of branch-M.x).
 
     >>> compute_merge_default_fix_versions(["master"], ["4.3.0", "5.0.0", 
"6.0.0"])[0]
     ['6.0.0']
 
     >>> compute_merge_default_fix_versions(
     ...     ["master", "branch-4.x"], ["6.0.0", "5.0.0", "4.3.0", "4.2.1"])[0]
-    ['6.0.0', '4.3.0']
+    ['4.3.0']
 
     >>> compute_merge_default_fix_versions(
     ...     ["master", "branch-5.x"],
     ...     ["7.0.0", "6.0.0", "5.2.0", "5.1.2", "5.1.1", "5.0.1"],
     ... )[0]
-    ['7.0.0', '5.2.0']
+    ['5.2.0']
 
     >>> compute_merge_default_fix_versions(
     ...     ["master", "branch-3.5"],
@@ -232,58 +214,103 @@ def compute_merge_default_fix_versions(merge_branches, 
unreleased_version_names)
     >>> compute_merge_default_fix_versions(
     ...     ["master", "branch-5.x", "branch-4.x"], ["5.2.0", "4.3.0", "6.0.0"]
     ... )[0]
-    ['6.0.0', '5.2.0', '4.3.0']
+    ['5.2.0', '4.3.0']
 
     >>> compute_merge_default_fix_versions(
     ...     ["master", "branch-4.x", "branch-4.x"], ["6.0.0", "4.3.0"]
     ... )[0]
-    ['6.0.0', '4.3.0']
+    ['4.3.0']
+
+    >>> compute_merge_default_fix_versions(
+    ...     ["master", "branch-4.x", "branch-4.2"], ["5.0.0", "4.3.0", "4.2.0"]
+    ... )[0]
+    ['4.2.0']
+
+    >>> compute_merge_default_fix_versions(
+    ...     ["master", "branch-4.x", "branch-4.2"], ["5.0.0", "4.3.0", "4.2.1"]
+    ... )[0]
+    ['4.3.0', '4.2.1']
+
+    >>> compute_merge_default_fix_versions(
+    ...     ["master", "branch-4.2"], ["5.0.0", "4.2.0"]
+    ... )[0]
+    ['5.0.0', '4.2.0']
+
+    >>> compute_merge_default_fix_versions(
+    ...     ["master", "branch-4.2"], ["5.0.0", "4.2.1"]
+    ... )[0]
+    ['5.0.0', '4.2.1']
     """
     names = _sort_version_names_lex_desc(list(unreleased_version_names))
-    merged_set = set(merge_branches)
+    has_integration_branch = any(
+        _integration_major_from_branch(b) is not None for b in merge_branches
+    )
 
-    defaults = []
+    # Collect each merge branch's candidate Fix Version. We track (branch, 
version) so we can
+    # later suppress redundant entries (master, branch-M.x) based on which 
branches contributed.
+    contributions = []
     warnings = []
 
     for b in merge_branches:
         if b == "master":
             majors = [n for n in names if re.match(r"^\d+\.0\.0$", n)]
-            _append_max_fix_or_warn(
-                defaults,
-                warnings,
-                majors,
-                "No unreleased N.0.0 Fix Version found in JIRA for master; "
-                "enter comma-separated Fix Version(s) manually when prompted.",
-            )
+            chosen = _semver_max_version(majors)
+            if chosen:
+                contributions.append((b, chosen))
+            else:
+                warnings.append(
+                    "No unreleased N.0.0 Fix Version found in JIRA for master; 
"
+                    "enter comma-separated Fix Version(s) manually when 
prompted."
+                )
             continue
         line_major = _integration_major_from_branch(b)
         if line_major is not None:
-            if "master" in merged_set:
-                line_versions = [n for n in names if 
re.match(r"^%s\.\d+\.\d+$" % line_major, n)]
-                _append_max_fix_or_warn(
-                    defaults,
-                    warnings,
-                    line_versions,
-                    (
-                        "Could not infer an unreleased Spark %s 
(minor.maintenance) fix version "
-                        "for branch-%s.x + master merge; enter version(s) 
manually when prompted."
-                        % (line_major, line_major)
-                    ),
+            if "master" not in merge_branches:
+                continue
+            line_versions = [n for n in names if re.match(r"^%s\.\d+\.\d+$" % 
line_major, n)]
+            chosen = _semver_max_version(line_versions)
+            if chosen:
+                contributions.append((b, chosen))
+            else:
+                warnings.append(
+                    "Could not infer an unreleased Spark %s 
(minor.maintenance) fix version "
+                    "for branch-%s.x + master merge; enter version(s) manually 
when prompted."
+                    % (line_major, line_major)
                 )
             continue
         prefix = b.replace("branch-", "")
         found_versions = [n for n in names if n.startswith(prefix)]
-        _append_max_fix_or_warn(
-            defaults,
-            warnings,
-            found_versions,
-            (
+        chosen = _semver_max_version(found_versions)
+        if chosen:
+            contributions.append((b, chosen))
+        else:
+            warnings.append(
                 "Target version for %s is not found on JIRA, it may be 
archived or "
                 "not created. Skipping it." % b
-            ),
-        )
+            )
 
-    return _dedupe_fix_versions_adjacent_major_zero(defaults), warnings
+    # Majors where a release branch's M.N.0 contribution makes the sibling 
branch-M.x's
+    # minor.0 redundant: an unreleased M.N.0 means branch-M.N has not diverged 
from branch-M.x
+    # yet, so anything that's on branch-M.N is also on branch-M.x.
+    suppress_integration_majors = set()
+    for b, v in contributions:
+        if b == "master" or _integration_major_from_branch(b) is not None:
+            continue
+        m = re.match(r"^(\d+)\.\d+\.0$", v)
+        if m:
+            suppress_integration_majors.add(int(m.group(1)))
+
+    def keep(item):
+        b, _ = item
+        if b == "master":
+            return not has_integration_branch
+        major = _integration_major_from_branch(b)
+        if major is not None:
+            return major not in suppress_integration_majors
+        return True
+
+    filtered = [item for item in contributions if keep(item)]
+    return list(dict.fromkeys(v for _, v in filtered)), warnings
 
 
 def print_error(msg):


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to