This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new a9d24683e474 [SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts
a9d24683e474 is described below
commit a9d24683e4740506e24f99485c4ae8bdadd6496f
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Fri May 22 15:54:07 2026 -0700
[SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts
### What changes were proposed in this pull request?
When a committer manually types `branch-M.N` at the cherry-pick prompt
while `branch-M.x` exists and has not yet received the commit, the script now
surfaces the Upstream-First policy and offers to pick into both branches in one
step (the policy-compliant default). The committer can still pick only
`branch-M.N` if the commit is genuinely a `branch-M.N`-only maintenance bugfix,
or abort.
Implementation notes:
- Split `cherry_pick` into `_do_cherry_pick` (fetch + cherry-pick + push)
and `cherry_pick` (prompt + policy check). The policy wrapper returns a list of
refs so the main loop can advance its remaining-branches list correctly when
one prompt consumes two branches.
- Replace the `branch_iter` iterator with a mutable `remaining_branches`
list in the main cherry-pick loop, so picks consumed by the two-branch path are
accounted for in the next prompt's default.
- Add an `already_picked` parameter to `cherry_pick` so the policy check
skips its prompt when `branch-M.x` is in the set of refs already touched this
session (e.g. when the PR was merged into `branch-M.x` and the loop is now
picking into `branch-M.N`).
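
The remaining-branches bookkeeping from the second note can be sketched in isolation. This is a minimal illustration rather than the script's code: `next_default` and `advance` are hypothetical helper names, and `branch-4.1` is an invented branch used only to give the list a third entry.

```python
def next_default(remaining_branches, branch_names):
    """Pick the next prompt default: first unpicked branch, else the newest branch."""
    return remaining_branches[0] if remaining_branches else branch_names[0]


def advance(remaining_branches, picked):
    """Drop every ref in `picked` from the remaining list, preserving order."""
    return [b for b in remaining_branches if b not in picked]


# Hypothetical release branches, newest first; the PR was merged into master.
branch_names = ["branch-4.x", "branch-4.2", "branch-4.1"]
remaining = [b for b in branch_names if b != "master"]

# One prompt took the two-branch path and returned both refs.
remaining = advance(remaining, ["branch-4.x", "branch-4.2"])
print(next_default(remaining, branch_names))  # branch-4.1
```

Because the prompt returns a list rather than a single ref, one call can retire two branches at once, which a plain iterator could not express.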
### Why are the changes needed?
The Upstream-First backporting policy (documented in the header comment of
`dev/merge_spark_pr.py`) requires non-bugfix commits to flow through
`branch-M.x` before reaching `branch-M.N`. The merge script already orders
`branch-M.x` ahead of `branch-M.N` as the cherry-pick default. However, when a
committer types `branch-M.N` at the prompt, the script silently proceeds and
`branch-M.x` is never revisited.
This has led to commits landing on `branch-4.2` but missing `branch-4.x`.
Six such commits were observed on the current branches (as of 2026-05-22):
- SPARK-56700 (#55651)
- SPARK-56676 (#55623)
- SPARK-56838 (#55836)
- SPARK-56650 (#55589)
- SPARK-56856 (#55969)
- SPARK-56977 (#56023)
All six landed on master and `branch-4.2` but were not cherry-picked to
`branch-4.x`, requiring follow-up backports.
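
Gaps like the six listed above can be found by comparing the JIRA ids mentioned on each branch. The following is a rough sketch and not part of this patch: `jira_ids` and `missing_from_rolling` are hypothetical helpers, the subject lines are invented placeholders, and in practice the inputs might come from something like `git log --format=%s <branch>`.

```python
import re

JIRA_ID = re.compile(r"SPARK-\d+")


def jira_ids(subjects):
    """Collect every SPARK-NNNNN id mentioned in the given commit subject lines."""
    ids = set()
    for subject in subjects:
        ids.update(JIRA_ID.findall(subject))
    return ids


def missing_from_rolling(maintenance_subjects, rolling_subjects):
    """Return ids present on the maintenance branch but absent from the rolling one."""
    return sorted(jira_ids(maintenance_subjects) - jira_ids(rolling_subjects))


# Invented subject lines standing in for branch-M.N and branch-M.x history.
maintenance = [
    "[SPARK-00001][SQL] Placeholder change A",
    "[SPARK-00002][CORE] Placeholder change B",
]
rolling = ["[SPARK-00001][SQL] Placeholder change A"]

print(missing_from_rolling(maintenance, rolling))  # ['SPARK-00002']
```

Matching on ids instead of commit hashes is deliberate here, since a cherry-picked commit gets a new hash on each branch.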
### Does this PR introduce _any_ user-facing change?
Yes, for committers using `dev/merge_spark_pr.py`. When the typed
cherry-pick target is `branch-M.N` and `branch-M.x` exists and is not yet
picked, an additional prompt asks whether to pick into both. Accepting the
default ("both") preserves prior behavior plus an extra cherry-pick to
`branch-M.x`.
No change when the committer accepts the default `branch-M.x` target, or
when picking into `branch-M.x` first and `branch-M.N` second (the typical
policy-compliant flow).
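
The behavior described above can be summarized as a pure function from the inputs to the list of branches that end up picked. This is a simplified sketch for illustration only: `plan_picks` is a hypothetical name, the prompt answer is passed in as `choice` instead of being read interactively, and `SystemExit` stands in for the script's own failure path.

```python
import re


def plan_picks(target_ref, typed_ref, branch_names, already_picked=(), choice=""):
    """Return the refs to cherry-pick into, in order, or exit on abort."""
    m = re.match(r"^branch-(\d+)\.(\d+)$", typed_ref)
    sibling = "branch-%s.x" % m.group(1) if m else None
    prompt_needed = (
        target_ref == "master"
        and sibling in branch_names
        and sibling not in already_picked
    )
    if not prompt_needed:
        return [typed_ref]  # unchanged single-branch behavior
    choice = choice.strip().lower()
    if choice in ("", "b", "both"):
        return [sibling, typed_ref]  # rolling branch first, then maintenance
    if choice in ("o", "only"):
        return [typed_ref]
    raise SystemExit("aborted at Upstream-First policy prompt")


branches = ["branch-4.x", "branch-4.2"]
print(plan_picks("master", "branch-4.2", branches))  # ['branch-4.x', 'branch-4.2']
print(plan_picks("master", "branch-4.2", branches, choice="o"))  # ['branch-4.2']
print(plan_picks("branch-4.x", "branch-4.2", branches))  # ['branch-4.2']
```

The last call shows the no-change case: when the PR was merged into `branch-4.x` itself, no extra prompt is involved.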
### How was this patch tested?
- `python3 -m doctest dev/merge_spark_pr.py` passes (34/34, all
pre-existing tests — none cover the new policy logic).
- New `cherry_pick` policy logic was reviewed for behavior but **not
exercised end-to-end**: actually running `merge_spark_pr.py` requires committer
privileges and a live open PR to merge. Edge cases were traced by reading the
code (PR target = master with manual branch-M.N entry; PR target = branch-M.x
with default branch-M.N pick; multiple iterations after a two-branch pick).
- Reviewers familiar with the merge flow are encouraged to verify behavior
on first real use, especially the abort path and the interaction with manual
conflict resolution inside `_do_cherry_pick`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
Closes #56058 from viirya/infra-merge-script-upstream-first-policy.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
dev/merge_spark_pr.py | 113 +++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 102 insertions(+), 11 deletions(-)
diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index b630e13b968c..6e5da30f94b9 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -470,11 +470,8 @@ def merge_pr(pr_num, target_ref, title, body, pr_repo_desc, pr_author, co_author
     return merge_hash
-def cherry_pick(pr_num, merge_hash, default_branch):
-    pick_ref = bold_input("Enter a branch name [%s]: " % default_branch)
-    if pick_ref == "":
-        pick_ref = default_branch
-
+def _do_cherry_pick(pr_num, merge_hash, pick_ref):
+    """Cherry-pick `merge_hash` onto `pick_ref` and push. Returns the pushed ref."""
     pick_branch_name = "%s_PICK_PR_%s_%s" % (BRANCH_PREFIX, pr_num, pick_ref.upper())
     run_cmd("git fetch %s %s:%s" % (PUSH_REMOTE_NAME, pick_ref, pick_branch_name))
@@ -495,7 +492,6 @@ def cherry_pick(pr_num, merge_hash, default_branch):
     try:
         run_cmd("git push %s %s:%s" % (PUSH_REMOTE_NAME, pick_branch_name, pick_ref))
     except Exception as e:
-        clean_up()
         fail("Exception while pushing: %s" % e)
     pick_hash = run_cmd("git rev-parse %s" % pick_branch_name)[:8]
@@ -506,6 +502,85 @@ def cherry_pick(pr_num, merge_hash, default_branch):
     return pick_ref
+def _upstream_first_sibling(target_ref, pick_ref, branch_names, already_picked):
+    """Return the sibling branch-M.x if Upstream-First should prompt, else None.
+
+    The policy only applies when the PR was merged into master: that's the only case
+    where the committer can type branch-M.N at the cherry-pick prompt and bypass the
+    rolling branch-M.x. When the PR was opened against branch-M.x the merge itself
+    lands there (nothing to bypass), and when it was opened against branch-M.N the
+    author already chose per-branch scope.
+
+    >>> _upstream_first_sibling("master", "branch-4.2", ["branch-4.x", "branch-4.2"], ())
+    'branch-4.x'
+    >>> _upstream_first_sibling("master", "branch-4.2", ["branch-4.x", "branch-4.2"],
+    ...     ("branch-4.x",))
+    >>> _upstream_first_sibling("master", "branch-4.x", ["branch-4.x"], ())
+    >>> _upstream_first_sibling("master", "branch-4.99", ["branch-4.2"], ())
+    >>> _upstream_first_sibling("branch-4.x", "branch-4.2", ["branch-4.x", "branch-4.2"], ())
+    >>> _upstream_first_sibling("branch-4.2", "branch-3.5", ["branch-4.x", "branch-3.5"], ())
+    """
+    if target_ref != "master":
+        return None
+    m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
+    if not m:
+        return None
+    candidate = "branch-%s.x" % m.group(1)
+    if candidate in branch_names and candidate not in already_picked:
+        return candidate
+    return None
+
+
+def cherry_pick(pr_num, merge_hash, default_branch, branch_names, target_ref, already_picked=()):
+    """Prompt for a target branch and cherry-pick `merge_hash` onto it.
+
+    Enforces the Upstream-First policy (see header comment) via
+    `_upstream_first_sibling`: when the PR was merged into master and the committer
+    types a branch-M.N target while branch-M.x is also a known release branch AND
+    has not already received this commit, prompt to confirm whether to pick into
+    BOTH (the policy-compliant default) or branch-M.N only (treated as a
+    maintenance-only bugfix). Returns the list of refs actually picked into, so
+    the main loop can advance its remaining-branches list correctly.
+    """
+    pick_ref = bold_input("Enter a branch name [%s]: " % default_branch)
+    if pick_ref == "":
+        pick_ref = default_branch
+
+    sibling_x = _upstream_first_sibling(target_ref, pick_ref, branch_names, already_picked)
+    if sibling_x is not None:
+        print()
+        print("=" * 80)
+        print(
+            "Upstream-First policy: non-bugfix commits on %s should also land on %s."
+            % (pick_ref, sibling_x)
+        )
+        print(
+            "If this is a %s-only maintenance bugfix, you may pick %s alone." % (pick_ref, pick_ref)
+        )
+        print("Otherwise, pick both (%s first, then %s)." % (sibling_x, pick_ref))
+        print("=" * 80)
+        choice = (
+            bold_input(
+                "Pick into [b]oth %s + %s / [o]nly %s / [a]bort (default: both): "
+                % (sibling_x, pick_ref, pick_ref)
+            )
+            .strip()
+            .lower()
+        )
+        if choice in ("", "b", "both"):
+            picked_x = _do_cherry_pick(pr_num, merge_hash, sibling_x)
+            picked_n = _do_cherry_pick(pr_num, merge_hash, pick_ref)
+            return [picked_x, picked_n]
+        elif choice in ("o", "only"):
+            return [_do_cherry_pick(pr_num, merge_hash, pick_ref)]
+        elif choice in ("a", "abort"):
+            fail("Aborted by user at Upstream-First policy prompt.")
+        else:
+            fail("Unrecognized choice %r; aborting." % choice)
+
+    return [_do_cherry_pick(pr_num, merge_hash, pick_ref)]
+
+
 def print_jira_issue_summary(issue):
     summary = "Summary\t\t%s\n" % issue.fields.summary
     assignee = issue.fields.assignee
@@ -832,7 +907,6 @@ def main():
     branches = get_json("%s/branches" % GITHUB_API_BASE)
     branch_names = list(filter(lambda x: x.startswith("branch-"), [x["name"] for x in branches]))
     branch_names = sorted(branch_names, key=semver_branch_rank, reverse=True)
-    branch_iter = iter(branch_names)
     if len(sys.argv) == 1:
         pr_num = bold_input("Which pull request would you like to merge? (e.g. 34): ")
@@ -942,7 +1016,8 @@ def main():
             fail("Couldn't find any merge commit for #%s, you may need to update HEAD." % pr_num)
         print("Found commit %s:\n%s" % (merge_hash, message))
-        cherry_pick(pr_num, merge_hash, next(branch_iter, branch_names[0]))
+        default = branch_names[0]
+        cherry_pick(pr_num, merge_hash, default, branch_names, target_ref, already_picked=())
         sys.exit(0)
     if not bool(pr["mergeable"]):
@@ -976,11 +1051,27 @@ def main():
         print("PR #%s is still open after push; closing it explicitly." % pr_num)
         close_pr(pr_num)
+    # Walk a mutable remaining-branches list so the next default correctly skips any
+    # branches already picked, including branches consumed by the Upstream-First two-branch
+    # path inside cherry_pick (e.g. picking branch-M.x + branch-M.N in a single prompt).
+    # merged_refs doubles as the already_picked set passed to cherry_pick: it starts with
+    # target_ref (the merge sink, never to be re-picked) and grows with every cherry-pick.
+    remaining_branches = [b for b in branch_names if b != target_ref]
     pick_prompt = "Would you like to pick %s into another branch?" % merge_hash
     while bold_input("\n%s (y/N): " % pick_prompt).lower() == "y":
-        merged_refs = merged_refs + [
-            cherry_pick(pr_num, merge_hash, next(branch_iter, branch_names[0]))
-        ]
+        default = remaining_branches[0] if remaining_branches else branch_names[0]
+        picked = cherry_pick(
+            pr_num,
+            merge_hash,
+            default,
+            branch_names,
+            target_ref,
+            already_picked=tuple(merged_refs),
+        )
+        merged_refs = merged_refs + picked
+        for b in picked:
+            if b in remaining_branches:
+                remaining_branches.remove(b)
     if asf_jira is not None:
         continue_maybe("Would you like to update an associated JIRA?")