PR #23410 opened by michaelni
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23410
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23410.patch
Add a helper that maps forge pull request head refs (refs/remotes/fforge/pr/*)
to the commits they correspond to on a base branch (origin/master).
Because pull requests are integrated by rebasing/applying their patches,
the resulting hashes differ from the PR branch, so matching is done by
git patch-id with an exact subject line and plain SHA as fallbacks. The
merge status is derived from the PR tip commit.
Co-Authored-by: AI
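
The fallback order described above can be sketched without touching git. This is a minimal illustration rather than the script itself: the index dictionaries hold made-up placeholder values, and `match` is a hypothetical name. The order of the checks mirrors `match_commit()` in the patch (patch-id, then plain SHA, then a subject that maps to exactly one base commit).

```python
# Hypothetical, hand-made index of the base branch (placeholder values only).
patchid_to_sha = {"pid-a": "m1"}
subject_to_shas = {"fix foo": ["m2"], "cleanup": ["m3", "m4"]}
sha_set = {"m1", "m2", "m3", "m4", "m5"}


def match(sha, patch_id, subject):
    """Return (base_sha, method), or (None, None) if nothing matches."""
    # 1. patch-id: survives a rebase, so it is tried first.
    if patch_id in patchid_to_sha:
        return patchid_to_sha[patch_id], "patch-id"
    # 2. plain SHA: only hits when the commit landed without being rewritten.
    if sha in sha_set:
        return sha, "sha"
    # 3. subject: accepted only if it is unique on the base branch.
    candidates = subject_to_shas.get(subject, [])
    if len(candidates) == 1:
        return candidates[0], "subject"
    return None, None


print(match("p1", "pid-a", "anything"))  # matched by patch-id
print(match("m5", "pid-x", "anything"))  # matched by plain SHA
print(match("p2", "pid-x", "fix foo"))   # matched by unique subject
print(match("p3", "pid-x", "cleanup"))   # ambiguous subject, no match
```

The last call shows why the subject fallback requires a single candidate: "cleanup" appears twice on the base branch, so it is left unmatched instead of risking a false positive.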
# Output sample
```
fforge/pr/23257 [merged] tip d39218ba40c8 -> 1152139b4898 (patch-id)
avcodec/cook: bound subpacket channel sum against channel count
fforge/pr/23258 [ open ] tip 188a40ecceb5 -> (unmatched)
avfilter/dnn: implement batching and dynamic shapes for Torch backend
1624f5aba6d5 -> (unmatched)
avfilter/dnn: implement persistent input buffer for torch backend
fforge/pr/23259 [ open ] tip 61725754647b -> (unmatched)
rtsp: send RTCP RR to unicast source by default
69b7127e2af9 -> (unmatched)
rtsp: mark URL as writable to allow RTCP RRs
c7dc586fdfbe -> (unmatched)
Revert "avformat/udp: modify the not write-only to read-only mode."
fforge/pr/23260 [ open ] tip 59742c632e3c -> (unmatched)
avformat/dashdec: fix unsigned integer overflow in segment number calculation
fforge/pr/23261 [merged] tip bb5c461a4732 -> bb5c461a4732 (patch-id)
avfilter/vf_libplacebo: setup pl_vulkan_queue.flags on import params
fforge/pr/23262 [ open ] tip 470f1accc04d -> (unmatched)
avformat/hlsenc: add EXT-X-DISCONTINUITY-SEQUENCE support
fforge/pr/23263 [ open ] tip 6ca42bf4998a -> (unmatched)
Media over QUIC (MoQ) support for FFmpeg using fragmented mp4
fforge/pr/23264 [merged] tip 95fe0658d7b3 -> 95fe0658d7b3 (patch-id)
avformat/mov: don't abort on unsupported or invalid chnl boxes
fforge/pr/23265 [merged] tip a56af32a0bc5 -> 7a2424eb43e6 (patch-id)
avcodec/apv_decode: avoid using apv_cbc
fforge/pr/23266 [merged] tip 1891772309c0 -> 6d8f7882ae6e (patch-id)
avcodec/adpcm: require block_align to be a multiple of channels in ADPCM_PSXC init
fforge/pr/23267 [merged] tip a371c1b37ca2 -> de261b9bb2c6 (patch-id)
tests/checkasm/crc: use libavutil memory allocation helpers
686ef7eb3d77 -> 224659360a8b (patch-id)
tests/checkasm/crc: retain offset values between calls
fforge/pr/23268 [merged] tip 4d63e3dd4c64 -> 4d63e3dd4c64 (patch-id)
vulkan_ffv1: add Bayer encoder
fforge/pr/23269 [merged] tip f13b3720219f -> aaac0989e6b4 (patch-id)
avformat/mxfdec: Remove unneeded check
fforge/pr/23271 [merged] tip bc9e8dddc603 -> c7e0bac050a6 (patch-id)
avformat/matroskadec: bound TRACKENTRY parsing by max_streams
fforge/pr/23272 [merged] tip cf812f8bb316 -> d9e2239f3c5e (subject )
swscale/aarch64/yuv2rgb_neon: 2 lines at a time, yuva420p
64b7fc64aabe -> 4b6f7c2a05f4 (patch-id)
swscale/aarch64/yuv2rgb_neon: 2 lines at a time, rgb16
2aa06225f4a8 -> dad212060c77 (patch-id)
swscale/aarch64/yuv2rgb_neon: 2 lines at a time, gbrp
d829b3aa7765 -> 4bfe7efd0c3f (subject )
swscale/aarch64/yuv2rgb_neon: 2 lines at a time, packed RGB
dc9b46d4ebc9 -> 11b1721b11be (subject )
swscale/aarch64/yuv2rgb_neon: reorder params, unify signature
d6c5bb70f7ee -> 8dbc7299509f (patch-id)
swscale/aarch64/yuv2rgb_neon: name registers
82bfa79e0d8f -> e0fa6412408f (patch-id)
swscale/aarch64/yuv2rgb_neon: chroma-preserve compute_rgb
fforge/pr/23273 [ open ] tip 55942edecbf7 -> (unmatched)
avcodec/vvc/ps: reject mismatched slice entry point count
```
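
The bracketed status and the padded method in the sample, such as `[ open ]` and `(subject )`, come from the centred format specs used in the patch (`{status:^6}` and `{tip_method:^8}`). The sketch below shows how the status is chosen from the tip match and then rendered; `pr_status` and `render` are hypothetical names introduced only for this illustration.

```python
def pr_status(tip_matched, any_matched):
    """Status is driven by the tip; other commits only yield 'part'."""
    if tip_matched:
        return "merged"
    if any_matched:
        return "part"
    return "open"


def render(status, method=None):
    """Centre the status in 6 columns and the method in 8, as the patch does."""
    tail = f"({method:^8})" if method else "(unmatched)"
    return f"[{status:^6}] {tail}"


print(render(pr_status(True, True), "patch-id"))
print(render(pr_status(True, True), "subject"))
print(render(pr_status(False, True)))
print(render(pr_status(False, False)))
```

Because "subject" is seven characters wide, centring it in eight columns leaves one trailing space, which is why the sample prints `(subject )`.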
From 0017a6fe6c68f70a1bbe5fcf6475e485ffba3722 Mon Sep 17 00:00:00 2001
From: Michael Niedermayer <[email protected]>
Date: Mon, 8 Jun 2026 15:00:10 +0200
Subject: [PATCH] tools: add match_prs_to_master.py
Add a helper that maps forge pull request head refs (refs/remotes/fforge/pr/*)
to the commits they correspond to on a base branch (origin/master).
Because pull requests are integrated by rebasing/applying their patches,
the resulting hashes differ from the PR branch, so matching is done by
git patch-id with an exact subject line and plain SHA as fallbacks. The
merge status is derived from the PR tip commit.
Co-Authored-by: AI
---
tools/match_prs_to_master.py | 227 +++++++++++++++++++++++++++++++++++
1 file changed, 227 insertions(+)
create mode 100755 tools/match_prs_to_master.py
diff --git a/tools/match_prs_to_master.py b/tools/match_prs_to_master.py
new file mode 100755
index 0000000000..cc9ddf68e7
--- /dev/null
+++ b/tools/match_prs_to_master.py
@@ -0,0 +1,227 @@
+#!/usr/bin/env python3
+"""Match commits in a base branch (origin/master) to forge PR head refs.
+
+FFmpeg integrates pull requests by rebasing/applying the patches, so the
+commit hashes on master differ from the ones on the PR branch (a
+Signed-off-by line is usually appended too). This means a plain SHA
+comparison finds almost nothing. Instead we match by:
+
+  1. git patch-id (the hash of the diff, stable across rebase) -- primary,
+  2. the exact commit subject (first line) -- fallback, used only when it
+     maps to a single master commit, so short/ambiguous subjects do not
+     create false positives,
+  3. plain SHA / ancestry, for PRs merged without a rebase.
+
+For every PR ref it prints the PR tip commit and the master commit it maps
+to (if any), plus a status. The status is decided by the PR *tip* commit,
+because a merged PR's final patch lands on master while the intermediate
+commits a PR head inherits from a stale fork base are not a reliable
+signal:
+  merged - the PR tip was found on master
+  part   - the tip was not found, but (with --verbose) some commit was
+  open   - nothing was found on master
+
+With --verbose every PR commit (capped by --max-commits) is matched and
+listed too, which is useful for spotting partially merged PRs.
+
+Note: this matches by content, not forge metadata, so it cannot tell an
+"open" PR apart from one that was closed without merging.
+
+Usage:
+  ./match_prs_to_master.py [--base origin/master]
+                           [--pr-glob 'refs/remotes/fforge/pr/*']
+                           [--since '2023-01-01']   # master window start
+                           [--limit N]              # only first N PRs (debug)
+                           [--verbose] [--max-commits N]
+                           [--format tsv|report]
+"""
+
+import argparse
+import subprocess
+import sys
+from collections import defaultdict
+
+
+def git(*args, check=True):
+    return subprocess.run(["git", *args], capture_output=True, text=True,
+                          check=check).stdout
+
+
+def git_lines(*args):
+    out = git(*args)
+    return out.splitlines() if out else []
+
+
+def build_master_index(base, since):
+    """Return (patchid -> sha, subject -> [sha], sha -> subject, sha set)."""
+    rng_args = ["--no-merges"]
+    if since:
+        rng_args += ["--since", since]
+
+    # subjects + shas
+    subj_of = {}
+    shas = []
+    for line in git_lines("log", *rng_args, "--format=%H\x1f%s", base):
+        sha, _, subj = line.partition("\x1f")
+        subj_of[sha] = subj
+        shas.append(sha)
+    sha_set = set(shas)
+
+    subject_to_shas = defaultdict(list)
+    for sha, subj in subj_of.items():
+        subject_to_shas[subj].append(sha)
+
+    # patch-ids: `git log -p | git patch-id` yields "<patchid> <commitsha>"
+    patchid_to_sha = {}
+    p1 = subprocess.Popen(["git", "log", *rng_args, "-p", base],
+                          stdout=subprocess.PIPE)
+    p2 = subprocess.Popen(["git", "patch-id", "--stable"],
+                          stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
+    p1.stdout.close()
+    for line in p2.stdout:
+        parts = line.split()
+        if len(parts) == 2:
+            patchid, sha = parts
+            # keep the first (newest) occurrence
+            patchid_to_sha.setdefault(patchid, sha)
+    p2.wait()
+    p1.wait()
+
+    return patchid_to_sha, subject_to_shas, subj_of, sha_set
+
+
+def pr_commits(pr_ref, base, max_commits):
+    """Commits reachable from the PR head but not from base (newest first).
+
+    Capped: PR heads often live on stale forks that carry many unrelated
+    commits, so we only look at the most recent max_commits of them.
+    """
+    args = ["rev-list", "--no-merges"]
+    if max_commits:
+        args += ["-n", str(max_commits)]
+    return git_lines(*args, pr_ref, "^" + base)
+
+
+def patch_id_of(rev):
+    """patch-id of a single commit, or None."""
+    p1 = subprocess.Popen(["git", "show", rev], stdout=subprocess.PIPE)
+    p2 = subprocess.Popen(["git", "patch-id", "--stable"],
+                          stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
+    p1.stdout.close()
+    out = p2.communicate()[0]
+    p1.wait()
+    parts = out.split()
+    return parts[0] if len(parts) >= 1 else None
+
+
+def match_commit(sha, subj, patchid_to_sha, subject_to_shas, sha_set):
+    """Return (master_sha, method) or (None, None)."""
+    pid = patch_id_of(sha)
+    if pid and pid in patchid_to_sha:
+        return patchid_to_sha[pid], "patch-id"
+    if sha in sha_set:
+        return sha, "sha"
+    cand = subject_to_shas.get(subj, [])
+    if len(cand) == 1:
+        return cand[0], "subject"
+    return None, None
+
+
+def auto_since(pr_glob, margin_days=30):
+    """Earliest PR head commit date minus a margin, to bound master indexing."""
+    dates = git_lines("for-each-ref", "--format=%(committerdate:unix)", pr_glob)
+    dates = [int(d) for d in dates if d.strip()]
+    if not dates:
+        return None
+    return f"@{min(dates) - margin_days * 86400}"
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--base", default="origin/master")
+    ap.add_argument("--pr-glob", default="refs/remotes/fforge/pr/*")
+    ap.add_argument("--since", default=None,
+                    help="master window start (default: earliest PR date - 30d)")
+    ap.add_argument("--limit", type=int, default=0)
+    ap.add_argument("--format", choices=["tsv", "report"], default="report")
+    ap.add_argument("--verbose", action="store_true",
+                    help="match every PR commit, not just the tip")
+    ap.add_argument("--max-commits", type=int, default=50,
+                    help="cap on PR commits inspected in --verbose mode")
+    args = ap.parse_args()
+
+    since = args.since or auto_since(args.pr_glob)
+    print(f"# indexing {args.base} since {since} ...", file=sys.stderr)
+    patchid_to_sha, subject_to_shas, subj_of, sha_set = \
+        build_master_index(args.base, since)
+    print(f"# indexed {len(sha_set)} master commits, "
+          f"{len(patchid_to_sha)} patch-ids", file=sys.stderr)
+
+    pr_refs = git_lines("for-each-ref", "--format=%(refname:short)", args.pr_glob)
+    # sort numerically by trailing PR number when possible
+    def prnum(r):
+        tail = r.rsplit("/", 1)[-1]
+        return int(tail) if tail.isdigit() else -1
+    pr_refs.sort(key=prnum)
+    if args.limit:
+        pr_refs = pr_refs[:args.limit]
+
+    if args.format == "tsv":
+        print("pr_ref\tstatus\ttip_commit\tmaster_commit\tmethod\tsubject")
+
+    def subject_of(sha):
+        return subj_of.get(sha) or git("show", "-s", "--format=%s", sha).strip()
+
+    counts = defaultdict(int)
+    for pr_ref in pr_refs:
+        head = git("rev-parse", pr_ref).strip()
+        head_subj = subject_of(head)
+
+        # Status is decided by the PR tip: when a PR is merged, its last
+        # patch lands on master. Intermediate commits from a stale fork base
+        # are not a reliable signal.
+        tip_master, tip_method = match_commit(
+            head, head_subj, patchid_to_sha, subject_to_shas, sha_set)
+
+        rows = [(head, tip_master, tip_method, head_subj)]  # tip first
+
+        if args.verbose:
+            for sha in pr_commits(pr_ref, args.base, args.max_commits):
+                if sha == head:
+                    continue
+                subj = subject_of(sha)
+                m, meth = match_commit(sha, subj, patchid_to_sha,
+                                       subject_to_shas, sha_set)
+                rows.append((sha, m, meth, subj))
+
+        matched = sum(1 for r in rows if r[1])
+        if tip_master:
+            status = "merged"
+        elif matched:
+            status = "part"
+        else:
+            status = "open"
+        counts[status] += 1
+
+        if args.format == "tsv":
+            for pr_sha, master, method, subj in rows:
+                print(f"{pr_ref}\t{status}\t{pr_sha[:12]}\t"
+                      f"{(master[:12] if master else '-')}\t"
+                      f"{method or '-'}\t{subj}")
+        else:
+            tip_str = (f"{tip_master[:12]} ({tip_method:^8})" if tip_master
+                       else "(unmatched) ")
+            print(f"{pr_ref} [{status:^6}] tip {head[:12]} -> {tip_str} {head_subj}")
+            if args.verbose:
+                for pr_sha, master, method, subj in rows[1:]:
+                    if master:
+                        print(f" {pr_sha[:12]} -> {master[:12]} ({method:^8}) {subj}")
+                    else:
+                        print(f" {pr_sha[:12]} -> (unmatched) {subj}")
+
+    print("# summary: " + ", ".join(f"{k}={v}" for k, v in sorted(counts.items())),
+          file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
--
2.52.0