PR #23410 opened by michaelni
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23410
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23410.patch

Add a helper that maps forge pull request head refs (refs/remotes/fforge/pr/*)
to the commits they correspond to on a base branch (origin/master).

Because pull requests are integrated by rebasing/applying their patches,
the resulting hashes differ from the PR branch, so matching is done by
git patch-id with an exact subject line and plain SHA as fallbacks. The
merge status is derived from the PR tip commit.

Co-Authored-by: AI
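
For reviewers, the lookup order can be sketched in isolation as below. This is a minimal sketch, not the script itself: the index contents, the placeholder SHAs and patch-ids, and the `lookup` name are all hypothetical. It follows the order `match_commit()` uses in the patch (patch-id, then SHA, then unique subject) and takes the patch-id as an argument instead of calling git.

```python
from collections import defaultdict

# Toy indexes standing in for what the script builds from `git log` on the base.
# All hashes, patch-ids and subjects below are hypothetical.
patchid_to_sha = {"pid-1": "aaa111"}
sha_set = {"aaa111", "bbb222", "ccc333", "ddd444"}
subject_to_shas = defaultdict(list)
subject_to_shas["avcodec/foo: fix bar"].append("bbb222")
subject_to_shas["fix typo"].extend(["ccc333", "ddd444"])  # ambiguous subject


def lookup(sha, subject, patch_id):
    """Return (base_sha, method) or (None, None), strongest signal first."""
    if patch_id and patch_id in patchid_to_sha:
        return patchid_to_sha[patch_id], "patch-id"
    if sha in sha_set:
        return sha, "sha"
    candidates = subject_to_shas.get(subject, [])
    if len(candidates) == 1:  # only trust a subject that is unique on the base
        return candidates[0], "subject"
    return None, None


# A rebased commit: new SHA, same diff, so the patch-id still matches.
print(lookup("e0e0e0", "avcodec/foo: fix bar", "pid-1"))
# An ambiguous subject is rejected instead of guessing.
print(lookup("e0e0e0", "fix typo", None))
```

Requiring a unique subject is what keeps short, generic subjects from producing false matches.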

# Output sample
```
fforge/pr/23257  [merged]  tip d39218ba40c8 -> 1152139b4898 (patch-id)  avcodec/cook: bound subpacket channel sum against channel count
fforge/pr/23258  [ open ]  tip 188a40ecceb5 -> (unmatched)              avfilter/dnn: implement batching and dynamic shapes for Torch backend
                               1624f5aba6d5 -> (unmatched)              avfilter/dnn: implement persistent input buffer for torch backend
fforge/pr/23259  [ open ]  tip 61725754647b -> (unmatched)              rtsp: send RTCP RR to unicast source by default
                               69b7127e2af9 -> (unmatched)              rtsp: mark URL as writable to allow RTCP RRs
                               c7dc586fdfbe -> (unmatched)              Revert "avformat/udp: modify the not write-only to read-only mode."
fforge/pr/23260  [ open ]  tip 59742c632e3c -> (unmatched)              avformat/dashdec: fix unsigned integer overflow in segment number calculation
fforge/pr/23261  [merged]  tip bb5c461a4732 -> bb5c461a4732 (patch-id)  avfilter/vf_libplacebo: setup pl_vulkan_queue.flags on import params
fforge/pr/23262  [ open ]  tip 470f1accc04d -> (unmatched)              avformat/hlsenc: add EXT-X-DISCONTINUITY-SEQUENCE support
fforge/pr/23263  [ open ]  tip 6ca42bf4998a -> (unmatched)              Media over QUIC (MoQ) support for FFmpeg using fragmented mp4
fforge/pr/23264  [merged]  tip 95fe0658d7b3 -> 95fe0658d7b3 (patch-id)  avformat/mov: don't abort on unsupported or invalid chnl boxes
fforge/pr/23265  [merged]  tip a56af32a0bc5 -> 7a2424eb43e6 (patch-id)  avcodec/apv_decode: avoid using apv_cbc
fforge/pr/23266  [merged]  tip 1891772309c0 -> 6d8f7882ae6e (patch-id)  avcodec/adpcm: require block_align to be a multiple of channels in ADPCM_PSXC init
fforge/pr/23267  [merged]  tip a371c1b37ca2 -> de261b9bb2c6 (patch-id)  tests/checkasm/crc: use libavutil memory allocation helpers
                               686ef7eb3d77 -> 224659360a8b (patch-id)  tests/checkasm/crc: retain offset values between calls
fforge/pr/23268  [merged]  tip 4d63e3dd4c64 -> 4d63e3dd4c64 (patch-id)  vulkan_ffv1: add Bayer encoder
fforge/pr/23269  [merged]  tip f13b3720219f -> aaac0989e6b4 (patch-id)  avformat/mxfdec: Remove unneeded check
fforge/pr/23271  [merged]  tip bc9e8dddc603 -> c7e0bac050a6 (patch-id)  avformat/matroskadec: bound TRACKENTRY parsing by max_streams
fforge/pr/23272  [merged]  tip cf812f8bb316 -> d9e2239f3c5e (subject )  swscale/aarch64/yuv2rgb_neon: 2 lines at a time, yuva420p
                               64b7fc64aabe -> 4b6f7c2a05f4 (patch-id)  swscale/aarch64/yuv2rgb_neon: 2 lines at a time, rgb16
                               2aa06225f4a8 -> dad212060c77 (patch-id)  swscale/aarch64/yuv2rgb_neon: 2 lines at a time, gbrp
                               d829b3aa7765 -> 4bfe7efd0c3f (subject )  swscale/aarch64/yuv2rgb_neon: 2 lines at a time, packed RGB
                               dc9b46d4ebc9 -> 11b1721b11be (subject )  swscale/aarch64/yuv2rgb_neon: reorder params, unify signature
                               d6c5bb70f7ee -> 8dbc7299509f (patch-id)  swscale/aarch64/yuv2rgb_neon: name registers
                               82bfa79e0d8f -> e0fa6412408f (patch-id)  swscale/aarch64/yuv2rgb_neon: chroma-preserve compute_rgb
fforge/pr/23273  [ open ]  tip 55942edecbf7 -> (unmatched)              avcodec/vvc/ps: reject mismatched slice entry point count
```
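
The status column in the sample follows from the tip row alone. Below is a minimal sketch of that rule, assuming each PR is reduced to a list of `(pr_sha, base_sha_or_None)` pairs with the tip first. The `pr_status` name and the `pr_partial` rows are hypothetical; the other rows are copied from the sample above.

```python
def pr_status(rows):
    """rows: list of (pr_sha, base_sha_or_None), tip first."""
    tip_match = rows[0][1]
    if tip_match:
        return "merged"  # the tip landed on the base branch
    if any(base for _, base in rows[1:]):
        return "part"    # tip missing, but an earlier commit was found
    return "open"        # nothing from this PR was found


# Rows taken from the sample output (tip first).
pr_23267 = [("a371c1b37ca2", "de261b9bb2c6"), ("686ef7eb3d77", "224659360a8b")]
pr_23258 = [("188a40ecceb5", None), ("1624f5aba6d5", None)]
# Hypothetical PR whose tip is unmatched while an earlier commit landed.
pr_partial = [("111111111111", None), ("222222222222", "333333333333")]

for name, rows in [("23267", pr_23267), ("23258", pr_23258),
                   ("partial", pr_partial)]:
    print(name, pr_status(rows))
```

Without `--verbose` only the tip row exists, so the `part` status can only appear in verbose runs.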


From 0017a6fe6c68f70a1bbe5fcf6475e485ffba3722 Mon Sep 17 00:00:00 2001
From: Michael Niedermayer <[email protected]>
Date: Mon, 8 Jun 2026 15:00:10 +0200
Subject: [PATCH] tools: add match_prs_to_master.py

Add a helper that maps forge pull request head refs (refs/remotes/fforge/pr/*)
to the commits they correspond to on a base branch (origin/master).

Because pull requests are integrated by rebasing/applying their patches,
the resulting hashes differ from the PR branch, so matching is done by
git patch-id with an exact subject line and plain SHA as fallbacks. The
merge status is derived from the PR tip commit.

Co-Authored-by: AI
---
 tools/match_prs_to_master.py | 227 +++++++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)
 create mode 100755 tools/match_prs_to_master.py

diff --git a/tools/match_prs_to_master.py b/tools/match_prs_to_master.py
new file mode 100755
index 0000000000..cc9ddf68e7
--- /dev/null
+++ b/tools/match_prs_to_master.py
@@ -0,0 +1,227 @@
+#!/usr/bin/env python3
+"""Match commits in a base branch (origin/master) to forge PR head refs.
+
+FFmpeg integrates pull requests by rebasing/applying the patches, so the
+commit hashes on master differ from the ones on the PR branch (a
+Signed-off-by line is usually appended too). This means a plain SHA
+comparison finds almost nothing. Instead we match by:
+
+  1. git patch-id (the hash of the diff, stable across rebase) -- primary,
+  2. the exact commit subject (first line) -- fallback, used only when it
+     maps to a single master commit, so short/ambiguous subjects do not
+     create false positives,
+  3. plain SHA / ancestry, for PRs merged without a rebase.
+
+For every PR ref it prints the PR tip commit and the master commit it maps
+to (if any), plus a status. The status is decided by the PR *tip* commit,
+because a merged PR's final patch lands on master while the intermediate
+commits a PR head inherits from a stale fork base are not a reliable
+signal:
+  merged   - the PR tip was found on master
+  part     - the tip was not found, but (with --verbose) some commit was
+  open     - nothing was found on master
+
+With --verbose every PR commit (capped by --max-commits) is matched and
+listed too, which is useful for spotting partially merged PRs.
+
+Note: this matches by content, not forge metadata, so it cannot tell an
+"open" PR apart from one that was closed without merging.
+
+Usage:
+  ./match_prs_to_master.py [--base origin/master]
+                           [--pr-glob 'refs/remotes/fforge/pr/*']
+                           [--since '2023-01-01']   # master window start
+                           [--limit N]              # only first N PRs (debug)
+                           [--verbose] [--max-commits N]
+                           [--format tsv|report]
+"""
+
+import argparse
+import subprocess
+import sys
+from collections import defaultdict
+
+
+def git(*args, check=True):
+    return subprocess.run(["git", *args], capture_output=True, text=True,
+                          check=check).stdout
+
+
+def git_lines(*args):
+    out = git(*args)
+    return out.splitlines() if out else []
+
+
+def build_master_index(base, since):
+    """Return (patchid -> sha, subject -> [sha], sha -> subject, set_of_shas)."""
+    rng_args = ["--no-merges"]
+    if since:
+        rng_args += ["--since", since]
+
+    # subjects + shas
+    subj_of = {}
+    shas = []
+    for line in git_lines("log", *rng_args, "--format=%H\x1f%s", base):
+        sha, _, subj = line.partition("\x1f")
+        subj_of[sha] = subj
+        shas.append(sha)
+    sha_set = set(shas)
+
+    subject_to_shas = defaultdict(list)
+    for sha, subj in subj_of.items():
+        subject_to_shas[subj].append(sha)
+
+    # patch-ids: `git log -p | git patch-id` yields "<patchid> <commitsha>"
+    patchid_to_sha = {}
+    p1 = subprocess.Popen(["git", "log", *rng_args, "-p", base],
+                          stdout=subprocess.PIPE)
+    p2 = subprocess.Popen(["git", "patch-id", "--stable"],
+                          stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
+    p1.stdout.close()
+    for line in p2.stdout:
+        parts = line.split()
+        if len(parts) == 2:
+            patchid, sha = parts
+            # keep the first (newest) occurrence
+            patchid_to_sha.setdefault(patchid, sha)
+    p2.wait()
+    p1.wait()
+
+    return patchid_to_sha, subject_to_shas, subj_of, sha_set
+
+
+def pr_commits(pr_ref, base, max_commits):
+    """Commits reachable from the PR head but not from base (newest first).
+
+    Capped: PR heads often live on stale forks that carry many unrelated
+    commits, so we only look at the most recent max_commits of them.
+    """
+    args = ["rev-list", "--no-merges"]
+    if max_commits:
+        args += ["-n", str(max_commits)]
+    return git_lines(*args, pr_ref, "^" + base)
+
+
+def patch_id_of(rev):
+    """patch-id of a single commit, or None."""
+    p1 = subprocess.Popen(["git", "show", rev], stdout=subprocess.PIPE)
+    p2 = subprocess.Popen(["git", "patch-id", "--stable"],
+                          stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
+    p1.stdout.close()
+    out = p2.communicate()[0]
+    p1.wait()
+    parts = out.split()
+    return parts[0] if len(parts) >= 1 else None
+
+
+def match_commit(sha, subj, patchid_to_sha, subject_to_shas, sha_set):
+    """Return (master_sha, method) or (None, None)."""
+    pid = patch_id_of(sha)
+    if pid and pid in patchid_to_sha:
+        return patchid_to_sha[pid], "patch-id"
+    if sha in sha_set:
+        return sha, "sha"
+    cand = subject_to_shas.get(subj, [])
+    if len(cand) == 1:
+        return cand[0], "subject"
+    return None, None
+
+
+def auto_since(pr_glob, margin_days=30):
+    """Earliest PR head commit date minus a margin, to bound master indexing."""
+    dates = git_lines("for-each-ref", "--format=%(committerdate:unix)", pr_glob)
+    dates = [int(d) for d in dates if d.strip()]
+    if not dates:
+        return None
+    return f"@{min(dates) - margin_days * 86400}"
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--base", default="origin/master")
+    ap.add_argument("--pr-glob", default="refs/remotes/fforge/pr/*")
+    ap.add_argument("--since", default=None,
+                    help="master window start (default: earliest PR date - 30d)")
+    ap.add_argument("--limit", type=int, default=0)
+    ap.add_argument("--format", choices=["tsv", "report"], default="report")
+    ap.add_argument("--verbose", action="store_true",
+                    help="match every PR commit, not just the tip")
+    ap.add_argument("--max-commits", type=int, default=50,
+                    help="cap on PR commits inspected in --verbose mode")
+    args = ap.parse_args()
+
+    since = args.since or auto_since(args.pr_glob)
+    print(f"# indexing {args.base} since {since} ...", file=sys.stderr)
+    patchid_to_sha, subject_to_shas, subj_of, sha_set = \
+        build_master_index(args.base, since)
+    print(f"# indexed {len(sha_set)} master commits, "
+          f"{len(patchid_to_sha)} patch-ids", file=sys.stderr)
+
+    pr_refs = git_lines("for-each-ref", "--format=%(refname:short)", args.pr_glob)
+    # sort numerically by trailing PR number when possible
+    def prnum(r):
+        tail = r.rsplit("/", 1)[-1]
+        return int(tail) if tail.isdigit() else -1
+    pr_refs.sort(key=prnum)
+    if args.limit:
+        pr_refs = pr_refs[:args.limit]
+
+    if args.format == "tsv":
+        print("pr_ref\tstatus\ttip_commit\tmaster_commit\tmethod\tsubject")
+
+    def subject_of(sha):
+        return subj_of.get(sha) or git("show", "-s", "--format=%s", sha).strip()
+
+    counts = defaultdict(int)
+    for pr_ref in pr_refs:
+        head = git("rev-parse", pr_ref).strip()
+        head_subj = subject_of(head)
+
+        # Status is decided by the PR tip: when a PR is merged, its last
+        # patch lands on master. Intermediate commits from a stale fork base
+        # are not a reliable signal.
+        tip_master, tip_method = match_commit(
+            head, head_subj, patchid_to_sha, subject_to_shas, sha_set)
+
+        rows = [(head, tip_master, tip_method, head_subj)]  # tip first
+
+        if args.verbose:
+            for sha in pr_commits(pr_ref, args.base, args.max_commits):
+                if sha == head:
+                    continue
+                subj = subject_of(sha)
+                m, meth = match_commit(sha, subj, patchid_to_sha,
+                                       subject_to_shas, sha_set)
+                rows.append((sha, m, meth, subj))
+
+        matched = sum(1 for r in rows if r[1])
+        if tip_master:
+            status = "merged"
+        elif matched:
+            status = "part"
+        else:
+            status = "open"
+        counts[status] += 1
+
+        if args.format == "tsv":
+            for pr_sha, master, method, subj in rows:
+                print(f"{pr_ref}\t{status}\t{pr_sha[:12]}\t"
+                      f"{(master[:12] if master else '-')}\t"
+                      f"{method or '-'}\t{subj}")
+        else:
+            tip_str = (f"{tip_master[:12]} ({tip_method:^8})" if tip_master
+                       else "(unmatched)            ")
+            print(f"{pr_ref}  [{status:^6}]  tip {head[:12]} -> {tip_str}  {head_subj}")
+            if args.verbose:
+                for pr_sha, master, method, subj in rows[1:]:
+                    if master:
+                        print(f"                               {pr_sha[:12]} -> {master[:12]} ({method:^8})  {subj}")
+                    else:
+                        print(f"                               {pr_sha[:12]} -> (unmatched)              {subj}")
+
+    print("# summary: " + ", ".join(f"{k}={v}" for k, v in sorted(counts.items())),
+          file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
-- 
2.52.0
