Pinging again for a review on this series. In our measurements, this improves 750.sealcrypto_r in SPEC CPU 2026 by 25% on Neoverse-N1 and by more than 50% on Zen4. LLVM landed similar optimisations earlier this year, and this series restores comparable performance in GCC.

Thank you,
Philipp.

On Thu, 26 Mar 2026 at 12:11, Konstantinos Eleftheriou <[email protected]> wrote:
>
> This patch series teaches GCC to recognize longhand 64x64->128
> wide-multiplication idioms and replace them with native multiply
> instructions (MULT_HIGHPART_EXPR / widening multiply for the high part,
> plain MULT_EXPR for the low part).
>
> Portable C/C++ code that needs a 128-bit product on a 64-bit target
> often resorts to a longhand decomposition: split operands into 32-bit
> halves, compute four partial products, and propagate carries manually.
> This pattern appears in a number of real-world codebases, including
> SPEC2026's 750.sealcrypto_r (seal/util/uintarith.h) and several
> examples from Hacker's Delight. Targets like AArch64 (mul/umulh) and
> x86-64 can compute the full 128-bit product in one or two instructions,
> but GCC does not currently fold the longhand sequence back to these.
>
> The series is split into two patches:
>
> 1/2 match.pd: Flatten carry-diamond patterns to straight-line code
>
>   A carry diamond implements unsigned overflow detection with
>   conditional carry propagation:
>
>     sum = a + b;
>     if (addend > sum) result = base + C;
>
>   This is implemented as a match.pd simplification applied during
>   phiopt, which converts such conditional branches into branchless
>   straight-line code:
>
>     result = base + ((type)(addend > sum) << log2(C));
>
>   (PHI<pow2, 0>) is already handled by the existing match.pd pattern
>   for conditional power-of-two.
>
>   This is a prerequisite for the long-multiply pattern matching,
>   which expects carries in the form
>   (lshift (convert? (gt ...)) INTEGER_CST).
>
> 2/2 forwprop: Match and fold long-multiply patterns [PR107090]
>
>   Adds match.pd patterns that recognize six decomposed variants
>   of the longhand multiplication:
>
>   - carry:       single overflow comparison on the cross-sum
>   - carry-long:  cross-carry with separate high/low accumulation
>   - two-carry:   both cross-carry and low-carry as separate
>                  comparisons
>   - ladder:      sequential accumulation without explicit carry
>                  comparison
>   - ladder-long: ladder with separate high/low accumulation
>   - low-plus:    low part as a direct sum of partial products
>
>   The actual folding is performed in forwprop after verifying
>   target support for umul_highpart or widening multiply.
>
> On SPEC2026's 750.sealcrypto_r:
>
>   - AArch64 Neoverse-N1: 25% improvement
>   - x86-64 Zen4: 59% improvement
>
> Bootstrapped/regtested on AArch64 and x86-64.
>
> Changes in v3:
> - Moved carry-diamond flattening from forwprop to match.pd,
>   replacing ~460 lines of C++ with a 17-line match.pd pattern.
> - Two-carry test scans forwprop3 (the first forwprop after phiopt2,
>   since early phiopt restricts which tree codes are allowed).
> - Set location for new sequences.
> - Updated mul_carry_low pattern.
> - Added the `mul_low_plus` pattern.
> - Fixed formatting issues.
>
> Changes in v2:
> - Fixed the testcases by separating the high part's fold count for
>   32-bit and 64-bit targets.
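
For readers skimming the thread, here is a minimal sketch of the carry-diamond flattening described for patch 1/2, with the branchy and the branchless form written side by side in C. The function names and the choice C = 2^32 are assumptions made purely for illustration; they are not taken from the patch or its testcases.

```c
#include <stdint.h>

/* Branchy form: unsigned overflow of a + b is detected with a
   comparison, and the carry is added conditionally.  The carry
   constant 2^32 is an assumption chosen for this sketch.  */
uint64_t carry_branchy(uint64_t a, uint64_t b, uint64_t base)
{
    uint64_t sum = a + b;           /* wraps modulo 2^64 on overflow */
    uint64_t result = base;
    if (a > sum)                    /* true exactly when a + b wrapped */
        result = base + (1ULL << 32);
    return result;
}

/* Branchless form: the comparison result (0 or 1) is converted to the
   result type and shifted left by log2(C) = 32.  */
uint64_t carry_flat(uint64_t a, uint64_t b, uint64_t base)
{
    uint64_t sum = a + b;
    return base + ((uint64_t)(a > sum) << 32);
}
```

Both functions should return the same value for every input; the second is the straight-line shape that the cover letter says the long-multiply matching expects.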
>
> Konstantinos Eleftheriou (2):
>   match.pd: Flatten carry-diamond patterns to straight-line code
>   forwprop: Match and fold long-multiply patterns [PR107090]
>
>  gcc/match.pd                                | 250 +++++
>  gcc/testsuite/gcc.dg/tree-ssa/forwprop-44.c |  21 +
>  gcc/testsuite/gcc.dg/tree-ssa/forwprop-45.c |  44 +
>  .../gcc.dg/tree-ssa/long-mul-boundary-64.c  | 274 ++++++
>  .../gcc.dg/tree-ssa/long-mul-boundary.c     | 270 ++++++
>  .../gcc.dg/tree-ssa/long-mul-carry.c        | 311 +++++++
>  .../gcc.dg/tree-ssa/long-mul-ladder.c       | 329 +++++++
>  .../gcc.dg/tree-ssa/long-mul-low-plus.c     |  54 ++
>  .../gcc.dg/tree-ssa/long-mul-partial.c      | 119 +++
>  .../gcc.dg/tree-ssa/long-mul-two-carry.c    | 111 +++
>  gcc/testsuite/gcc.target/aarch64/long_mul.c | 100 ++
>  gcc/testsuite/gcc.target/i386/long_mul.c    | 100 ++
>  gcc/tree-ssa-forwprop.cc                    | 871 +++++++++++++++++-
>  13 files changed, 2849 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-44.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-45.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-boundary-64.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-boundary.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-carry.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-ladder.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-low-plus.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-partial.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-two-carry.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/long_mul.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/long_mul.c
>
> --
> 2.52.0
>
> base-commit: 0661c5480c80bc40d9bc1cb15c3264d67c2efe9c
> branch: kelefth/gcc-423-v3
>
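
To make the longhand idiom from the cover letter concrete, the sketch below shows one way portable code might compute a 64x64->128 product from 32-bit halves, next to a reference version. It is loosely in the spirit of the "two-carry" variant (separate comparisons for the cross-carry and the low-carry), but it is not copied from the testcases and may not match the exact shape the new patterns recognize. The names `mul128_longhand` and `mul128_native` are hypothetical, and the reference assumes the `unsigned __int128` extension that GCC provides on 64-bit targets.

```c
#include <stdint.h>

/* Longhand 64x64->128 multiply: split into 32-bit halves, form four
   partial products, and propagate carries by hand.  */
void mul128_longhand(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;         /* weight 2^0  */
    uint64_t p01 = a0 * b1;         /* weight 2^32 */
    uint64_t p10 = a1 * b0;         /* weight 2^32 */
    uint64_t p11 = a1 * b1;         /* weight 2^64 */

    /* Cross-sum may wrap; a wrap is worth 2^96, i.e. 2^32 in the
       high word.  */
    uint64_t cross = p01 + p10;
    uint64_t cross_carry = (uint64_t)(p01 > cross) << 32;

    /* Low word, with its own carry into the high word.  */
    uint64_t low = p00 + (cross << 32);
    uint64_t low_carry = low < p00;

    *lo = low;
    *hi = p11 + (cross >> 32) + cross_carry + low_carry;
}

/* Reference using the compiler's 128-bit type (assumed available).  */
void mul128_native(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}
```

Calling both functions with the same operands should yield identical high and low words; per the cover letter, the goal of the series is for the first form to compile to the same native multiply instructions as the second.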
