Pinging again for a review on this series.

In our measurements, this improves 750.sealcrypto_r in SPEC CPU 2026 by
25% on Neoverse-N1 and by more than 50% on Zen4.
LLVM landed similar optimisations earlier this year, and this series
brings GCC back to comparable performance.

Thank you,
Philipp.

On Thu, 26 Mar 2026 at 12:11, Konstantinos Eleftheriou <
[email protected]> wrote:

>
> This patch series teaches GCC to recognize longhand 64x64->128
> wide-multiplication idioms and replace them with native multiply
> instructions (MULT_HIGHPART_EXPR / widening multiply for the high part,
> plain MULT_EXPR for the low part).
>
> Portable C/C++ code that needs a 128-bit product on a 64-bit target
> often resorts to a longhand decomposition: split operands into 32-bit
> halves, compute four partial products, and propagate carries manually.
> This pattern appears in a number of real-world codebases, including
> SPEC2026's 750.sealcrypto_r (seal/util/uintarith.h) and several
> examples from Hacker's Delight.  Targets like AArch64 (mul/umulh) and
> x86-64 can compute the full 128-bit product in one or two instructions,
> but GCC does not currently fold the longhand sequence back to these.
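
For readers less familiar with the idiom, here is a minimal sketch of the
kind of longhand high-part computation described above. It is my own
illustration, not code from the series' testcases or from SEAL: the names
mulhi_longhand and mulhi_native are hypothetical, and the reference
version assumes the unsigned __int128 extension that GCC and Clang
provide on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical example: high 64 bits of a 64x64->128 unsigned product,
   computed longhand from 32-bit halves.  */
static uint64_t mulhi_longhand(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32x32->64 partial products.  */
    uint64_t p_ll = a_lo * b_lo;
    uint64_t p_lh = a_lo * b_hi;
    uint64_t p_hl = a_hi * b_lo;
    uint64_t p_hh = a_hi * b_hi;

    /* The cross sum can wrap; a wrap is worth 2^32 in the high word.  */
    uint64_t cross = p_lh + p_hl;
    uint64_t cross_carry = (uint64_t)(cross < p_lh) << 32;

    /* Adding the low half of the cross sum into the low word can also
       wrap; that carry is worth 1 in the high word.  */
    uint64_t lo = p_ll + (cross << 32);
    uint64_t lo_carry = lo < p_ll;

    return p_hh + (cross >> 32) + cross_carry + lo_carry;
}

/* Reference using the compiler's 128-bit type (GCC/Clang extension),
   which already maps to the native multiply instructions.  */
static uint64_t mulhi_native(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
```

Both functions should return the same value for every input; the point of
the series is to get the first one compiled like the second.
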
>
> The series is split into two patches:
>
>   1/2  match.pd: Flatten carry-diamond patterns to straight-line code
>
>        A carry diamond implements unsigned overflow detection with
>        conditional carry propagation:
>
>          sum = a + b;
>          if (addend > sum) result = base + C;
>
>        This is implemented as a match.pd simplification applied during
>        phiopt, which converts such conditional branches into branchless
>        straight-line code:
>
>          result = base + ((type)(addend > sum) << log2(C));
>
>        (PHI<pow2, 0>) is already handled by the existing match.pd pattern
>        for conditional power-of-two.
>
>        This is a prerequisite for the long-multiply pattern matching,
>        which expects carries in the form
>        (lshift (convert? (gt ...)) INTEGER_CST).
>
>   2/2  forwprop: Match and fold long-multiply patterns [PR107090]
>
>        Adds match.pd patterns that recognize six decomposed variants
>        of the longhand multiplication:
>
>          - carry:       single overflow comparison on the cross-sum
>          - carry-long:  cross-carry with separate high/low accumulation
>          - two-carry:   both cross-carry and low-carry as separate
>                         comparisons
>          - ladder:      sequential accumulation without explicit carry
>                         comparison
>          - ladder-long: ladder with separate high/low accumulation
>          - low-plus:    low part as a direct sum of partial products
>
>        The actual folding is performed in forwprop after verifying
>        target support for umul_highpart or widening multiply.
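
To illustrate the carry-diamond flattening from patch 1/2, the sketch
below writes the branchy form and the straight-line form side by side at
the source level. This is only an illustration under my own assumptions:
the function names are hypothetical, I picked C = 2^32 (so log2(C) = 32)
because that is the weight a cross-sum carry has in the high word, and
the real transformation happens on GIMPLE in phiopt, not on C source.

```c
#include <stdint.h>

/* Branchy form: the carry is added under a condition (the "diamond").  */
static uint64_t carry_diamond(uint64_t a, uint64_t b, uint64_t base)
{
    uint64_t sum = a + b;          /* may wrap around */
    uint64_t result = base;
    if (a > sum)                   /* true exactly when a + b wrapped */
        result = base + ((uint64_t)1 << 32);
    return result;
}

/* Straight-line form: the comparison result (0 or 1) is shifted into
   place and added unconditionally.  */
static uint64_t carry_flat(uint64_t a, uint64_t b, uint64_t base)
{
    uint64_t sum = a + b;
    return base + ((uint64_t)(a > sum) << 32);
}
```

The second form is the shape the long-multiply patterns in patch 2/2
expect, i.e. a shifted, converted comparison.
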
>
> On SPEC2026's 750.sealcrypto_r:
>
>   - AArch64 Neoverse-N1: 25% improvement
>   - x86-64 Zen4:         59% improvement
>
> Bootstrapped/regtested on AArch64 and x86-64.
>
> Changes in v3:
> - Moved carry-diamond flattening from forwprop to match.pd,
> replacing ~460 lines of C++ with a 17-line match.pd pattern.
> - Two-carry test scans forwprop3 (the first forwprop after phiopt2,
> since early phiopt restricts which tree codes are allowed).
> - Set location for new sequences.
> - Updated mul_carry_low pattern.
> - Added the `mul_low_plus` pattern.
> - Fixed formatting issues.
>
> Changes in v2:
> - Fixed the testcases by separating the high part's fold count for
> 32-bit and 64-bit targets.
>
> Konstantinos Eleftheriou (2):
>   match.pd: Flatten carry-diamond patterns to straight-line code
>   forwprop: Match and fold long-multiply patterns [PR107090]
>
>  gcc/match.pd                                  | 250 +++++
>  gcc/testsuite/gcc.dg/tree-ssa/forwprop-44.c   |  21 +
>  gcc/testsuite/gcc.dg/tree-ssa/forwprop-45.c   |  44 +
>  .../gcc.dg/tree-ssa/long-mul-boundary-64.c    | 274 ++++++
>  .../gcc.dg/tree-ssa/long-mul-boundary.c       | 270 ++++++
>  .../gcc.dg/tree-ssa/long-mul-carry.c          | 311 +++++++
>  .../gcc.dg/tree-ssa/long-mul-ladder.c         | 329 +++++++
>  .../gcc.dg/tree-ssa/long-mul-low-plus.c       |  54 ++
>  .../gcc.dg/tree-ssa/long-mul-partial.c        | 119 +++
>  .../gcc.dg/tree-ssa/long-mul-two-carry.c      | 111 +++
>  gcc/testsuite/gcc.target/aarch64/long_mul.c   | 100 ++
>  gcc/testsuite/gcc.target/i386/long_mul.c      | 100 ++
>  gcc/tree-ssa-forwprop.cc                      | 871 +++++++++++++++++-
>  13 files changed, 2849 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-44.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-45.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-boundary-64.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-boundary.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-carry.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-ladder.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-low-plus.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-partial.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/long-mul-two-carry.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/long_mul.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/long_mul.c
>
> --
> 2.52.0
>
> base-commit: 0661c5480c80bc40d9bc1cb15c3264d67c2efe9c
> branch: kelefth/gcc-423-v3
>
