Re: [PATCH] expand_divmod: use VRP range to narrow constant multiplier [PR91883]

Jeffrey Law Wed, 03 Jun 2026 12:04:50 -0700

On 5/25/2026 3:38 AM, Nguyen Tran wrote:
> For divisions by a compile-time-constant divisor, expand_divmod calls
> choose_multiplier with the dividend's full type precision (size or
> size - 1).  When the dividend's value range is known to be narrower
> than the type, choose_multiplier could pick a smaller multiplier and
> post-shift, but expand_divmod has no path to receive that hint.
>
> This patch threads the VRP-derived value range from the GIMPLE side
> into expand_divmod and forwards it to choose_multiplier:
>
>    * expand_expr_divmod (gcc/expr.cc) computes a precision hint from
>      treeop0's range via a new helper determine_value_range, which
>      queries the active range_query during pass_expand.
>
>    * For unsigned (or non-negative-known signed) operands the hint is
>      wi::min_precision(max_val, UNSIGNED).  For signed operands with
>      possibly negative values it is wi::min_precision of the larger
>      of -min - 1 and max under unsigned interpretation -- the bound
>      choose_multiplier itself relies on.
>
>    * expand_divmod gets a new dividend_prec parameter (default -1
>      meaning "no hint") and an inline helper adjusted_dividend_prec
>      that hands the hint to choose_multiplier raised to LGUP, so the
>      lgup <= precision precondition is preserved.  The helper is
>      invoked at three call sites: unsigned TRUNC, signed TRUNC, and
>      signed FLOOR.
>
> For the example in PR target/91883,
>
>    unsigned g (unsigned a) {
>      if (a > 1000000) __builtin_unreachable ();
>      return a / 10;
>    }
>
> baseline -O2 emits
>
>    movl    %edi, %eax
>    movl    $3435973837, %edx
>    imulq   %rdx, %rax
>    shrq    $35, %rax
>
> with this patch:
>
>    movl    %edi, %eax
>    imulq   $429497139, %rax, %rax
>    shrq    $32, %rax
>
> The smaller multiplier fits imulq's 32-bit signed immediate, saving
> the separate movl, and the post-shift drops from 35 to 32.  Similar
> effects for /3, /7, /100.  For signed /7, codegen also shifts from
> the long path (addl-fixup) to the short path, reducing seven
> instructions to five.
>
> The new parameter defaults to -1 and produces baseline codegen for
> that case; functions without VRP range info are unaffected.
>
> gcc.target/i386/pr115910.c is updated.  The test has two functions
> both doing x / 3U: foo with __builtin_unreachable when x < 0, and
> bar with no hint.  Previously both produced identical codegen; the
> test asserted 2 imulq + 2 shrq $33.  With this patch foo's
> __builtin_unreachable narrows x to [0, INT_MAX] giving VRP-derived
> precision 31, so choose_multiplier picks a smaller M (1431655766,
> fits imulq's 32-bit immediate) and post-shift 32 instead of 33.
> bar is unchanged.  The dg-final assertions are split: still 2 imulq,
> plus 1 shrq $32 (foo) and 1 shrq $33 (bar).
>
> One drawback of this patch is some code bloat: the same
> divisor at two call sites with different ranges produces different
> assembly instructions.  Unmodified GCC produces the same assembly
> instructions regardless of range, allowing the linker's identical-
> code-folding (ICF) pass to merge the two function bodies into a
> single copy in the final binary.  With this patch the bodies differ,
> so ICF cannot fold them.  Clang has the same drawback: it also
> produces different assembly instructions for the same divisor at
> different ranges, which ICF cannot fold.
>
> For example, consider /3 at two different ranges:
>
>    unsigned int f3 (unsigned int a) {
>      if (a > 65535) __builtin_unreachable ();
>      return a / 3;
>    }
>    unsigned int f (unsigned int a) {
>      if (a > 655356) __builtin_unreachable ();
>      return a / 3;
>    }
>
> Unmodified GCC produces byte-identical assembly instructions for
> both functions, which ICF folds into a single copy:
>    f3:
>          mov     eax, edi
>          mov     edx, 2863311531
>          imul    rax, rdx
>          shr     rax, 33
>          ret
>    f:
>          mov     eax, edi
>          mov     edx, 2863311531
>          imul    rax, rdx
>          shr     rax, 33
>          ret
>
> Clang produces different assembly instructions; ICF cannot fold them:
>    f3:
>          imul    eax, edi, 43691
>          shr     eax, 17
>          ret
>    f:
>          mov     ecx, edi
>          mov     eax, 2863311531
>          imul    rax, rcx
>          shr     rax, 33
>          ret
>
> Patched GCC produces different assembly instructions; ICF cannot
> fold them:
>    f3:
>          mov     eax, edi
>          imul    rax, rax, 1431677610
>          shr     rax, 32
>          ret
>    f:
>          mov     eax, edi
>          imul    rax, rax, 1431657130
>          shr     rax, 32
>          ret
>
> A live demonstration: https://godbolt.org/z/75r6axnvG, compiled with
>
>    -O2 -ffunction-sections -fuse-ld=gold -Wl,--icf=all
>
> The flags:
>    -ffunction-sections   put each function into its own section, so
>                          the linker can compare and fold them
>                          individually
>    -fuse-ld=gold         use the gold linker (the default linker
>                          does not support ICF)
>    -Wl,--icf=all         enable ICF in the linker
>
> Under unmodified GCC, f3 and f end up at the same address.  Under
> clang and patched GCC, they end up at different addresses.
>
> Under default compiler/linker flags this patch likely has no meaningful
> downside: ICF is not enabled by default, so unmodified and patched
> both keep two copies of the body in .text.  We consider this an
> acceptable tradeoff: each individual division is at least as small
> as before, and clang accepts the same one.
>
> Bootstrapped on x86_64-pc-linux-gnu with the default language set
> (c, c++, fortran, objc, obj-c++, lto): "make bootstrap" followed by
> "make -k check" from the top of the build tree.  Stage-2 vs. stage-3
> binary comparison successful (the two compilers GCC produces of itself
> are byte-identical).  No new regressions in the testsuite, and all 30
> dg-final assertions across the seven new gcc.target/i386/divmod-range-*.c
> tests pass.
>
> gcc/ChangeLog:
>
>       PR target/91883
>       * expmed.cc (choose_multiplier): Prepend comment stating the
>       multiply-shift formula it computes.
>       (adjusted_dividend_prec): New helper.
>       (expand_divmod): Add dividend_prec parameter.  Forward to
>       adjusted_dividend_prec at the unsigned TRUNC, signed TRUNC,
>       and signed FLOOR XOR-sign-flip call sites.
>       * expmed.h (expand_divmod): Add dividend_prec default arg.
>       * expr.cc (determine_value_range): New helper, queries
>       range_query during pass_expand.
>       (expand_expr_divmod): Compute dividend_prec from treeop0's
>       range and forward to expand_divmod.
>
> gcc/testsuite/ChangeLog:
>
>       PR target/91883
>       * gcc.target/i386/pr115910.c: Adjust shrq assertions for
>       VRP-narrowed multiplier on foo.
>       * gcc.target/i386/divmod-range-1.c: New test: VRP-narrowed
>       multiplier for unsigned /10.
>       * gcc.target/i386/divmod-range-2.c: New test: VRP-narrowed
>       multiplier for unsigned %10.
>       * gcc.target/i386/divmod-range-3.c: New test: VRP-narrowed
>       multiplier for signed /10.
>       * gcc.target/i386/divmod-range-4.c: New test: VRP shifts
>       signed /7 from the long (addl-fixup) path to the short path.
>       * gcc.target/i386/divmod-range-5.c: New test: regression
>       guard -- no-hint unsigned divmod preserves baseline codegen.
>       * gcc.target/i386/divmod-range-6.c: New test: regression
>       guard -- no-hint signed /7 preserves long-path codegen.
>       * gcc.target/i386/divmod-range-7.c: New test: VRP-narrowed
>       multiplier for signed FLOOR_DIV (uses GIMPLE Front End since
>       FLOOR_DIV_EXPR cannot be produced from C source).
So first a non-technical issue.  I'm not aware of a DCO or copyright 
assignment on file for you.  So you'll need to either add a 
Signed-off-by tag to put the contribution under the DCO or execute a 
copyright assignment for this work to the FSF.


I probably wouldn't use the term "hint", but instead "range".  When I 
think of a hint I tend to think that that getting it wrong isn't 
catastrophic from a correctness standpoint.  But given how you're using 
the range information, if we get it wrong, then the result will be wrong.

You changed the signature for expand_divmod, then used a default value 
for that new parameter.  I'd generally prefer to avoid default values 
like that.  There aren't that many calls to expand_divmod, I'd prefer to 
just go fix them.

For a patch like this you will need to bootstrap the compiler and run a 
full regression test.  You may have done that already, if so just 
indicate what platform you did that step on.  I'd guess x86_64 linux, 
but best to be explicit about the testing.



> @@ -9775,6 +9806,35 @@ expand_expr_divmod (tree_code code, machine_mode mode, 
> tree treeop0,
>   {
>     bool mod_p = (code == TRUNC_MOD_EXPR || code == FLOOR_MOD_EXPR
>               || code == CEIL_MOD_EXPR || code == ROUND_MOD_EXPR);
> +
> +  /* Calculate precision hint from value range if available.  */
> +  int dividend_prec = -1;
> +
> +  if (SCALAR_INT_MODE_P (mode)
> +      && optimize >= 2
> +      && TREE_CODE (treeop1) == INTEGER_CST)
> +    {
> +      wide_int min_val, max_val;
> +      if (determine_value_range (treeop0, &min_val, &max_val))
> +     {
> +       if (unsignedp || wi::ges_p (min_val, 0))
> +         {
> +           /* Unsigned or known non-negative: precision from upper bound.  */
> +           dividend_prec = wi::min_precision (max_val, UNSIGNED);
> +         }
> +       else
> +         {
> +           /* Signed with possible negative values.  Take the unsigned
> +              precision of the larger of -min - 1 and max (or just
> +              -min - 1 when max < 0, since -min - 1 dominates then).  */
> +           wide_int neg_side = -min_val - 1;
Is there any chance of overflow here?  -MIN_INT is usually problematical 
and subtracting 1 from that result just makes overflow more worrisome.

In general it looks pretty reasonable.  Fix the stuff above and I'll get 
it committed if nobody objects.

Thanks!

jeff

Re: [PATCH] expand_divmod: use VRP range to narrow constant multiplier [PR91883]

Reply via email to