On 5/25/2026 3:38 AM, Nguyen Tran wrote:
> For divisions by a compile-time-constant divisor, expand_divmod calls
> choose_multiplier with the dividend's full type precision (size or
> size - 1).  When the dividend's value range is known to be narrower
> than the type, choose_multiplier could pick a smaller multiplier and
> post-shift, but expand_divmod has no path to receive that hint.
>
> This patch threads the VRP-derived value range from the GIMPLE side
> into expand_divmod and forwards it to choose_multiplier:
>
>    * expand_expr_divmod (gcc/expr.cc) computes a precision hint from
>      treeop0's range via a new helper determine_value_range, which
>      queries the active range_query during pass_expand.
>
>    * For unsigned (or non-negative-known signed) operands the hint is
>      wi::min_precision(max_val, UNSIGNED).  For signed operands with
>      possibly negative values it is wi::min_precision of the larger
>      of -min - 1 and max under unsigned interpretation -- the bound
>      choose_multiplier itself relies on.
>
>    * expand_divmod gets a new dividend_prec parameter (default -1
>      meaning "no hint") and an inline helper adjusted_dividend_prec
>      that hands the hint to choose_multiplier raised to LGUP, so the
>      lgup <= precision precondition is preserved.  The helper is
>      invoked at three call sites: unsigned TRUNC, signed TRUNC, and
>      signed FLOOR.
>
> For the example in PR target/91883,
>
>    unsigned g (unsigned a) {
>      if (a > 1000000) __builtin_unreachable ();
>      return a / 10;
>    }
>
> baseline -O2 emits
>
>    movl    %edi, %eax
>    movl    $3435973837, %edx
>    imulq   %rdx, %rax
>    shrq    $35, %rax
>
> with this patch:
>
>    movl    %edi, %eax
>    imulq   $429497139, %rax, %rax
>    shrq    $32, %rax
>
> The smaller multiplier fits imulq's 32-bit signed immediate, saving
> the separate movl, and the post-shift drops from 35 to 32.  Similar
> effects for /3, /7, /100.  For signed /7, codegen also shifts from
> the long path (addl-fixup) to the short path, reducing seven
> instructions to five.
>
> The new parameter defaults to -1 and produces baseline codegen for
> that case; functions without VRP range info are unaffected.
>
> gcc.target/i386/pr115910.c is updated.  The test has two functions
> both doing x / 3U: foo with __builtin_unreachable when x < 0, and
> bar with no hint.  Previously both produced identical codegen; the
> test asserted 2 imulq + 2 shrq $33.  With this patch foo's
> __builtin_unreachable narrows x to [0, INT_MAX] giving VRP-derived
> precision 31, so choose_multiplier picks a smaller M (1431655766,
> fits imulq's 32-bit immediate) and post-shift 32 instead of 33.
> bar is unchanged.  The dg-final assertions are split: still 2 imulq,
> plus 1 shrq $32 (foo) and 1 shrq $33 (bar).
>
> One drawback of this patch is some code bloat: the same
> divisor at two call sites with different ranges produces different
> assembly instructions.  Unmodified GCC produces the same assembly
> instructions regardless of range, allowing the linker's identical-
> code-folding (ICF) pass to merge the two function bodies into a
> single copy in the final binary.  With this patch the bodies differ,
> so ICF cannot fold them.  Clang has the same drawback: it also
> produces different assembly instructions for the same divisor at
> different ranges, which ICF cannot fold.
>
> For example, consider /3 at two different ranges:
>
>    unsigned int f3 (unsigned int a) {
>      if (a > 65535) __builtin_unreachable ();
>      return a / 3;
>    }
>    unsigned int f (unsigned int a) {
>      if (a > 655356) __builtin_unreachable ();
>      return a / 3;
>    }
>
> Unmodified GCC produces byte-identical assembly instructions for
> both functions, which ICF folds into a single copy:
>    f3:
>          mov     eax, edi
>          mov     edx, 2863311531
>          imul    rax, rdx
>          shr     rax, 33
>          ret
>    f:
>          mov     eax, edi
>          mov     edx, 2863311531
>          imul    rax, rdx
>          shr     rax, 33
>          ret
>
> Clang produces different assembly instructions; ICF cannot fold them:
>    f3:
>          imul    eax, edi, 43691
>          shr     eax, 17
>          ret
>    f:
>          mov     ecx, edi
>          mov     eax, 2863311531
>          imul    rax, rcx
>          shr     rax, 33
>          ret
>
> Patched GCC produces different assembly instructions; ICF cannot
> fold them:
>    f3:
>          mov     eax, edi
>          imul    rax, rax, 1431677610
>          shr     rax, 32
>          ret
>    f:
>          mov     eax, edi
>          imul    rax, rax, 1431657130
>          shr     rax, 32
>          ret
>
> A live demonstration: https://godbolt.org/z/75r6axnvG, compiled with
>
>    -O2 -ffunction-sections -fuse-ld=gold -Wl,--icf=all
>
> The flags:
>    -ffunction-sections   put each function into its own section, so
>                          the linker can compare and fold them
>                          individually
>    -fuse-ld=gold         use the gold linker (the default linker
>                          does not support ICF)
>    -Wl,--icf=all         enable ICF in the linker
>
> Under unmodified GCC, f3 and f end up at the same address.  Under
> clang and patched GCC, they end up at different addresses.
>
> Under default compiler/linker flags this patch likely has no meaningful
> downside: ICF is not enabled by default, so unmodified and patched
> both keep two copies of the body in .text.  We consider this an
> acceptable tradeoff: each individual division is at least as small
> as before, and clang accepts the same one.
>
> Bootstrapped on x86_64-pc-linux-gnu with the default language set
> (c, c++, fortran, objc, obj-c++, lto): "make bootstrap" followed by
> "make -k check" from the top of the build tree.  Stage-2 vs. stage-3
> binary comparison successful (the two compilers GCC produces of itself
> are byte-identical).  No new regressions in the testsuite, and all 30
> dg-final assertions across the seven new gcc.target/i386/divmod-range-*.c
> tests pass.
>
> gcc/ChangeLog:
>
>       PR target/91883
>       * expmed.cc (choose_multiplier): Prepend comment stating the
>       multiply-shift formula it computes.
>       (adjusted_dividend_prec): New helper.
>       (expand_divmod): Add dividend_prec parameter.  Forward to
>       adjusted_dividend_prec at the unsigned TRUNC, signed TRUNC,
>       and signed FLOOR XOR-sign-flip call sites.
>       * expmed.h (expand_divmod): Add dividend_prec default arg.
>       * expr.cc (determine_value_range): New helper, queries
>       range_query during pass_expand.
>       (expand_expr_divmod): Compute dividend_prec from treeop0's
>       range and forward to expand_divmod.
>
> gcc/testsuite/ChangeLog:
>
>       PR target/91883
>       * gcc.target/i386/pr115910.c: Adjust shrq assertions for
>       VRP-narrowed multiplier on foo.
>       * gcc.target/i386/divmod-range-1.c: New test: VRP-narrowed
>       multiplier for unsigned /10.
>       * gcc.target/i386/divmod-range-2.c: New test: VRP-narrowed
>       multiplier for unsigned %10.
>       * gcc.target/i386/divmod-range-3.c: New test: VRP-narrowed
>       multiplier for signed /10.
>       * gcc.target/i386/divmod-range-4.c: New test: VRP shifts
>       signed /7 from the long (addl-fixup) path to the short path.
>       * gcc.target/i386/divmod-range-5.c: New test: regression
>       guard -- no-hint unsigned divmod preserves baseline codegen.
>       * gcc.target/i386/divmod-range-6.c: New test: regression
>       guard -- no-hint signed /7 preserves long-path codegen.
>       * gcc.target/i386/divmod-range-7.c: New test: VRP-narrowed
>       multiplier for signed FLOOR_DIV (uses GIMPLE Front End since
>       FLOOR_DIV_EXPR cannot be produced from C source).
So first a non-technical issue.  I'm not aware of a DCO or copyright 
assignment on file for you.  So you'll need to either add a 
Signed-off-by tag to put the contribution under the DCO or execute a 
copyright assignment for this work to the FSF.

I probably wouldn't use the term "hint", but instead "range".  When I 
think of a hint I tend to think that that getting it wrong isn't 
catastrophic from a correctness standpoint.  But given how you're using 
the range information, if we get it wrong, then the result will be wrong.

You changed the signature for expand_divmod, then used a default value 
for that new parameter.  I'd generally prefer to avoid default values 
like that.  There aren't that many calls to expand_divmod, I'd prefer to 
just go fix them.

For a patch like this you will need to bootstrap the compiler and run a 
full regression test.  You may have done that already, if so just 
indicate what platform you did that step on.  I'd guess x86_64 linux, 
but best to be explicit about the testing.



> @@ -9775,6 +9806,35 @@ expand_expr_divmod (tree_code code, machine_mode mode, 
> tree treeop0,
>   {
>     bool mod_p = (code == TRUNC_MOD_EXPR || code == FLOOR_MOD_EXPR
>               || code == CEIL_MOD_EXPR || code == ROUND_MOD_EXPR);
> +
> +  /* Calculate precision hint from value range if available.  */
> +  int dividend_prec = -1;
> +
> +  if (SCALAR_INT_MODE_P (mode)
> +      && optimize >= 2
> +      && TREE_CODE (treeop1) == INTEGER_CST)
> +    {
> +      wide_int min_val, max_val;
> +      if (determine_value_range (treeop0, &min_val, &max_val))
> +     {
> +       if (unsignedp || wi::ges_p (min_val, 0))
> +         {
> +           /* Unsigned or known non-negative: precision from upper bound.  */
> +           dividend_prec = wi::min_precision (max_val, UNSIGNED);
> +         }
> +       else
> +         {
> +           /* Signed with possible negative values.  Take the unsigned
> +              precision of the larger of -min - 1 and max (or just
> +              -min - 1 when max < 0, since -min - 1 dominates then).  */
> +           wide_int neg_side = -min_val - 1;
Is there any chance of overflow here?  -MIN_INT is usually problematical 
and subtracting 1 from that result just makes overflow more worrisome.

In general it looks pretty reasonable.  Fix the stuff above and I'll get 
it committed if nobody objects.

Thanks!

jeff

Reply via email to