On 5/25/2026 3:38 AM, Nguyen Tran wrote:
> For divisions by a compile-time-constant divisor, expand_divmod calls
> choose_multiplier with the dividend's full type precision (size or
> size - 1). When the dividend's value range is known to be narrower
> than the type, choose_multiplier could pick a smaller multiplier and
> post-shift, but expand_divmod has no path to receive that hint.
>
> This patch threads the VRP-derived value range from the GIMPLE side
> into expand_divmod and forwards it to choose_multiplier:
>
> * expand_expr_divmod (gcc/expr.cc) computes a precision hint from
> treeop0's range via a new helper determine_value_range, which
> queries the active range_query during pass_expand.
>
> * For unsigned (or non-negative-known signed) operands the hint is
> wi::min_precision(max_val, UNSIGNED). For signed operands with
> possibly negative values it is wi::min_precision of the larger
> of -min - 1 and max under unsigned interpretation -- the bound
> choose_multiplier itself relies on.
>
> * expand_divmod gets a new dividend_prec parameter (default -1
> meaning "no hint") and an inline helper adjusted_dividend_prec
> that hands the hint to choose_multiplier raised to LGUP, so the
> lgup <= precision precondition is preserved. The helper is
> invoked at three call sites: unsigned TRUNC, signed TRUNC, and
> signed FLOOR.
>
> For the example in PR target/91883,
>
> unsigned g (unsigned a) {
> if (a > 1000000) __builtin_unreachable ();
> return a / 10;
> }
>
> baseline -O2 emits
>
> movl %edi, %eax
> movl $3435973837, %edx
> imulq %rdx, %rax
> shrq $35, %rax
>
> with this patch:
>
> movl %edi, %eax
> imulq $429497139, %rax, %rax
> shrq $32, %rax
>
> The smaller multiplier fits imulq's 32-bit signed immediate, saving
> the separate movl, and the post-shift drops from 35 to 32. Similar
> effects for /3, /7, /100. For signed /7, codegen also shifts from
> the long path (addl-fixup) to the short path, reducing seven
> instructions to five.
>
> The new parameter defaults to -1 and produces baseline codegen for
> that case; functions without VRP range info are unaffected.
>
> gcc.target/i386/pr115910.c is updated. The test has two functions
> both doing x / 3U: foo with __builtin_unreachable when x < 0, and
> bar with no hint. Previously both produced identical codegen; the
> test asserted 2 imulq + 2 shrq $33. With this patch foo's
> __builtin_unreachable narrows x to [0, INT_MAX] giving VRP-derived
> precision 31, so choose_multiplier picks a smaller M (1431655766,
> fits imulq's 32-bit immediate) and post-shift 32 instead of 33.
> bar is unchanged. The dg-final assertions are split: still 2 imulq,
> plus 1 shrq $32 (foo) and 1 shrq $33 (bar).
>
> One drawback of this patch is some code bloat: the same
> divisor at two call sites with different ranges produces different
> assembly instructions. Unmodified GCC produces the same assembly
> instructions regardless of range, allowing the linker's identical-
> code-folding (ICF) pass to merge the two function bodies into a
> single copy in the final binary. With this patch the bodies differ,
> so ICF cannot fold them. Clang has the same drawback: it also
> produces different assembly instructions for the same divisor at
> different ranges, which ICF cannot fold.
>
> For example, consider /3 at two different ranges:
>
> unsigned int f3 (unsigned int a) {
> if (a > 65535) __builtin_unreachable ();
> return a / 3;
> }
> unsigned int f (unsigned int a) {
> if (a > 655356) __builtin_unreachable ();
> return a / 3;
> }
>
> Unmodified GCC produces byte-identical assembly instructions for
> both functions, which ICF folds into a single copy:
> f3:
> mov eax, edi
> mov edx, 2863311531
> imul rax, rdx
> shr rax, 33
> ret
> f:
> mov eax, edi
> mov edx, 2863311531
> imul rax, rdx
> shr rax, 33
> ret
>
> Clang produces different assembly instructions; ICF cannot fold them:
> f3:
> imul eax, edi, 43691
> shr eax, 17
> ret
> f:
> mov ecx, edi
> mov eax, 2863311531
> imul rax, rcx
> shr rax, 33
> ret
>
> Patched GCC produces different assembly instructions; ICF cannot
> fold them:
> f3:
> mov eax, edi
> imul rax, rax, 1431677610
> shr rax, 32
> ret
> f:
> mov eax, edi
> imul rax, rax, 1431657130
> shr rax, 32
> ret
>
> A live demonstration: https://godbolt.org/z/75r6axnvG, compiled with
>
> -O2 -ffunction-sections -fuse-ld=gold -Wl,--icf=all
>
> The flags:
> -ffunction-sections put each function into its own section, so
> the linker can compare and fold them
> individually
> -fuse-ld=gold use the gold linker (the default linker
> does not support ICF)
> -Wl,--icf=all enable ICF in the linker
>
> Under unmodified GCC, f3 and f end up at the same address. Under
> clang and patched GCC, they end up at different addresses.
>
> Under default compiler/linker flags this patch likely has no meaningful
> downside: ICF is not enabled by default, so unmodified and patched
> both keep two copies of the body in .text. We consider this an
> acceptable tradeoff: each individual division is at least as small
> as before, and clang accepts the same one.
>
> Bootstrapped on x86_64-pc-linux-gnu with the default language set
> (c, c++, fortran, objc, obj-c++, lto): "make bootstrap" followed by
> "make -k check" from the top of the build tree. Stage-2 vs. stage-3
> binary comparison successful (the two compilers GCC produces of itself
> are byte-identical). No new regressions in the testsuite, and all 30
> dg-final assertions across the seven new gcc.target/i386/divmod-range-*.c
> tests pass.
>
> gcc/ChangeLog:
>
> PR target/91883
> * expmed.cc (choose_multiplier): Prepend comment stating the
> multiply-shift formula it computes.
> (adjusted_dividend_prec): New helper.
> (expand_divmod): Add dividend_prec parameter. Forward to
> adjusted_dividend_prec at the unsigned TRUNC, signed TRUNC,
> and signed FLOOR XOR-sign-flip call sites.
> * expmed.h (expand_divmod): Add dividend_prec default arg.
> * expr.cc (determine_value_range): New helper, queries
> range_query during pass_expand.
> (expand_expr_divmod): Compute dividend_prec from treeop0's
> range and forward to expand_divmod.
>
> gcc/testsuite/ChangeLog:
>
> PR target/91883
> * gcc.target/i386/pr115910.c: Adjust shrq assertions for
> VRP-narrowed multiplier on foo.
> * gcc.target/i386/divmod-range-1.c: New test: VRP-narrowed
> multiplier for unsigned /10.
> * gcc.target/i386/divmod-range-2.c: New test: VRP-narrowed
> multiplier for unsigned %10.
> * gcc.target/i386/divmod-range-3.c: New test: VRP-narrowed
> multiplier for signed /10.
> * gcc.target/i386/divmod-range-4.c: New test: VRP shifts
> signed /7 from the long (addl-fixup) path to the short path.
> * gcc.target/i386/divmod-range-5.c: New test: regression
> guard -- no-hint unsigned divmod preserves baseline codegen.
> * gcc.target/i386/divmod-range-6.c: New test: regression
> guard -- no-hint signed /7 preserves long-path codegen.
> * gcc.target/i386/divmod-range-7.c: New test: VRP-narrowed
> multiplier for signed FLOOR_DIV (uses GIMPLE Front End since
> FLOOR_DIV_EXPR cannot be produced from C source).
So first a non-technical issue. I'm not aware of a DCO or copyright
assignment on file for you. So you'll need to either add a
Signed-off-by tag to put the contribution under the DCO or execute a
copyright assignment for this work to the FSF.
I probably wouldn't use the term "hint", but instead "range". When I
think of a hint I tend to think that that getting it wrong isn't
catastrophic from a correctness standpoint. But given how you're using
the range information, if we get it wrong, then the result will be wrong.
You changed the signature for expand_divmod, then used a default value
for that new parameter. I'd generally prefer to avoid default values
like that. There aren't that many calls to expand_divmod, I'd prefer to
just go fix them.
For a patch like this you will need to bootstrap the compiler and run a
full regression test. You may have done that already, if so just
indicate what platform you did that step on. I'd guess x86_64 linux,
but best to be explicit about the testing.
> @@ -9775,6 +9806,35 @@ expand_expr_divmod (tree_code code, machine_mode mode,
> tree treeop0,
> {
> bool mod_p = (code == TRUNC_MOD_EXPR || code == FLOOR_MOD_EXPR
> || code == CEIL_MOD_EXPR || code == ROUND_MOD_EXPR);
> +
> + /* Calculate precision hint from value range if available. */
> + int dividend_prec = -1;
> +
> + if (SCALAR_INT_MODE_P (mode)
> + && optimize >= 2
> + && TREE_CODE (treeop1) == INTEGER_CST)
> + {
> + wide_int min_val, max_val;
> + if (determine_value_range (treeop0, &min_val, &max_val))
> + {
> + if (unsignedp || wi::ges_p (min_val, 0))
> + {
> + /* Unsigned or known non-negative: precision from upper bound. */
> + dividend_prec = wi::min_precision (max_val, UNSIGNED);
> + }
> + else
> + {
> + /* Signed with possible negative values. Take the unsigned
> + precision of the larger of -min - 1 and max (or just
> + -min - 1 when max < 0, since -min - 1 dominates then). */
> + wide_int neg_side = -min_val - 1;
Is there any chance of overflow here? -MIN_INT is usually problematical
and subtracting 1 from that result just makes overflow more worrisome.
In general it looks pretty reasonable. Fix the stuff above and I'll get
it committed if nobody objects.
Thanks!
jeff