On Mon, May 11, 2026 at 10:46 AM Roger Sayle <[email protected]> wrote:
>
>
> This patch is my (initial) solution to PR tree-optimization/112508, the
> observation that tree-ssa's loop store motion frequently increases the
> size of a function and therefore plays poorly with -Os and -Oz. There
> are two challenges that complicate things and prevent simply disabling
> this pass (equivalent to -fno-move-loop-stores) from being an ideal
> solution. The first is that store motion is not universally bad:
> sometimes it helps, but often it doesn't. The second challenge is that
> this pass also performs analyses and other invariant motion that help
> later passes.
>
> To demonstrate this delicate balance, consider the following two (tiny,
> and not equivalent) loops inspired by CSiBE's Linux kernel benchmark.
>
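> /* Assumed declaration (not part of the original example): a
>    non-volatile global with a one-bit field.  */
> struct { struct { unsigned int WTE : 1; } bit; } _wdtc;
>
> void loop1 (short i) {
>   for (; i; i--)
>     _wdtc.bit.WTE = 0;
> }
>
> void loop2 (short i) {
>   do {
>     _wdtc.bit.WTE = 0;
>   } while (--i);
> }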
>
> Currently on x86_64 with -Os, loop1 is 25 bytes and loop2 is 8 bytes.
> Adding the -fno-move-loop-stores flag decreases loop1 to 17 bytes, but
> increases loop2 to 13 bytes: without store motion, we fail to eliminate
> the loop in loop2.
>
> The correct solution is to intelligently determine whether a particular
> instance of loop store motion saves space or not. This patch adds an
> extra clause to the predicate can_sm_ref_p when optimizing the function
> for size, restricting store motion to unconditional stores in
> single-exit loops. Moving a store by duplicating it on multiple loop
> exit edges can obviously increase size. Likewise, the additional logic
> (and flag variable) needed when a store is conditionally executed (i.e.
> only executed sometimes) requires extra instructions not present in the
> original code.
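A minimal model of the size-mode rule described above, written as plain C rather than against GCC internals. The struct sm_candidate, its fields and size_ok_for_sm are hypothetical names for illustration only; the real check lives in can_sm_ref_p in tree-ssa-loop-im.cc.

```c
#include <stdbool.h>

/* Hypothetical summary of one store-motion candidate; these fields are
   illustrative and do not correspond to GCC's internal data.  */
struct sm_candidate
{
  unsigned num_exits;     /* number of exit edges of the loop */
  bool always_accessed;   /* store executed on every iteration */
};

/* Return true if moving the store is acceptable under the rule above:
   when optimizing for size, only unconditional stores in single-exit
   loops are moved.  */
static bool
size_ok_for_sm (const struct sm_candidate *c, bool optimize_for_size)
{
  if (!optimize_for_size)
    return true;              /* no extra restriction otherwise */
  /* Several exits would duplicate the store on each exit edge.  */
  if (c->num_exits != 1)
    return false;
  /* A conditional store needs an extra flag variable and test.  */
  return c->always_accessed;
}
```

For instance, a candidate in a loop with two exits is rejected when optimizing for size but accepted otherwise.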
>
> With this patch, using just -Os, loop1 above is 17 bytes and loop2 is 8
> bytes (i.e. the best of both worlds).
>
> Importantly, this change gives the store motion pass some logic that can
> be tweaked and refined in the future, if examples requiring more complex
> decision-making are discovered. For example, optimize_loop_for_size
> indicating -Oz (while the enclosing function or loop is less
> aggressively optimized for size) could potentially be interpreted as a
> hint to always perform store motion, minimizing the size of the loop
> body even at the expense of larger total code size. Some comments in the
> PR mention hot vs. cold basic blocks, but this only affects performance
> and isn't relevant for -Os, i.e. (total) code size.
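The speculative -Oz idea could be expressed as a policy that compares the loop's and the function's size-optimization levels. This is only a sketch of that possible rule: the enums size_level and sm_policy and the function choose_sm_policy are hypothetical and are not GCC's own types or functions.

```c
/* Hypothetical size-optimization levels: none, -Os-like, -Oz-like.  */
enum size_level { SZ_NONE, SZ_BALANCED, SZ_MAX };

/* Hypothetical outcomes: move freely, or apply the single-exit /
   unconditional-store restriction.  */
enum sm_policy { SM_UNRESTRICTED, SM_RESTRICTED };

/* A loop optimized more aggressively for size than its enclosing
   function asks for the smallest loop body, so store motion is not
   restricted; otherwise, any size optimization applies the
   restriction.  */
static enum sm_policy
choose_sm_policy (enum size_level loop_level, enum size_level fn_level)
{
  if (loop_level == SZ_MAX && fn_level < SZ_MAX)
    return SM_UNRESTRICTED;
  if (loop_level != SZ_NONE || fn_level != SZ_NONE)
    return SM_RESTRICTED;
  return SM_UNRESTRICTED;
}
```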
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures. Ok for mainline?
+  /* Store motion decreases the size of the loop, but often increases
+     the size of the function.  If optimizing the function for size,
+     be careful about which REFs to move.  */
+  if (optimize_function_for_size_p (cfun))
+    {
Can you use optimize_loop_nest_for_size_p (loop) here?
Please also cache ref_always_accessed_p; it's quite an expensive
walk over all accesses.
I think the patch is OK with those two adjustments.
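Caching the query amounts to memoizing its result per reference. The sketch below assumes each reference has a small integer id below MAX_REFS; expensive_always_accessed (a stand-in that only counts its calls), cached_always_accessed and the tri-state array are hypothetical. A real cache would also have to be keyed on anything else the query depends on, such as the loop.

```c
#include <stdbool.h>

#define MAX_REFS 64   /* assumed upper bound on reference ids */

/* Number of times the expensive walk actually ran.  */
static unsigned walk_count;

/* Hypothetical stand-in for an expensive walk over all accesses of
   reference REF_ID; the answer is arbitrary for this sketch.  */
static bool
expensive_always_accessed (unsigned ref_id)
{
  walk_count++;
  return ref_id % 2 == 0;
}

/* Tri-state cache: 0 = not computed yet, 1 = false, 2 = true.  */
static unsigned char always_accessed_cache[MAX_REFS];

/* Return the cached answer, computing it at most once per REF_ID
   (REF_ID must be below MAX_REFS).  */
static bool
cached_always_accessed (unsigned ref_id)
{
  if (always_accessed_cache[ref_id] == 0)
    always_accessed_cache[ref_id]
      = expensive_always_accessed (ref_id) ? 2 : 1;
  return always_accessed_cache[ref_id] == 2;
}
```

The tri-state byte distinguishes "not yet computed" from a cached false without needing a separate valid bit.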

I suppose one should weigh the number of exits against the
number of accesses in the loop - those get replaced by
reg-reg copies. For conditional accesses in the loop, there's
the opportunity to if-convert some blocks - unsure if that
would save code size, though.
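One way to read the exits-versus-accesses weighing is as a byte-count comparison. This is a rough sketch under stated assumptions: struct size_costs, sm_saves_size and the per-instruction sizes are hypothetical, in-loop accesses are assumed to become reg-reg copies, and the model ignores knock-on effects such as an emptied loop being deleted entirely (as happened to loop2 above), so it only approximates the real tradeoff.

```c
#include <stdbool.h>

/* Hypothetical per-instruction size estimates in bytes; real values
   depend on the target and addressing modes.  */
struct size_costs
{
  unsigned mem_access;   /* load or store of the reference */
  unsigned reg_copy;     /* reg-reg copy replacing an in-loop access */
};

/* Estimate whether store motion shrinks the code: the in-loop accesses
   become reg-reg copies, at the price of an optional load before the
   loop and one store on each exit edge.  */
static bool
sm_saves_size (unsigned num_accesses, unsigned num_exits,
               bool needs_load, const struct size_costs *c)
{
  unsigned before = num_accesses * c->mem_access;
  unsigned after = num_accesses * c->reg_copy
                   + (needs_load ? c->mem_access : 0)
                   + num_exits * c->mem_access;
  return after < before;
}
```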

One of the usual complaints about store motion is its
effect on register pressure and the spilling it eventually
causes. A first step to address this would be to rank
store motion candidates by the (weighted?) number of
loads/stores eliminated, so one can still move the N most
important candidates.

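Ranking candidates and keeping the first N can be sketched with qsort. The struct sm_rank, its weight (e.g. a frequency-weighted count of eliminated loads/stores), the comparator and select_sm_candidates are all hypothetical; how the weight and the budget N would be computed in GCC is left open.

```c
#include <stdlib.h>

/* Hypothetical candidate record: an id plus a weight such as the
   (frequency-weighted) number of loads/stores store motion removes.  */
struct sm_rank
{
  unsigned id;
  unsigned weight;
};

/* qsort comparator: heavier candidates first, ties broken by id so the
   order is deterministic.  */
static int
cmp_by_weight_desc (const void *a, const void *b)
{
  const struct sm_rank *x = a, *y = b;
  if (x->weight != y->weight)
    return x->weight > y->weight ? -1 : 1;
  return (x->id > y->id) - (x->id < y->id);
}

/* Sort CANDS in place and return how many to move: at most MAX_MOVED,
   a budget that might for example be derived from register pressure.
   The chosen candidates are CANDS[0] .. CANDS[result - 1].  */
static size_t
select_sm_candidates (struct sm_rank *cands, size_t n, size_t max_moved)
{
  qsort (cands, n, sizeof *cands, cmp_by_weight_desc);
  return n < max_moved ? n : max_moved;
}
```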
Richard.
>
> 2026-05-11 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
> PR tree-optimization/112508
> * tree-ssa-loop-im.cc (can_sm_ref_p): When optimizing for size, only
> move unconditional stores from loops with a single exit.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/pr112508-1.c: New test case.
> * gcc.target/i386/pr112508-2.c: Likewise.
>
>
> Thanks in advance,
> Roger