Hi gcc-patches mailing list,
Christopher Bazley via Sourceware Forge <[email protected]> has
requested that the following Forgejo pull request be published on the
mailing list.

Created on: 2026-05-11 16:58:20+00:00
Latest update: 2026-05-19 15:02:12+00:00
Changes: 41 changed files, 136 additions, 3 deletions
Head revision: chris.bazley/gcc ref aarch64_unroll_small_loops_by_default2 commit 8718254432487b4366a40512e38fabe77a51b3f8
Base revision: gcc/gcc-TEST ref trunk commit 03bf757085091d95a9fe4ab964ee62da157ef563 r17-473-g03bf757085091d
Merge base: 03bf757085091d95a9fe4ab964ee62da157ef563
Full diff url: https://forge.sourceware.org/gcc/gcc-TEST/pulls/155.diff
Discussion:  https://forge.sourceware.org/gcc/gcc-TEST/pulls/155
Requested Reviewers:

Benchmarking has shown some worthwhile improvements as a result of
unrolling small loops (<= 4 instructions) by the smallest possible
unroll factor (2) in the RTL loop unroller pass.  In preparation
for enabling loop unrolling for at least some AArch64 targets, this
commit implements the TARGET_LOOP_UNROLL_ADJUST hook with a similar
structure to the implementation of the equivalent function in other
backends.
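
The following plain-C sketch roughly illustrates what a TARGET_LOOP_UNROLL_ADJUST hook decides. It is not the code in this patch. The real hook receives GCC's loop structure and reads the selected tuning model, whereas `struct loop_model`, `struct small_loop_tuning` and `unroll_adjust_sketch` are hypothetical stand-ins. The sketch makes two assumptions, following the pattern of the i386 hook: a loop carrying a '#pragma GCC unroll' factor is left alone, and a result of 1 or less means the RTL unroller leaves the loop as it is.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the loop fields that matter here.  */
struct loop_model
{
  unsigned ninsns;        /* Number of insns in the loop body.  */
  unsigned short unroll;  /* Factor from '#pragma GCC unroll', 0 if none.  */
};

/* Hypothetical stand-in for the two new tune_params members.  */
struct small_loop_tuning
{
  unsigned max_small_unroll_ninsns;  /* Largest loop treated as "small".  */
  unsigned max_small_unroll_factor;  /* Cap on the unroll factor.  */
};

/* Return the adjusted unroll factor for LOOP, given the factor NUNROLL
   proposed by the generic unroller.  A result <= 1 means "do not unroll".  */
static unsigned
unroll_adjust_sketch (unsigned nunroll, const struct loop_model *loop,
                      const struct small_loop_tuning *tune,
                      bool unroll_only_small_loops)
{
  /* Keep the generic decision unless the restriction is active and the
     user did not request a factor with a pragma.  */
  if (!unroll_only_small_loops || loop->unroll)
    return nunroll;

  /* Loops above the size threshold are not unrolled at all.  */
  if (loop->ninsns > tune->max_small_unroll_ninsns)
    return 1;

  /* Small loops are unrolled, but by no more than the tuned maximum.  */
  return nunroll < tune->max_small_unroll_factor
         ? nunroll : tune->max_small_unroll_factor;
}
```

Consider the thresholds suggested by the benchmarking above (at most 4 insns, factor 2). Under those thresholds, a 3-insn loop for which the generic unroller proposed a factor of 8 would be unrolled by 2, and a 5-insn loop would not be unrolled.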

Support is also added for an undocumented command line option,
-munroll-only-small-loops, that is already supported by several
other backends.  In the short term, the only effect of this option
on the AArch64 backend will be to suppress loop unrolling: a
max_small_unroll_ninsns member added to the tuning parameters
structure is initialized to zero for every type of CPU, which
means that no loop is considered small enough to unroll.
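
A trimmed-down model of the structure shows the effect of zero-initialized tuning parameters. `struct tune_params_sketch`, `loop_is_small` and `models_allowing_unroll` are hypothetical. The real struct tune_params has many more members, and only three tuning models are listed here for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed tune_params: only the two members
   added by this patch are shown, plus a name for readability.  */
struct tune_params_sketch
{
  const char *name;
  unsigned max_small_unroll_ninsns;
  unsigned max_small_unroll_factor;
};

/* Every tuning model initializes both members to zero.  */
static const struct tune_params_sketch models[] = {
  { "generic", 0, 0 },
  { "neoversev3", 0, 0 },
  { "cortexa53", 0, 0 },
};

/* A loop is "small" if its body has at most max_small_unroll_ninsns
   insns, so any loop with at least one insn exceeds a threshold of 0.  */
static bool
loop_is_small (const struct tune_params_sketch *tune, unsigned ninsns)
{
  return ninsns <= tune->max_small_unroll_ninsns;
}

/* Count the models that would treat a loop of NINSNS insns as small.  */
static size_t
models_allowing_unroll (unsigned ninsns)
{
  size_t count = 0;
  for (size_t i = 0; i < sizeof models / sizeof models[0]; i++)
    if (loop_is_small (&models[i], ninsns))
      count++;
  return count;
}
```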

The combination of -funroll-loops and -munroll-only-small-loops
is now enabled by default at optimization level -O2 and above,
unless optimizing for debugging (-Og) or for code size (-Os, -Oz).
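
Entries in aarch_option_optimization_table tagged OPT_LEVELS_2_PLUS_SPEED_ONLY apply at -O2, -O3 and -Ofast, but not at -O1, -Og, -Os or -Oz. The sketch below models that test. It assumes GCC's usual mapping, in which -Os and -Oz imply level 2 with size optimization and -Og implies level 1 with debug optimization. `struct opt_state`, `parse_O` and `levels_2_plus_speed_only` are illustrative names, not GCC's.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of the state derived from an -O option.  */
struct opt_state
{
  int level;   /* Optimization level.  */
  bool size;   /* Optimizing for size (-Os, -Oz).  */
  bool debug;  /* Optimizing for debugging (-Og).  */
};

/* Map an -O option to its (level, size, debug) triple.  */
static struct opt_state
parse_O (const char *opt)
{
  struct opt_state s = { 0, false, false };
  if (strcmp (opt, "-O") == 0 || strcmp (opt, "-O1") == 0)
    s.level = 1;
  else if (strcmp (opt, "-O2") == 0)
    s.level = 2;
  else if (strcmp (opt, "-O3") == 0 || strcmp (opt, "-Ofast") == 0)
    s.level = 3;
  else if (strcmp (opt, "-Os") == 0 || strcmp (opt, "-Oz") == 0)
    {
      s.level = 2;
      s.size = true;
    }
  else if (strcmp (opt, "-Og") == 0)
    {
      s.level = 1;
      s.debug = true;
    }
  return s;
}

/* Does an OPT_LEVELS_2_PLUS_SPEED_ONLY default apply in state S?  */
static bool
levels_2_plus_speed_only (struct opt_state s)
{
  return s.level >= 2 && !s.size && !s.debug;
}
```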

Normally, enabling -funroll-loops would implicitly enable
-frename-registers ("perform a register renaming optimization
pass") and -fweb ("construct webs and split unrelated uses of
single variable").  Like the i386 and rs6000 backends, the AArch64
backend now prevents implicit enablement of -frename-registers by
disabling it by default at all optimization levels; unlike i386,
the AArch64 backend does *not* also disable -fweb, which is
therefore now implicitly enabled by default at -O2 and above.
This choice is based on benchmark results on Neoverse V3.
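
A sentinel meaning "not yet decided" can model why a default-table entry is enough to stop -frename-registers from being implied. This is a simplified sketch, not GCC's option-processing code. `UNSET`, `struct flags` and `resolve_defaults` are hypothetical, and user-specified options are ignored. The sketch also assumes that the generic rule fills in -fweb and -frename-registers from -funroll-loops only when nothing else has set them.

```c
#include <stdbool.h>

/* Hypothetical marker for "no value decided yet".  */
#define UNSET (-1)

struct flags
{
  int unroll_loops;      /* -funroll-loops */
  int rename_registers;  /* -frename-registers */
  int web;               /* -fweb */
};

/* SPEED_O2_PLUS is true when OPT_LEVELS_2_PLUS_SPEED_ONLY entries apply
   (for example at -O2).  */
static struct flags
resolve_defaults (bool speed_o2_plus)
{
  struct flags f = { 0, UNSET, UNSET };

  /* AArch64 default table: enable unrolling at -O2 and above (speed
     only) and disable register renaming at every level.  */
  if (speed_o2_plus)
    f.unroll_loops = 1;
  f.rename_registers = 0;

  /* Generic rule: options that are still undecided follow
     -funroll-loops.  -frename-registers was decided above, so only
     -fweb is affected here.  */
  if (f.web == UNSET)
    f.web = f.unroll_loops;
  if (f.rename_registers == UNSET)
    f.rename_registers = f.unroll_loops;

  return f;
}
```

In this model, -O2 yields unrolling and -fweb on with -frename-registers off, while -O1 leaves all three off.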

If -funroll-loops or -funroll-all-loops is specified when invoking
GCC (without an explicit -munroll-only-small-loops or
-mno-unroll-only-small-loops), then -munroll-only-small-loops is
disabled by aarch64_override_options_after_change_1, which is reached
from the TARGET_OPTION_OVERRIDE implementation and several other
hooks.  Similarly, if the user requests unrolling without an explicit
-frename-registers or -fno-rename-registers, then -frename-registers
is automatically re-enabled.  The intent is to keep GCC's behavior
unchanged for existing users of -funroll-loops and -funroll-all-loops.
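
The condition added to aarch64_override_options_after_change_1 can be exercised in isolation. In this sketch, `struct opts_model` is a hypothetical stand-in for both GCC's option values (opts) and the companion record of which options appeared on the command line (opts_set). The body of `restore_traditional_unrolling` follows the hunk shown in the range-diff, but the names are not GCC's.

```c
#include <stdbool.h>

/* Hypothetical mirror of the relevant option fields.  Used twice: once
   for the values (OPTS) and once for "given explicitly" (OPTS_SET).  */
struct opts_model
{
  bool flag_unroll_loops;
  bool flag_unroll_all_loops;
  bool unroll_only_small_loops;
  bool flag_rename_registers;
};

/* An explicit -funroll-loops or -funroll-all-loops restores the
   traditional behavior, unless the dependent options were also given
   explicitly by the user.  */
static void
restore_traditional_unrolling (struct opts_model *opts,
                               const struct opts_model *opts_set)
{
  if ((opts_set->flag_unroll_loops && opts->flag_unroll_loops)
      || (opts_set->flag_unroll_all_loops && opts->flag_unroll_all_loops))
    {
      if (!opts_set->unroll_only_small_loops)
        opts->unroll_only_small_loops = false;
      if (!opts_set->flag_rename_registers)
        opts->flag_rename_registers = true;
    }
}
```

For example, -O2 -funroll-loops ends with unroll_only_small_loops off and flag_rename_registers on, whereas -O2 on its own keeps the new defaults.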

These preparatory changes should not cause any additional unrolling,
because no loop of more than zero insns is considered small enough to
unroll.  It is still possible to explicitly request unrolling by a
loop-specific factor at any optimization level, by putting
'#pragma GCC unroll' in the source code.
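
For example, a loop-specific factor can be requested directly in C. `double_all` is an illustrative function, not part of the patch.

```c
#include <stddef.h>

/* Double every element of A.  The pragma, placed immediately before
   the loop, requests that GCC unroll this loop by a factor of 2.  */
void
double_all (int *a, size_t n)
{
#pragma GCC unroll 2
  for (size_t i = 0; i < n; i++)
    a[i] *= 2;
}
```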

If loop unrolling is explicitly enabled at a lower optimization level,
then it can now be suppressed by invoking GCC with the new
-munroll-only-small-loops option (e.g., -O1 -funroll-loops
-munroll-only-small-loops); however, that combination of options is
not guaranteed to disable loop unrolling in the long term, because it
depends on the tuning parameters for the selected target.

gcc/ChangeLog:

        * common/config/aarch64/aarch64-common.cc:
        Enable -munroll-only-small-loops with -funroll-loops in
        aarch_option_optimization_table to unroll small loops at
        -O2 and above by default, unless debugging or optimization
        for size is enabled.
        Disable -frename-registers by default at all optimization
        levels.
        * config/aarch64/aarch64-json-schema.h:
        Add max_small_unroll_ninsns and max_small_unroll_factor.
        * config/aarch64/aarch64-json-tunings-parser-generated.inc:
        Regenerate.
        * config/aarch64/aarch64-json-tunings-printer-generated.inc:
        Regenerate.
        * config/aarch64/aarch64-protos.h (struct tune_params):
        New members: max_small_unroll_ninsns and
        max_small_unroll_factor to configure which loops are
        considered 'small' and the maximum amount by which to allow
        such loops to be unrolled.
        * config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
        Disable the restriction whereby only 'small' loops are unrolled
        and enable register renaming if the user explicitly specified
        -funroll-loops or -funroll-all-loops but did not explicitly
        specify the value of dependent options.
        (aarch64_override_options_internal): Pass opts_set through to
        aarch64_override_options_after_change_1 as a new argument.
        (aarch64_override_options_after_change): Pass &global_options_set
        as a new argument to aarch64_override_options_after_change_1.
        (aarch64_loop_unroll_adjust): Implement the target hook
        TARGET_LOOP_UNROLL_ADJUST.
        (TARGET_LOOP_UNROLL_ADJUST): Define macro as
        aarch64_loop_unroll_adjust to enable target hook.
        * config/aarch64/aarch64.opt:
        Add -munroll-only-small-loops as a new undocumented option.
        * config/aarch64/tuning_models/a64fx.h:
        Initialize max_small_unroll_ninsns and
        max_small_unroll_factor to zero so that no loops are
        considered small enough to unroll in this tuning model.
        * config/aarch64/tuning_models/ampere1.h: As above.
        * config/aarch64/tuning_models/ampere1a.h: As above.
        * config/aarch64/tuning_models/ampere1b.h: As above.
        * config/aarch64/tuning_models/cortexa35.h: As above.
        * config/aarch64/tuning_models/cortexa53.h: As above.
        * config/aarch64/tuning_models/cortexa57.h: As above.
        * config/aarch64/tuning_models/cortexa72.h: As above.
        * config/aarch64/tuning_models/cortexa73.h: As above.
        * config/aarch64/tuning_models/cortexx925.h: As above.
        * config/aarch64/tuning_models/emag.h: As above.
        * config/aarch64/tuning_models/exynosm1.h: As above.
        * config/aarch64/tuning_models/fujitsu_monaka.h: As above.
        * config/aarch64/tuning_models/generic.h: As above.
        * config/aarch64/tuning_models/generic_armv8_a.h: As above.
        * config/aarch64/tuning_models/generic_armv9_a.h: As above.
        * config/aarch64/tuning_models/hip12.h: As above.
        * config/aarch64/tuning_models/neoverse512tvb.h: As above.
        * config/aarch64/tuning_models/neoversen1.h: As above.
        * config/aarch64/tuning_models/neoversen2.h: As above.
        * config/aarch64/tuning_models/neoversen3.h: As above.
        * config/aarch64/tuning_models/neoversev1.h: As above.
        * config/aarch64/tuning_models/neoversev2.h: As above.
        * config/aarch64/tuning_models/neoversev3.h: As above.
        * config/aarch64/tuning_models/neoversev3ae.h: As above.
        * config/aarch64/tuning_models/olympus.h: As above.
        * config/aarch64/tuning_models/qdf24xx.h: As above.
        * config/aarch64/tuning_models/saphira.h: As above.
        * config/aarch64/tuning_models/thunderx.h: As above.
        * config/aarch64/tuning_models/thunderx2t99.h: As above.
        * config/aarch64/tuning_models/thunderx3t110.h: As above.
        * config/aarch64/tuning_models/thunderxt88.h: As above.
        * config/aarch64/tuning_models/tsv110.h: As above.
        * config/aarch64/tuning_models/xgene1.h: As above.


Changed files:
- M: gcc/common/config/aarch64/aarch64-common.cc
- M: gcc/config/aarch64/aarch64-json-schema.h
- M: gcc/config/aarch64/aarch64-json-tunings-parser-generated.inc
- M: gcc/config/aarch64/aarch64-json-tunings-printer-generated.inc
- M: gcc/config/aarch64/aarch64-protos.h
- M: gcc/config/aarch64/aarch64.cc
- M: gcc/config/aarch64/aarch64.opt
- M: gcc/config/aarch64/tuning_models/a64fx.h
- M: gcc/config/aarch64/tuning_models/ampere1.h
- M: gcc/config/aarch64/tuning_models/ampere1a.h
- M: gcc/config/aarch64/tuning_models/ampere1b.h
- M: gcc/config/aarch64/tuning_models/cortexa35.h
- M: gcc/config/aarch64/tuning_models/cortexa53.h
- M: gcc/config/aarch64/tuning_models/cortexa57.h
- M: gcc/config/aarch64/tuning_models/cortexa72.h
- M: gcc/config/aarch64/tuning_models/cortexa73.h
- M: gcc/config/aarch64/tuning_models/cortexx925.h
- M: gcc/config/aarch64/tuning_models/emag.h
- M: gcc/config/aarch64/tuning_models/exynosm1.h
- M: gcc/config/aarch64/tuning_models/fujitsu_monaka.h
- M: gcc/config/aarch64/tuning_models/generic.h
- M: gcc/config/aarch64/tuning_models/generic_armv8_a.h
- M: gcc/config/aarch64/tuning_models/generic_armv9_a.h
- M: gcc/config/aarch64/tuning_models/hip12.h
- M: gcc/config/aarch64/tuning_models/neoverse512tvb.h
- M: gcc/config/aarch64/tuning_models/neoversen1.h
- M: gcc/config/aarch64/tuning_models/neoversen2.h
- M: gcc/config/aarch64/tuning_models/neoversen3.h
- M: gcc/config/aarch64/tuning_models/neoversev1.h
- M: gcc/config/aarch64/tuning_models/neoversev2.h
- M: gcc/config/aarch64/tuning_models/neoversev3.h
- M: gcc/config/aarch64/tuning_models/neoversev3ae.h
- M: gcc/config/aarch64/tuning_models/olympus.h
- M: gcc/config/aarch64/tuning_models/qdf24xx.h
- M: gcc/config/aarch64/tuning_models/saphira.h
- M: gcc/config/aarch64/tuning_models/thunderx.h
- M: gcc/config/aarch64/tuning_models/thunderx2t99.h
- M: gcc/config/aarch64/tuning_models/thunderx3t110.h
- M: gcc/config/aarch64/tuning_models/thunderxt88.h
- M: gcc/config/aarch64/tuning_models/tsv110.h
- M: gcc/config/aarch64/tuning_models/xgene1.h


Christopher Bazley (1):
  aarch64: Restrict unrolling to small loops by default

 gcc/common/config/aarch64/aarch64-common.cc   |  8 ++++
 gcc/config/aarch64/aarch64-json-schema.h      |  2 +
 .../aarch64-json-tunings-parser-generated.inc |  2 +
 ...aarch64-json-tunings-printer-generated.inc |  2 +
 gcc/config/aarch64/aarch64-protos.h           |  7 +++
 gcc/config/aarch64/aarch64.cc                 | 46 +++++++++++++++++--
 gcc/config/aarch64/aarch64.opt                |  4 ++
 gcc/config/aarch64/tuning_models/a64fx.h      |  2 +
 gcc/config/aarch64/tuning_models/ampere1.h    |  2 +
 gcc/config/aarch64/tuning_models/ampere1a.h   |  2 +
 gcc/config/aarch64/tuning_models/ampere1b.h   |  2 +
 gcc/config/aarch64/tuning_models/cortexa35.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa53.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa57.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa72.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa73.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexx925.h |  2 +
 gcc/config/aarch64/tuning_models/emag.h       |  2 +
 gcc/config/aarch64/tuning_models/exynosm1.h   |  2 +
 .../aarch64/tuning_models/fujitsu_monaka.h    |  2 +
 gcc/config/aarch64/tuning_models/generic.h    |  2 +
 .../aarch64/tuning_models/generic_armv8_a.h   |  2 +
 .../aarch64/tuning_models/generic_armv9_a.h   |  2 +
 gcc/config/aarch64/tuning_models/hip12.h      |  2 +
 .../aarch64/tuning_models/neoverse512tvb.h    |  2 +
 gcc/config/aarch64/tuning_models/neoversen1.h |  2 +
 gcc/config/aarch64/tuning_models/neoversen2.h |  2 +
 gcc/config/aarch64/tuning_models/neoversen3.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev1.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev2.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev3.h |  2 +
 .../aarch64/tuning_models/neoversev3ae.h      |  2 +
 gcc/config/aarch64/tuning_models/olympus.h    |  2 +
 gcc/config/aarch64/tuning_models/qdf24xx.h    |  2 +
 gcc/config/aarch64/tuning_models/saphira.h    |  2 +
 gcc/config/aarch64/tuning_models/thunderx.h   |  2 +
 .../aarch64/tuning_models/thunderx2t99.h      |  2 +
 .../aarch64/tuning_models/thunderx3t110.h     |  2 +
 .../aarch64/tuning_models/thunderxt88.h       |  2 +
 gcc/config/aarch64/tuning_models/tsv110.h     |  2 +
 gcc/config/aarch64/tuning_models/xgene1.h     |  2 +
 41 files changed, 136 insertions(+), 3 deletions(-)

Range-diff against v1:
1:  a4009d108bc4 ! 1:  871825443248 aarch64: Restrict unrolling to small loops by default
    @@ Commit message
     
         Support is also added for an undocumented command line option,
         -munroll-only-small-loops, that is already supported by several
    -    other backends. In the short term, the only effect of this option
    +    other backends.  In the short term, the only effect of this option
         on the AArch64 backend will be to suppress loop unrolling: a
         max_small_unroll_ninsns member added to the tuning parameters
         structure is initialized to zero for every type of CPU, which
    @@ Commit message
         is now enabled by default at optimization level -O2 and above,
         if debugging and optimization for code size are disabled.
     
    -    Restricting loop unrolling to small loops by default at higher
    -    optimization levels is already the policy of several other backends.
    -
    -    Unlike the i386 backend, -frename-registers and -fweb (implicitly
    -    enabled by -funroll-loops) are not disabled by default at all
    -    optimization levels; nor is their value overridden by any of the
    -    hooks TARGET_OPTION_OVERRIDE, TARGET_OPTION_RESTORE,
    -    TARGET_SET_CURRENT_FUNCTION or TARGET_OPTION_VALID_ATTRIBUTE_P.
    -    The reason is that one benchmark showed a significant performance
    -    improvement with those options enabled.
    -
    -    These preparatory changes are not expected to cause loop unrolling
    -    to be implicitly enabled for existing programs because no loop of
    -    more than zero insns is considered small enough to unroll; nor do
    -    they prevent loop unrolling when explicitly requested by a user who
    -    specifies -funroll-loops or -funroll-all-loops when invoking GCC.
    -
    -    It will still be possible for programmers to explicitly request loop
    -    unrolling by a user-specified loop-specific factor at any
    -    optimization level, by putting '#pragma GCC unroll' in the source
    -    code.
    +    Normally, enabling -funroll-loops would implicitly enable
    +    -frename-registers ("perform a register renaming optimization
    +    pass") and -fweb ("construct webs and split unrelated uses of
    +    single variable").  Like the i386 and rs6000 backends, the AArch64
    +    backend now prevents implicit enablement of -frename-registers by
    +    disabling it by default at all optimization levels; unlike i386,
    +    the AArch64 backend does *not* also disable -fweb, which is
    +    therefore now implicitly enabled by default at -O2 and above.
    +    This choice is based on benchmark results on Neoverse V3.
    +
    +    If -funroll-loops or -funroll-all-loops were specified when
    +    invoking GCC (without explicit -munroll-only-small-loops or
    +    -mno-unroll-only-small-loops) then -munroll-only-small-loops is
    +    disabled by aarch64_override_options_after_change_1 (in
    +    TARGET_OPTION_OVERRIDE and many other hooks).  Similarly, if the
    +    user requested unrolling without explicit -frename-registers or
    +    -fno-rename-registers then -frename-registers is automatically
    +    re-enabled.  The intent is to keep GCC's behavior unchanged for
    +    existing users of -funroll-loops and -funroll-all-loops.
    +
    +    These preparatory changes are not expected to cause unexpected
    +    unrolling because no loop of more than zero insns is considered
    +    small enough to unroll.  It is still possible to explicitly request
    +    unrolling by a loop-specific factor at any optimization level, by
    +    putting '#pragma GCC unroll' in the source code.
     
         If loop unrolling is explicitly enabled at a lower optimization level
         then it can now be suppressed by invoking GCC with the new
    @@ Commit message
                 aarch_option_optimization_table to unroll small loops at
                 -O2 and above by default, unless debugging or optimization
                 for size is enabled.
    +            Disable -frename-registers by default at all optimization
    +            levels.
                 * config/aarch64/aarch64-json-schema.h:
                 Add max_small_unroll_ninsns and max_small_unroll_factor.
                 * config/aarch64/aarch64-json-tunings-parser-generated.inc:
    @@ Commit message
                 such loops to be unrolled.
                 * config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
                 Disable the restriction whereby only 'small' loops are unrolled
    -            if the user explicitly specified -funroll-loops or -funroll-all-loops.
    +            and enable register renaming if the user explicitly specified
    +            -funroll-loops or -funroll-all-loops but did not explicitly
    +            specify the value of dependent options.
                 (aarch64_override_options_internal): Pass opts_set through to
                 aarch64_override_options_after_change_1 as a new argument.
                 (aarch64_override_options_after_change): Pass &global_options_set
    @@ gcc/common/config/aarch64/aarch64-common.cc: static const struct default_options
     +
     +    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
     +       loops at -O2 and above by default.  */
    -+    {OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1},
    -+    {OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1},
    ++    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
    ++    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
    ++    /* Turns off -frename-registers which is enabled by -funroll-loops.  */
    ++    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
     +
          { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
      #if (TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1)
    @@ gcc/config/aarch64/aarch64.cc: aarch64_override_options_after_change_1 (struct g
        if (flag_mlow_precision_sqrt)
          flag_mrecip_low_precision_sqrt = true;
     +
    -+  /* Disable a restriction that limits unrolling to small loops
    -+     when there's an explicit -funroll-loops or -funroll-all-loops.  */
    ++  /* Revert to the traditional behavior of -funroll-loops and -funroll-all-loops
    ++     if they were explicitly specified by the user and the user did not override
    ++     the implied defaults.  */
     +  if ((opts_set->x_flag_unroll_loops && opts->x_flag_unroll_loops)
     +      || (opts_set->x_flag_unroll_all_loops && opts->x_flag_unroll_all_loops))
     +    {
     +      if (!opts_set->x_unroll_only_small_loops)
     +  opts->x_unroll_only_small_loops = 0;
    ++
    ++      if (!opts_set->x_flag_rename_registers)
    ++  opts->x_flag_rename_registers = 1;
     +    }
      }
      
-- 
2.54.0
