Hi gcc-patches mailing list,
Christopher Bazley via Sourceware Forge <[email protected]> has requested that the following forgejo pull request be published on the mailing list.

Created on: 2026-05-11 16:58:20+00:00
Latest update: 2026-05-19 15:02:12+00:00
Changes: 41 changed files, 136 additions, 3 deletions
Head revision: chris.bazley/gcc ref aarch64_unroll_small_loops_by_default2 commit 8718254432487b4366a40512e38fabe77a51b3f8
Base revision: gcc/gcc-TEST ref trunk commit 03bf757085091d95a9fe4ab964ee62da157ef563 r17-473-g03bf757085091d
Merge base: 03bf757085091d95a9fe4ab964ee62da157ef563
Full diff url: https://forge.sourceware.org/gcc/gcc-TEST/pulls/155.diff
Discussion: https://forge.sourceware.org/gcc/gcc-TEST/pulls/155
Requested Reviewers:

Benchmarking has shown some worthwhile improvements as a result of
unrolling small loops (<= 4 instructions) by the smallest possible
unroll factor (2) in the RTL loop unroller pass. In preparation for
enabling loop unrolling for at least some AArch64 targets, this
commit implements the TARGET_LOOP_UNROLL_ADJUST hook with a similar
structure to the implementation of the equivalent function in other
backends.

Support is also added for an undocumented command line option,
-munroll-only-small-loops, that is already supported by several
other backends. In the short term, the only effect of this option
on the AArch64 backend will be to suppress loop unrolling: a
max_small_unroll_ninsns member added to the tuning parameters
structure is initialized to zero for every type of CPU, which
means that no loop is considered small enough to unroll.

The combination of -funroll-loops and -munroll-only-small-loops
is now enabled by default at optimization level -O2 and above, if
debugging and optimization for code size are disabled.

Normally, enabling -funroll-loops would implicitly enable
-frename-registers ("perform a register renaming optimization
pass") and -fweb ("construct webs and split unrelated uses of
single variable"). Like the i386 and rs6000 backends, the AArch64
backend now prevents implicit enablement of -frename-registers by
disabling it by default at all optimization levels; unlike i386,
the AArch64 backend does *not* also disable -fweb, which is
therefore now implicitly enabled by default at -O2 and above.
This choice is based on benchmark results on Neoverse V3.

If -funroll-loops or -funroll-all-loops were specified when
invoking GCC (without explicit -munroll-only-small-loops or
-mno-unroll-only-small-loops) then -munroll-only-small-loops is
disabled by aarch64_override_options_after_change_1 (in
TARGET_OPTION_OVERRIDE and many other hooks). Similarly, if the
user requested unrolling without explicit -frename-registers or
-fno-rename-registers then -frename-registers is automatically
re-enabled. The intent is to keep GCC's behavior unchanged for
existing users of -funroll-loops and -funroll-all-loops.

These preparatory changes are not expected to cause unexpected
unrolling because no loop of more than zero insns is considered
small enough to unroll. It is still possible to explicitly request
unrolling by a loop-specific factor at any optimization level, by
putting '#pragma GCC unroll' in the source code.

If loop unrolling is explicitly enabled at a lower optimization
level then it can now be suppressed by invoking GCC with the new
-munroll-only-small-loops option (e.g., -O1 -funroll-loops
-munroll-only-small-loops); however, that combination of options
is not guaranteed to disable loop unrolling in the long term,
because it depends on the tuning parameters for the selected
target.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.cc: Enable
	-munroll-only-small-loops with -funroll-loops in
	aarch_option_optimization_table to unroll small loops at -O2 and
	above by default, unless debugging or optimization for size is
	enabled.
	Disable -frename-registers by default at all optimization
	levels.
	* config/aarch64/aarch64-json-schema.h: Add max_small_unroll_ninsns
	and max_small_unroll_factor.
	* config/aarch64/aarch64-json-tunings-parser-generated.inc:
	Regenerate.
	* config/aarch64/aarch64-json-tunings-printer-generated.inc:
	Regenerate.
	* config/aarch64/aarch64-protos.h (struct tune_params): New
	members: max_small_unroll_ninsns and max_small_unroll_factor to
	configure which loops are considered 'small' and the maximum
	amount by which to allow
	such loops to be unrolled.
	* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
	Disable the restriction whereby only 'small' loops are unrolled
	and enable register renaming if the user explicitly specified
	-funroll-loops or -funroll-all-loops but did not explicitly
	specify the value of dependent options.
	(aarch64_override_options_internal): Pass opts_set through to
	aarch64_override_options_after_change_1 as a new argument.
	(aarch64_override_options_after_change): Pass &global_options_set
	as a new argument to aarch64_override_options_after_change_1.
	(aarch64_loop_unroll_adjust): Implement the target hook
	TARGET_LOOP_UNROLL_ADJUST.
	(TARGET_LOOP_UNROLL_ADJUST): Define macro as
	aarch64_loop_unroll_adjust to enable target hook.
	* config/aarch64/aarch64.opt: Add -munroll-only-small-loops as a
	new undocumented option.
	* config/aarch64/tuning_models/a64fx.h: Initialize
	max_small_unroll_ninsns and max_small_unroll_factor to zero so
	that no loops are considered small enough to unroll in this
	tuning model.
	* config/aarch64/tuning_models/ampere1.h: As above.
	* config/aarch64/tuning_models/ampere1a.h: As above.
	* config/aarch64/tuning_models/ampere1b.h: As above.
	* config/aarch64/tuning_models/cortexa35.h: As above.
	* config/aarch64/tuning_models/cortexa53.h: As above.
	* config/aarch64/tuning_models/cortexa57.h: As above.
	* config/aarch64/tuning_models/cortexa72.h: As above.
	* config/aarch64/tuning_models/cortexa73.h: As above.
	* config/aarch64/tuning_models/cortexx925.h: As above.
	* config/aarch64/tuning_models/emag.h: As above.
	* config/aarch64/tuning_models/exynosm1.h: As above.
	* config/aarch64/tuning_models/fujitsu_monaka.h: As above.
	* config/aarch64/tuning_models/generic.h: As above.
	* config/aarch64/tuning_models/generic_armv8_a.h: As above.
	* config/aarch64/tuning_models/generic_armv9_a.h: As above.
	* config/aarch64/tuning_models/hip12.h: As above.
	* config/aarch64/tuning_models/neoverse512tvb.h: As above.
	* config/aarch64/tuning_models/neoversen1.h: As above.
	* config/aarch64/tuning_models/neoversen2.h: As above.
	* config/aarch64/tuning_models/neoversen3.h: As above.
	* config/aarch64/tuning_models/neoversev1.h: As above.
	* config/aarch64/tuning_models/neoversev2.h: As above.
	* config/aarch64/tuning_models/neoversev3.h: As above.
	* config/aarch64/tuning_models/neoversev3ae.h: As above.
	* config/aarch64/tuning_models/olympus.h: As above.
	* config/aarch64/tuning_models/qdf24xx.h: As above.
	* config/aarch64/tuning_models/saphira.h: As above.
	* config/aarch64/tuning_models/thunderx.h: As above.
	* config/aarch64/tuning_models/thunderx2t99.h: As above.
	* config/aarch64/tuning_models/thunderx3t110.h: As above.
	* config/aarch64/tuning_models/thunderxt88.h: As above.
	* config/aarch64/tuning_models/tsv110.h: As above.
	* config/aarch64/tuning_models/xgene1.h: As above.
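
For readers who want to see how the two new tuning parameters are meant to interact with -munroll-only-small-loops, here is a minimal standalone sketch. It is not the code from this patch: it assumes the hook follows the pattern I understand other backends to use for this option (cap the factor for small loops, return 1 for everything else, leave '#pragma GCC unroll' loops alone). The names tune_sketch, loop_sketch and loop_unroll_adjust_sketch are hypothetical stand-ins, not GCC identifiers.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the two new members of struct tune_params.
struct tune_sketch
{
  unsigned max_small_unroll_ninsns;  // largest body still counted as 'small'
  unsigned max_small_unroll_factor;  // cap on the unroll factor for such loops
};

// Hypothetical stand-in for the parts of GCC's loop structure used here.
struct loop_sketch
{
  unsigned ninsns;  // number of insns in the loop body
  unsigned unroll;  // nonzero if '#pragma GCC unroll' requested a factor
};

// Assumed shape of a TARGET_LOOP_UNROLL_ADJUST implementation.
// Returns the adjusted unroll factor; 1 means "do not unroll".
unsigned
loop_unroll_adjust_sketch (unsigned nunroll, const loop_sketch &loop,
                           const tune_sketch &tune, bool only_small_loops)
{
  // The restriction is assumed not to apply to pragma-annotated loops.
  if (only_small_loops && loop.unroll == 0)
    {
      if (loop.ninsns <= tune.max_small_unroll_ninsns)
        return std::min (nunroll, tune.max_small_unroll_factor);
      return 1;
    }
  return nunroll;
}

// Small self-check of the behavior described in the commit message.
void
check_loop_unroll_adjust_sketch ()
{
  const tune_sketch zeroed = { 0, 0 };  // what every tuning model gets now
  const loop_sketch four_insns = { 4, 0 };

  // With zeroed tuning, a real loop is never small enough to unroll.
  assert (loop_unroll_adjust_sketch (8, four_insns, zeroed, true) == 1);
  // Without the restriction, the generic factor passes through unchanged.
  assert (loop_unroll_adjust_sketch (8, four_insns, zeroed, false) == 8);
}
```

Under these assumptions, a hypothetical future tuning of { 4, 2 } would unroll a four-insn loop by 2 and leave a five-insn loop alone, which matches the benchmarked configuration mentioned at the top of the commit message.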

Changed files:

- M: gcc/common/config/aarch64/aarch64-common.cc
- M: gcc/config/aarch64/aarch64-json-schema.h
- M: gcc/config/aarch64/aarch64-json-tunings-parser-generated.inc
- M: gcc/config/aarch64/aarch64-json-tunings-printer-generated.inc
- M: gcc/config/aarch64/aarch64-protos.h
- M: gcc/config/aarch64/aarch64.cc
- M: gcc/config/aarch64/aarch64.opt
- M: gcc/config/aarch64/tuning_models/a64fx.h
- M: gcc/config/aarch64/tuning_models/ampere1.h
- M: gcc/config/aarch64/tuning_models/ampere1a.h
- M: gcc/config/aarch64/tuning_models/ampere1b.h
- M: gcc/config/aarch64/tuning_models/cortexa35.h
- M: gcc/config/aarch64/tuning_models/cortexa53.h
- M: gcc/config/aarch64/tuning_models/cortexa57.h
- M: gcc/config/aarch64/tuning_models/cortexa72.h
- M: gcc/config/aarch64/tuning_models/cortexa73.h
- M: gcc/config/aarch64/tuning_models/cortexx925.h
- M: gcc/config/aarch64/tuning_models/emag.h
- M: gcc/config/aarch64/tuning_models/exynosm1.h
- M: gcc/config/aarch64/tuning_models/fujitsu_monaka.h
- M: gcc/config/aarch64/tuning_models/generic.h
- M: gcc/config/aarch64/tuning_models/generic_armv8_a.h
- M: gcc/config/aarch64/tuning_models/generic_armv9_a.h
- M: gcc/config/aarch64/tuning_models/hip12.h
- M: gcc/config/aarch64/tuning_models/neoverse512tvb.h
- M: gcc/config/aarch64/tuning_models/neoversen1.h
- M: gcc/config/aarch64/tuning_models/neoversen2.h
- M: gcc/config/aarch64/tuning_models/neoversen3.h
- M: gcc/config/aarch64/tuning_models/neoversev1.h
- M: gcc/config/aarch64/tuning_models/neoversev2.h
- M: gcc/config/aarch64/tuning_models/neoversev3.h
- M: gcc/config/aarch64/tuning_models/neoversev3ae.h
- M: gcc/config/aarch64/tuning_models/olympus.h
- M: gcc/config/aarch64/tuning_models/qdf24xx.h
- M: gcc/config/aarch64/tuning_models/saphira.h
- M: gcc/config/aarch64/tuning_models/thunderx.h
- M: gcc/config/aarch64/tuning_models/thunderx2t99.h
- M: gcc/config/aarch64/tuning_models/thunderx3t110.h
- M: gcc/config/aarch64/tuning_models/thunderxt88.h
- M: gcc/config/aarch64/tuning_models/tsv110.h
- M: gcc/config/aarch64/tuning_models/xgene1.h

Christopher Bazley (1):
  aarch64: Restrict unrolling to small loops by default

 gcc/common/config/aarch64/aarch64-common.cc   |  8 ++++
 gcc/config/aarch64/aarch64-json-schema.h      |  2 +
 .../aarch64-json-tunings-parser-generated.inc |  2 +
 ...aarch64-json-tunings-printer-generated.inc |  2 +
 gcc/config/aarch64/aarch64-protos.h           |  7 +++
 gcc/config/aarch64/aarch64.cc                 | 46 +++++++++++++++++--
 gcc/config/aarch64/aarch64.opt                |  4 ++
 gcc/config/aarch64/tuning_models/a64fx.h      |  2 +
 gcc/config/aarch64/tuning_models/ampere1.h    |  2 +
 gcc/config/aarch64/tuning_models/ampere1a.h   |  2 +
 gcc/config/aarch64/tuning_models/ampere1b.h   |  2 +
 gcc/config/aarch64/tuning_models/cortexa35.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa53.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa57.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa72.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa73.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexx925.h |  2 +
 gcc/config/aarch64/tuning_models/emag.h       |  2 +
 gcc/config/aarch64/tuning_models/exynosm1.h   |  2 +
 .../aarch64/tuning_models/fujitsu_monaka.h    |  2 +
 gcc/config/aarch64/tuning_models/generic.h    |  2 +
 .../aarch64/tuning_models/generic_armv8_a.h   |  2 +
 .../aarch64/tuning_models/generic_armv9_a.h   |  2 +
 gcc/config/aarch64/tuning_models/hip12.h      |  2 +
 .../aarch64/tuning_models/neoverse512tvb.h    |  2 +
 gcc/config/aarch64/tuning_models/neoversen1.h |  2 +
 gcc/config/aarch64/tuning_models/neoversen2.h |  2 +
 gcc/config/aarch64/tuning_models/neoversen3.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev1.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev2.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev3.h |  2 +
 .../aarch64/tuning_models/neoversev3ae.h      |  2 +
 gcc/config/aarch64/tuning_models/olympus.h    |  2 +
 gcc/config/aarch64/tuning_models/qdf24xx.h    |  2 +
 gcc/config/aarch64/tuning_models/saphira.h    |  2 +
 gcc/config/aarch64/tuning_models/thunderx.h   |  2 +
 .../aarch64/tuning_models/thunderx2t99.h      |  2 +
 .../aarch64/tuning_models/thunderx3t110.h     |  2 +
 .../aarch64/tuning_models/thunderxt88.h       |  2 +
 gcc/config/aarch64/tuning_models/tsv110.h     |  2 +
 gcc/config/aarch64/tuning_models/xgene1.h     |  2 +
 41 files changed, 136 insertions(+), 3 deletions(-)

Range-diff against v1:
1:  a4009d108bc4 ! 1:  871825443248 aarch64: Restrict unrolling to small loops by default
    @@ Commit message
         Support is also added for an undocumented command line option,
         -munroll-only-small-loops, that is already supported by several
    -    other backends. In the short term, the only effect of this option
    +    other backends. In the short term, the only effect of this option
         on the AArch64 backend will be to suppress loop unrolling: a
         max_small_unroll_ninsns member added to the tuning parameters
         structure is initialized to zero for every type of CPU, which
    @@ Commit message
         is now enabled by default at optimization level -O2 and above, if
         debugging and optimization for code size are disabled.
    -    Restricting loop unrolling to small loops by default at higher
    -    optimization levels is already the policy of several other backends.
    -
    -    Unlike the i386 backend, -frename-registers and -fweb (implicitly
    -    enabled by -funroll-loops) are not disabled by default at all
    -    optimization levels; nor is their value overridden by any of the
    -    hooks TARGET_OPTION_OVERRIDE, TARGET_OPTION_RESTORE,
    -    TARGET_SET_CURRENT_FUNCTION or TARGET_OPTION_VALID_ATTRIBUTE_P.
    -    The reason is that one benchmark showed a significant performance
    -    improvement with those options enabled.
    -
    -    These preparatory changes are not expected to cause loop unrolling
    -    to be implicitly enabled for existing programs because no loop of
    -    more than zero insns is considered small enough to unroll; nor do
    -    they prevent loop unrolling when explicitly requested by a user who
    -    specifies -funroll-loops or -funroll-all-loops when invoking GCC.
    -
    -    It will still be possible for programmers to explicitly request loop
    -    unrolling by a user-specified loop-specific factor at any
    -    optimization level, by putting '#pragma GCC unroll' in the source
    -    code.
    +    Normally, enabling -funroll-loops would implicitly enable
    +    -frename-registers ("perform a register renaming optimization
    +    pass") and -fweb ("construct webs and split unrelated uses of
    +    single variable"). Like the i386 and rs6000 backends, the AArch64
    +    backend now prevents implicit enablement of -frename-registers by
    +    disabling it by default at all optimization levels; unlike i386,
    +    the AArch64 backend does *not* also disable -fweb, which is
    +    therefore now implicitly enabled by default at -O2 and above.
    +    This choice is based on benchmark results on Neoverse V3.
    +
    +    If -funroll-loops or -funroll-all-loops were specified when
    +    invoking GCC (without explicit -munroll-only-small-loops or
    +    -mno-unroll-only-small-loops) then -munroll-only-small-loops is
    +    disabled by aarch64_override_options_after_change_1 (in
    +    TARGET_OPTION_OVERRIDE and many other hooks). Similarly, if the
    +    user requested unrolling without explicit -frename-registers or
    +    -fno-rename-registers then -frename-registers is automatically
    +    re-enabled. The intent is to keep GCC's behavior unchanged for
    +    existing users of -funroll-loops and -funroll-all-loops.
    +
    +    These preparatory changes are not expected to cause unexpected
    +    unrolling because no loop of more than zero insns is considered
    +    small enough to unroll. It is still possible to explicitly request
    +    unrolling by a loop-specific factor at any optimization level, by
    +    putting '#pragma GCC unroll' in the source code.
         If loop unrolling is explicitly enabled at a lower optimization
         level then it can now be suppressed by invoking GCC with the new
    @@ Commit message
         	aarch_option_optimization_table to unroll small loops at -O2 and
         	above by default, unless debugging or optimization for size is
         	enabled.
    +    	Disable -frename-registers by default at all optimization
    +    	levels.
         	* config/aarch64/aarch64-json-schema.h: Add max_small_unroll_ninsns
         	and max_small_unroll_factor.
         	* config/aarch64/aarch64-json-tunings-parser-generated.inc:
    @@ Commit message
         	such loops to be unrolled.
         	* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
         	Disable the restriction whereby only 'small' loops are unrolled
    -    	if the user explicitly specified -funroll-loops or -funroll-all-loops.
    +    	and enable register renaming if the user explicitly specified
    +    	-funroll-loops or -funroll-all-loops but did not explicitly
    +    	specify the value of dependent options.
         	(aarch64_override_options_internal): Pass opts_set through to
         	aarch64_override_options_after_change_1 as a new argument.
         	(aarch64_override_options_after_change): Pass &global_options_set
    @@ gcc/common/config/aarch64/aarch64-common.cc: static const struct default_options
     +
     +    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
     +    loops at -O2 and above by default. */
    -+    {OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1},
    -+    {OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1},
    ++    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
    ++    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
    ++    /* Turns off -frename-registers which is enabled by -funroll-loops. */
    ++    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
     +
          { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
      #if (TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1)
    @@ gcc/config/aarch64/aarch64.cc: aarch64_override_options_after_change_1 (struct g
        if (flag_mlow_precision_sqrt)
          flag_mrecip_low_precision_sqrt = true;
     +
    -+  /* Disable a restriction that limits unrolling to small loops
    -+     when there's an explicit -funroll-loops or -funroll-all-loops. */
    ++  /* Revert to the traditional behavior of -funroll-loops and -funroll-all-loops
    ++     if they were explicitly specified by the user and the user did not override
    ++     the implied defaults. */
     +  if ((opts_set->x_flag_unroll_loops && opts->x_flag_unroll_loops)
     +      || (opts_set->x_flag_unroll_all_loops && opts->x_flag_unroll_all_loops))
     +    {
     +      if (!opts_set->x_unroll_only_small_loops)
     +	opts->x_unroll_only_small_loops = 0;
    ++
    ++      if (!opts_set->x_flag_rename_registers)
    ++	opts->x_flag_rename_registers = 1;
     +    }
      }
-- 
2.54.0
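
As an aid to reviewing the aarch64.cc hunk above, the following standalone sketch models the option resolution outside GCC. The struct flags_sketch and the function names are hypothetical; only the condition and the two assignments mirror the hunk. One instance of the struct holds option values (like opts) and a second records which options the user set explicitly (like opts_set). The -O2 defaults are taken from the aarch64-common.cc table entries quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for the handful of gcc_options fields involved.
// Used both for values (opts) and for "explicitly set" markers (opts_set).
struct flags_sketch
{
  int unroll_loops;             // -funroll-loops
  int unroll_all_loops;         // -funroll-all-loops
  int unroll_only_small_loops;  // -munroll-only-small-loops
  int rename_registers;         // -frename-registers
};

// Values after the default table has been applied at -O2 (speed only):
// unrolling on, restricted to small loops, register renaming off.
flags_sketch
o2_defaults_sketch ()
{
  return { 1, 0, 1, 0 };
}

// Mirrors the logic of the new block in the hunk above.
void
apply_unroll_overrides_sketch (flags_sketch &opts, const flags_sketch &opts_set)
{
  if ((opts_set.unroll_loops && opts.unroll_loops)
      || (opts_set.unroll_all_loops && opts.unroll_all_loops))
    {
      // Lift the small-loop restriction unless the user chose it.
      if (!opts_set.unroll_only_small_loops)
        opts.unroll_only_small_loops = 0;

      // Restore the renaming that -funroll-loops traditionally implies.
      if (!opts_set.rename_registers)
        opts.rename_registers = 1;
    }
}

// Compares plain -O2 with -O2 -funroll-loops.
void
check_unroll_overrides_sketch ()
{
  flags_sketch plain = o2_defaults_sketch ();
  apply_unroll_overrides_sketch (plain, { 0, 0, 0, 0 });
  assert (plain.unroll_only_small_loops == 1 && plain.rename_registers == 0);

  flags_sketch explicit_unroll = o2_defaults_sketch ();
  apply_unroll_overrides_sketch (explicit_unroll, { 1, 0, 0, 0 });
  assert (explicit_unroll.unroll_only_small_loops == 0);
  assert (explicit_unroll.rename_registers == 1);
}
```

The check on opts as well as opts_set matters: an explicit -fno-unroll-loops sets the marker but leaves the value at zero, so neither dependent option is touched in that case.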
