Hi gcc-patches mailing list,

Christopher Bazley via Sourceware Forge <[email protected]> has requested that the following forgejo pull request be published on the mailing list.

Created on: 2026-05-11 16:58:20+00:00
Latest update: 2026-05-11 17:02:27+00:00
Changes: 41 changed files, 130 additions, 3 deletions
Head revision: chris.bazley/gcc ref aarch64_unroll_small_loops_by_default2 commit a4009d108bc44ea3abeec87c1c681f787440cb22
Base revision: gcc/gcc-TEST ref trunk commit 2b82af511b4c77d4b4387e3b01e5fe18847c3a8e r17-444-g2b82af511b4c77
Merge base: 2b82af511b4c77d4b4387e3b01e5fe18847c3a8e
Full diff url: https://forge.sourceware.org/gcc/gcc-TEST/pulls/155.diff
Discussion: https://forge.sourceware.org/gcc/gcc-TEST/pulls/155
Requested Reviewers:

Benchmarking has shown some worthwhile improvements as a result of unrolling small loops (<= 4 instructions) by the smallest possible unroll factor (2) in the RTL loop unroller pass.

In preparation for enabling loop unrolling for at least some AArch64 targets, this commit implements the TARGET_LOOP_UNROLL_ADJUST hook with a similar structure to the implementation of the equivalent function in other backends.

Support is also added for an undocumented command line option, -munroll-only-small-loops, that is already supported by several other backends. In the short term, the only effect of this option on the AArch64 backend will be to suppress loop unrolling: a max_small_unroll_ninsns member added to the tuning parameters structure is initialized to zero for every type of CPU, which means that no loop is considered small enough to unroll.

The combination of -funroll-loops and -munroll-only-small-loops is now enabled by default at optimization level -O2 and above, if debugging and optimization for code size are disabled. Restricting loop unrolling to small loops by default at higher optimization levels is already the policy of several other backends.

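As a rough illustration of the behaviour described above (not the code in this pull request), the decision made by a TARGET_LOOP_UNROLL_ADJUST-style hook might be modelled as follows. The real hook receives the proposed unroll factor and a pointer to the loop; here the loop is reduced to its insn count. The names small_unroll_tuning, loop_unroll_adjust_sketch and demo_zero_tuning are hypothetical, and the value returned for loops that are too big is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the two members this patch adds to
// struct tune_params; the real structure has many other fields.
struct small_unroll_tuning
{
  unsigned max_small_unroll_ninsns; // largest insn count still treated as 'small'
  unsigned max_small_unroll_factor; // upper bound on the factor for such loops
};

// Sketch of a TARGET_LOOP_UNROLL_ADJUST-style decision.  NUNROLL is the
// factor proposed by the unroller and LOOP_NINSNS the size of the loop.
// Returning 1 for loops that are too big is an assumption of this sketch
// (meaning "leave the loop as it is"), not something taken from the patch.
unsigned
loop_unroll_adjust_sketch (unsigned nunroll, unsigned loop_ninsns,
                           bool only_small_loops,
                           const small_unroll_tuning &tune)
{
  if (!only_small_loops)
    return nunroll; // No restriction: keep the proposed factor.
  if (loop_ninsns <= tune.max_small_unroll_ninsns)
    return std::min (nunroll, tune.max_small_unroll_factor);
  return 1;
}

// Usage: with both members zero, as in every tuning model touched by this
// patch, no loop of one or more insns qualifies as 'small'.
void
demo_zero_tuning ()
{
  const small_unroll_tuning zero = {0, 0};
  assert (loop_unroll_adjust_sketch (8, 4, true, zero) == 1);
  // Without the small-loops restriction the proposed factor is kept.
  assert (loop_unroll_adjust_sketch (8, 4, false, zero) == 8);
}
```

Under these assumptions, a tuning model that later sets the members to 4 and 2 would match the benchmarked configuration mentioned at the start of this description: loops of up to 4 insns unrolled by a factor of at most 2.
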
Unlike in the i386 backend, -frename-registers and -fweb (implicitly enabled by -funroll-loops) are not disabled by default at all optimization levels; nor are their values overridden by any of the hooks TARGET_OPTION_OVERRIDE, TARGET_OPTION_RESTORE, TARGET_SET_CURRENT_FUNCTION or TARGET_OPTION_VALID_ATTRIBUTE_P. The reason is that one benchmark showed a significant performance improvement with those options enabled.

These preparatory changes are not expected to cause loop unrolling to be implicitly enabled for existing programs because no loop of more than zero insns is considered small enough to unroll; nor do they prevent loop unrolling when explicitly requested by a user who specifies -funroll-loops or -funroll-all-loops when invoking GCC.

It will still be possible for programmers to explicitly request loop unrolling by a user-specified, loop-specific factor at any optimization level, by putting '#pragma GCC unroll' in the source code.

If loop unrolling is explicitly enabled at a lower optimization level then it can now be suppressed by invoking GCC with the new -munroll-only-small-loops option (e.g., -O1 -funroll-loops -munroll-only-small-loops); however, that combination of options is not guaranteed to disable loop unrolling in the long term, because it depends on the tuning parameters for the selected target.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.cc: Enable -munroll-only-small-loops with -funroll-loops in aarch_option_optimization_table to unroll small loops at -O2 and above by default, unless debugging or optimization for size is enabled.
	* config/aarch64/aarch64-json-schema.h: Add max_small_unroll_ninsns and max_small_unroll_factor.
	* config/aarch64/aarch64-json-tunings-parser-generated.inc: Regenerate.
	* config/aarch64/aarch64-json-tunings-printer-generated.inc: Regenerate.
	* config/aarch64/aarch64-protos.h (struct tune_params): New members: max_small_unroll_ninsns and max_small_unroll_factor to configure which loops are considered 'small' and the maximum amount by which to allow such loops to be unrolled.
	* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1): Disable the restriction whereby only 'small' loops are unrolled if the user explicitly specified -funroll-loops or -funroll-all-loops.
	(aarch64_override_options_internal): Pass opts_set through to aarch64_override_options_after_change_1 as a new argument.
	(aarch64_override_options_after_change): Pass &global_options_set as a new argument to aarch64_override_options_after_change_1.
	(aarch64_loop_unroll_adjust): Implement the target hook TARGET_LOOP_UNROLL_ADJUST.
	(TARGET_LOOP_UNROLL_ADJUST): Define macro as aarch64_loop_unroll_adjust to enable target hook.
	* config/aarch64/aarch64.opt: Add -munroll-only-small-loops as a new undocumented option.
	* config/aarch64/tuning_models/a64fx.h: Initialize max_small_unroll_ninsns and max_small_unroll_factor to zero so that no loops are considered small enough to unroll in this tuning model.
	* config/aarch64/tuning_models/ampere1.h: As above.
	* config/aarch64/tuning_models/ampere1a.h: As above.
	* config/aarch64/tuning_models/ampere1b.h: As above.
	* config/aarch64/tuning_models/cortexa35.h: As above.
	* config/aarch64/tuning_models/cortexa53.h: As above.
	* config/aarch64/tuning_models/cortexa57.h: As above.
	* config/aarch64/tuning_models/cortexa72.h: As above.
	* config/aarch64/tuning_models/cortexa73.h: As above.
	* config/aarch64/tuning_models/cortexx925.h: As above.
	* config/aarch64/tuning_models/emag.h: As above.
	* config/aarch64/tuning_models/exynosm1.h: As above.
	* config/aarch64/tuning_models/fujitsu_monaka.h: As above.
	* config/aarch64/tuning_models/generic.h: As above.
	* config/aarch64/tuning_models/generic_armv8_a.h: As above.
	* config/aarch64/tuning_models/generic_armv9_a.h: As above.
	* config/aarch64/tuning_models/hip12.h: As above.
	* config/aarch64/tuning_models/neoverse512tvb.h: As above.
	* config/aarch64/tuning_models/neoversen1.h: As above.
	* config/aarch64/tuning_models/neoversen2.h: As above.
	* config/aarch64/tuning_models/neoversen3.h: As above.
	* config/aarch64/tuning_models/neoversev1.h: As above.
	* config/aarch64/tuning_models/neoversev2.h: As above.
	* config/aarch64/tuning_models/neoversev3.h: As above.
	* config/aarch64/tuning_models/neoversev3ae.h: As above.
	* config/aarch64/tuning_models/olympus.h: As above.
	* config/aarch64/tuning_models/qdf24xx.h: As above.
	* config/aarch64/tuning_models/saphira.h: As above.
	* config/aarch64/tuning_models/thunderx.h: As above.
	* config/aarch64/tuning_models/thunderx2t99.h: As above.
	* config/aarch64/tuning_models/thunderx3t110.h: As above.
	* config/aarch64/tuning_models/thunderxt88.h: As above.
	* config/aarch64/tuning_models/tsv110.h: As above.
	* config/aarch64/tuning_models/xgene1.h: As above.

Changed files:

- M: gcc/common/config/aarch64/aarch64-common.cc
- M: gcc/config/aarch64/aarch64-json-schema.h
- M: gcc/config/aarch64/aarch64-json-tunings-parser-generated.inc
- M: gcc/config/aarch64/aarch64-json-tunings-printer-generated.inc
- M: gcc/config/aarch64/aarch64-protos.h
- M: gcc/config/aarch64/aarch64.cc
- M: gcc/config/aarch64/aarch64.opt
- M: gcc/config/aarch64/tuning_models/a64fx.h
- M: gcc/config/aarch64/tuning_models/ampere1.h
- M: gcc/config/aarch64/tuning_models/ampere1a.h
- M: gcc/config/aarch64/tuning_models/ampere1b.h
- M: gcc/config/aarch64/tuning_models/cortexa35.h
- M: gcc/config/aarch64/tuning_models/cortexa53.h
- M: gcc/config/aarch64/tuning_models/cortexa57.h
- M: gcc/config/aarch64/tuning_models/cortexa72.h
- M: gcc/config/aarch64/tuning_models/cortexa73.h
- M: gcc/config/aarch64/tuning_models/cortexx925.h
- M: gcc/config/aarch64/tuning_models/emag.h
- M: gcc/config/aarch64/tuning_models/exynosm1.h
- M: gcc/config/aarch64/tuning_models/fujitsu_monaka.h
- M: gcc/config/aarch64/tuning_models/generic.h
- M: gcc/config/aarch64/tuning_models/generic_armv8_a.h
- M: gcc/config/aarch64/tuning_models/generic_armv9_a.h
- M: gcc/config/aarch64/tuning_models/hip12.h
- M: gcc/config/aarch64/tuning_models/neoverse512tvb.h
- M: gcc/config/aarch64/tuning_models/neoversen1.h
- M: gcc/config/aarch64/tuning_models/neoversen2.h
- M: gcc/config/aarch64/tuning_models/neoversen3.h
- M: gcc/config/aarch64/tuning_models/neoversev1.h
- M: gcc/config/aarch64/tuning_models/neoversev2.h
- M: gcc/config/aarch64/tuning_models/neoversev3.h
- M: gcc/config/aarch64/tuning_models/neoversev3ae.h
- M: gcc/config/aarch64/tuning_models/olympus.h
- M: gcc/config/aarch64/tuning_models/qdf24xx.h
- M: gcc/config/aarch64/tuning_models/saphira.h
- M: gcc/config/aarch64/tuning_models/thunderx.h
- M: gcc/config/aarch64/tuning_models/thunderx2t99.h
- M: gcc/config/aarch64/tuning_models/thunderx3t110.h
- M: gcc/config/aarch64/tuning_models/thunderxt88.h
- M: gcc/config/aarch64/tuning_models/tsv110.h
- M: gcc/config/aarch64/tuning_models/xgene1.h

Christopher Bazley (1):
  aarch64: Restrict unrolling to small loops by default

 gcc/common/config/aarch64/aarch64-common.cc   |  6 +++
 gcc/config/aarch64/aarch64-json-schema.h      |  2 +
 .../aarch64-json-tunings-parser-generated.inc |  2 +
 ...aarch64-json-tunings-printer-generated.inc |  2 +
 gcc/config/aarch64/aarch64-protos.h           |  7 ++++
 gcc/config/aarch64/aarch64.cc                 | 42 +++++++++++++++++--
 gcc/config/aarch64/aarch64.opt                |  4 ++
 gcc/config/aarch64/tuning_models/a64fx.h      |  2 +
 gcc/config/aarch64/tuning_models/ampere1.h    |  2 +
 gcc/config/aarch64/tuning_models/ampere1a.h   |  2 +
 gcc/config/aarch64/tuning_models/ampere1b.h   |  2 +
 gcc/config/aarch64/tuning_models/cortexa35.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa53.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa57.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa72.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexa73.h  |  2 +
 gcc/config/aarch64/tuning_models/cortexx925.h |  2 +
 gcc/config/aarch64/tuning_models/emag.h       |  2 +
 gcc/config/aarch64/tuning_models/exynosm1.h   |  2 +
 .../aarch64/tuning_models/fujitsu_monaka.h    |  2 +
 gcc/config/aarch64/tuning_models/generic.h    |  2 +
 .../aarch64/tuning_models/generic_armv8_a.h   |  2 +
 .../aarch64/tuning_models/generic_armv9_a.h   |  2 +
 gcc/config/aarch64/tuning_models/hip12.h      |  2 +
 .../aarch64/tuning_models/neoverse512tvb.h    |  2 +
 gcc/config/aarch64/tuning_models/neoversen1.h |  2 +
 gcc/config/aarch64/tuning_models/neoversen2.h |  2 +
 gcc/config/aarch64/tuning_models/neoversen3.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev1.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev2.h |  2 +
 gcc/config/aarch64/tuning_models/neoversev3.h |  2 +
 .../aarch64/tuning_models/neoversev3ae.h      |  2 +
 gcc/config/aarch64/tuning_models/olympus.h    |  2 +
 gcc/config/aarch64/tuning_models/qdf24xx.h    |  2 +
 gcc/config/aarch64/tuning_models/saphira.h    |  2 +
 gcc/config/aarch64/tuning_models/thunderx.h   |  2 +
 .../aarch64/tuning_models/thunderx2t99.h      |  2 +
 .../aarch64/tuning_models/thunderx3t110.h     |  2 +
 .../aarch64/tuning_models/thunderxt88.h       |  2 +
 gcc/config/aarch64/tuning_models/tsv110.h     |  2 +
 gcc/config/aarch64/tuning_models/xgene1.h     |  2 +
 41 files changed, 130 insertions(+), 3 deletions(-)

--
2.54.0
