https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125744

            Bug ID: 125744
           Summary: [arm] -mtune/-mcpu with
                    -fno-fuse-ops-with-volatile-access allows for `add r3,
                    sp, #6` when it should not
           Product: gcc
           Version: 15.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: azoff at gcc dot gnu.org
  Target Milestone: ---

In r16-5947-ga6c50ec2c6ebcb, the flag -ffuse-ops-with-volatile-access was
added. With this new flag, an extra optimization step is introduced to remove
unnecessary registers in a function body.

An example where this matters is gcc.target/arm/bfloat16_scalar_1_2.c, where
the function "stacktest1" compiles to one of two different instruction
sequences depending on the options used. To simplify the output, I've removed
everything below the "stacktest1" function in the input.

First, -mtune=cortex-m0 influences which instructions are used, and the same
assembler is produced regardless of whether -ffuse-ops-with-volatile-access or
-fno-fuse-ops-with-volatile-access is given.

$ /build/r17-1372-g80b78b2504fba0/bin/arm-none-eabi-gcc   -mthumb
-march=armv6s-m -mtune=cortex-m0 -mfloat-abi=soft -mfpu=auto  
-fdiagnostics-plain-output   -ansi -pedantic-errors -mcpu=unset
-march=armv8.2-a+bf16 -mfpu=auto -mfloat-abi=softfp -O3 -std=gnu90 -S -x c -o -
-fno-ident -ffuse-ops-with-volatile-access  <(sed -ne '1,22p'
/build/gcc_src/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c)
        .arch armv8.2-a
        .fpu neon-fp-armv8
        .arch_extension bf16
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "63"
        .text
        .align  1
        .p2align 2,,3
        .global stacktest1
        .syntax unified
        .thumb
        .thumb_func
        .type   stacktest1, %function
stacktest1:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        sub     sp, sp, #8
        strh    r0, [sp, #6]    @ __bf16
        ldrh    r0, [sp, #6]    @ __bf16
        add     sp, sp, #8
        @ sp needed
        bx      lr
        .size   stacktest1, .-stacktest1



Switching over to -mcpu=cortex-m0 -ffuse-ops-with-volatile-access gives the
same result:

$ /build/r17-1372-g80b78b2504fba0/bin/arm-none-eabi-gcc   -mthumb
-march=armv6s-m -mcpu=cortex-m0 -mfloat-abi=soft -mfpu=auto  
-fdiagnostics-plain-output   -ansi -pedantic-errors -mcpu=unset
-march=armv8.2-a+bf16 -mfpu=auto -mfloat-abi=softfp -O3 -std=gnu90 -S -x c -o -
-fno-ident -ffuse-ops-with-volatile-access  <(sed -ne '1,22p'
/build/gcc_src/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c)
        .arch armv8.2-a
        .fpu neon-fp-armv8
        .arch_extension bf16
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "63"
        .text
        .align  1
        .p2align 2,,3
        .global stacktest1
        .syntax unified
        .thumb
        .thumb_func
        .type   stacktest1, %function
stacktest1:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        sub     sp, sp, #8
        strh    r0, [sp, #6]    @ __bf16
        ldrh    r0, [sp, #6]    @ __bf16
        add     sp, sp, #8
        @ sp needed
        bx      lr
        .size   stacktest1, .-stacktest1


But using -mcpu=cortex-m0 -fno-fuse-ops-with-volatile-access gives a different
outcome:

$ /build/r17-1372-g80b78b2504fba0/bin/arm-none-eabi-gcc   -mthumb
-march=armv6s-m -mcpu=cortex-m0 -mfloat-abi=soft -mfpu=auto  
-fdiagnostics-plain-output   -ansi -pedantic-errors -mcpu=unset
-march=armv8.2-a+bf16 -mfpu=auto -mfloat-abi=softfp -O3 -std=gnu90 -S -x c -o -
-fno-ident -fno-fuse-ops-with-volatile-access  <(sed -ne '1,22p'
/build/gcc_src/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c)
        .arch armv8.2-a
        .fpu neon-fp-armv8
        .arch_extension bf16
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "63"
        .text
        .align  1
        .p2align 2,,3
        .global stacktest1
        .syntax unified
        .thumb
        .thumb_func
        .type   stacktest1, %function
stacktest1:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        sub     sp, sp, #8
        add     r3, sp, #6
        strh    r0, [r3]        @ __bf16
        ldrh    r0, [sp, #6]    @ __bf16
        add     sp, sp, #8
        @ sp needed
        bx      lr
        .size   stacktest1, .-stacktest1



Ideally, all 4 permutations (-mtune or -mcpu, combined with
-ffuse-ops-with-volatile-access or -fno-fuse-ops-with-volatile-access) should
give the same assembler: the one that uses neither r3 nor the extra addition.
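
To compare the permutations side by side, something like the following sketch
could be used. It is only an illustration: the GCC and SRC variables and the
helper names "permutations" and "compile_one" are assumptions made for this
example, not part of the test suite. It also feeds the input on stdin rather
than through process substitution. The compiler options themselves are copied
from the command lines above.

```shell
#!/bin/sh
# Sketch: build stacktest1 with each {-mtune,-mcpu} x {-ffuse,-fno-fuse}
# combination and report how often the extra "add r3, sp" shows up.
# GCC and SRC are hypothetical defaults; point them at your own build/tree.
GCC=${GCC:-arm-none-eabi-gcc}
SRC=${SRC:-gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c}

# Print one "cpu-option fuse-option" pair per line (four lines in total).
permutations() {
  for cpu in -mtune=cortex-m0 -mcpu=cortex-m0; do
    for fuse in -ffuse-ops-with-volatile-access \
                -fno-fuse-ops-with-volatile-access; do
      printf '%s %s\n' "$cpu" "$fuse"
    done
  done
}

# Compile the first 22 lines of SRC with the options from the report plus
# the given pair ($1 = cpu option, $2 = fuse option); assembler on stdout.
compile_one() {
  sed -ne '1,22p' "$SRC" |
    "$GCC" -mthumb -march=armv6s-m "$1" -mfloat-abi=soft -mfpu=auto \
      -fdiagnostics-plain-output -ansi -pedantic-errors -mcpu=unset \
      -march=armv8.2-a+bf16 -mfpu=auto -mfloat-abi=softfp -O3 -std=gnu90 \
      -S -x c -o - -fno-ident "$2" -
}

# Only run the comparison when the cross compiler and the source exist.
if command -v "$GCC" >/dev/null 2>&1 && [ -r "$SRC" ]; then
  permutations | while read -r cpu fuse; do
    # Count lines that compute a stack address into r3.
    n=$(compile_one "$cpu" "$fuse" | grep -c 'add[[:space:]]*r3, sp')
    printf '%s %s: %s\n' "$cpu" "$fuse" "$n"
  done
fi
```

A count of 0 for every line would correspond to the ideal outcome described
above.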

This ticket is derived from the review comments in
https://gcc.gnu.org/pipermail/gcc-patches/2026-June/719922.html.


There are similar issues with the following test cases in trunk as of
r17-1372-g80b78b2504fba0:
    * gcc.target/arm/bfloat16_scalar_1_2.c
    * gcc.target/arm/bfloat16_scalar_2_2.c
    * gcc.target/arm/bfloat16_scalar_3_2.c
    * gcc.target/arm/bfloat16_simd_1_2.c
    * gcc.target/arm/bfloat16_simd_2_2.c
    * gcc.target/arm/bfloat16_simd_3_2.c


In GCC 15, the -ffuse-ops-with-volatile-access feature is not implemented (it
was added in r16-5947-ga6c50ec2c6ebcb), so the produced assembler is likely to
contain the add instruction.
