On 2026-02-06 12:21, Richard Earnshaw (foss) wrote:
On 06/02/2026 10:11, Torbjorn SVENSSON wrote:
On 2026-01-27 11:03, Richard Earnshaw wrote:
On 27/01/2026 06:17, Alexandre Oliva wrote:
On Jan 23, 2026, "Richard Earnshaw (foss)" <[email protected]> wrote:
On 19/01/2026 19:23, Alexandre Oliva wrote:
-/* { dg-additional-options "-march=armv7-a -mthumb" { target { arm_arch_v7a_ok && arm_thumb2_ok } } } */
+/* { dg-additional-options "-mcpu=unset -march=armv7-a -mthumb" { target { arm_arch_v7a_ok && arm_thumb2_ok } } } */
This will fail if other options set, or config settings imply,
-mfloat-abi=hard and -mfpu=auto.
So we should use -march=armv7-a+fp
Oh, good catch, thanks.
Here's the patch with this fix, currently under retesting.
I'll take your response above as approval with changes.
Reset the cpu selection to the default on tests that set -march
explicitly instead of using dg-add-options. The latter would reset
the cpu selection to avoid interference from TOOL_OPTIONS.
Also add +fp to -march in tests that don't override float-abi and fpu,
so that -mfloat-abi=hard -mfpu=auto in TOOL_OPTIONS won't cause a
failure.
for gcc/testsuite/ChangeLog
* gcc.target/arm/bfloat16_simd_1_2.c: Add -mcpu=unset.
* gcc.target/arm/bfloat16_simd_2_2.c: Likewise.
* gcc.target/arm/bfloat16_simd_3_2.c: Likewise.
* gcc.dg/torture/pr120347.c: Likewise. Add +fp to -march.
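For illustration, here is a sketch of the directive with both changes applied. It assumes the diff quoted above comes from gcc.dg/torture/pr120347.c, the only file whose ChangeLog entry mentions +fp. The committed line may differ. -mcpu=unset discards any -mcpu inherited from TOOL_OPTIONS, and +fp gives the architecture an FPU, so an inherited -mfloat-abi=hard -mfpu=auto remains valid.

```c
/* Hypothetical header line for gcc.dg/torture/pr120347.c with both
   -mcpu=unset and +fp applied; the actual file may differ.  */
/* { dg-additional-options "-mcpu=unset -march=armv7-a+fp -mthumb" { target { arm_arch_v7a_ok && arm_thumb2_ok } } } */
```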
This is OK, thanks.
Can this patch be picked for releases/gcc-15 too?
Yes, as long as you've tested it properly.
I've built r15-10798-gae573c9d0e7f1c and run the testsuite with the following
combinations of flags:
thumb/arch=armv6s-m/cpu=cortex-m0/float-abi=soft
thumb/arch=armv6s-m/tune=cortex-m0/float-abi=soft/fpu=auto
thumb/arch=armv7-m/cpu=cortex-m3/float-abi=soft
thumb/arch=armv7-m/tune=cortex-m3/float-abi=soft/fpu=auto
thumb/arch=armv7e-m+fp.dp/tune=cortex-m7/float-abi=hard/fpu=auto
thumb/arch=armv7e-m+fp/tune=cortex-m4/float-abi=hard/fpu=auto
thumb/arch=armv7e-m+nofp/tune=cortex-m4/float-abi=soft/fpu=auto
thumb/arch=armv7e-m+nofp/tune=cortex-m7/float-abi=soft/fpu=auto
thumb/arch=armv7e-m/cpu=cortex-m4/float-abi=hard/fpu=fpv4-sp-d16
thumb/arch=armv7e-m/cpu=cortex-m4/float-abi=soft
thumb/arch=armv7e-m/cpu=cortex-m7/float-abi=hard/fpu=fpv5-d16
thumb/arch=armv7e-m/cpu=cortex-m7/float-abi=soft
thumb/arch=armv7ve+neon/tune=cortex-a7/float-abi=hard/fpu=auto
thumb/arch=armv7ve+nofp/tune=cortex-a7/float-abi=soft/fpu=auto
thumb/arch=armv7ve/cpu=cortex-a7/float-abi=hard/fpu=neon
thumb/arch=armv7ve/cpu=cortex-a7/float-abi=soft
thumb/arch=armv8-m.main+dsp+fp/tune=cortex-m33/float-abi=hard/fpu=auto
thumb/arch=armv8-m.main+dsp+nofp/tune=cortex-m33/float-abi=soft/fpu=auto
thumb/arch=armv8-m.main+dsp/cpu=cortex-m33/float-abi=hard/fpu=fpv5-sp-d16
thumb/arch=armv8-m.main+dsp/cpu=cortex-m33/float-abi=soft
thumb/arch=armv8.1-m.main+mve+nofp/tune=cortex-m55/float-abi=soft/fpu=auto
thumb/arch=armv8.1-m.main+mve+pacbti+nofp/tune=cortex-m85/float-abi=soft/fpu=auto
thumb/arch=armv8.1-m.main+mve+pacbti/cpu=cortex-m85/float-abi=hard/fpu=fpv5-d16
thumb/arch=armv8.1-m.main+mve+pacbti/cpu=cortex-m85/float-abi=soft
thumb/arch=armv8.1-m.main+mve.fp+fp.dp/tune=cortex-m55/float-abi=hard/fpu=auto
thumb/arch=armv8.1-m.main+mve.fp+pacbti+fp.dp/tune=cortex-m85/float-abi=hard/fpu=auto
thumb/arch=armv8.1-m.main+mve/cpu=cortex-m55/float-abi=hard/fpu=fpv5-d16
thumb/arch=armv8.1-m.main+mve/cpu=cortex-m55/float-abi=soft
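Each permutation above appears to be shorthand for a set of compiler options. The sketch below spells out that mapping. It assumes every `/`-separated component is a GCC option with its leading `-m` removed, so `thumb` becomes `-mthumb` and `tune=cortex-m3` becomes `-mtune=cortex-m3`. The helper `expand_permutation` is hypothetical and is not part of the GCC testsuite.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a permutation string such as
   "thumb/arch=armv7-m/cpu=cortex-m3/float-abi=soft" into the option string
   "-mthumb -march=armv7-m -mcpu=cortex-m3 -mfloat-abi=soft".
   Assumes every '/'-separated component is a GCC option with its leading
   "-m" removed.  Returns 0 on success, or -1 if OUT (OUTSZ bytes) is too
   small to hold the result.  */
static int
expand_permutation (const char *perm, char *out, size_t outsz)
{
  const char *p = perm;
  size_t used = 0;

  assert (perm != NULL && out != NULL);
  if (outsz == 0)
    return -1;
  out[0] = '\0';

  while (*p != '\0')
    {
      /* Length of the current component, up to the next '/' or the end.  */
      const char *slash = strchr (p, '/');
      size_t len = slash != NULL ? (size_t) (slash - p) : strlen (p);

      /* Append "-m<component>", separated from the previous one by a space.  */
      int n = snprintf (out + used, outsz - used, "%s-m%.*s",
                        used > 0 ? " " : "", (int) len, p);
      if (n < 0 || (size_t) n >= outsz - used)
        return -1;
      used += (size_t) n;

      /* Step past the component and its trailing '/', if any.  */
      p += len;
      if (*p == '/')
        p++;
    }
  return 0;
}
```

Under this assumption, `thumb/arch=armv7-m/cpu=cortex-m3/float-abi=soft` expands to `-mthumb -march=armv7-m -mcpu=cortex-m3 -mfloat-abi=soft`. The -mcpu in that option set is what -mcpu=unset in the patched tests is meant to neutralise.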
Of these, the following permutations see the 4 test cases go from FAIL
to PASS with the patch applied:
thumb/arch=armv7-m/cpu=cortex-m3/float-abi=soft
thumb/arch=armv7e-m+fp.dp/tune=cortex-m7/float-abi=hard/fpu=auto
thumb/arch=armv7e-m+fp/tune=cortex-m4/float-abi=hard/fpu=auto
thumb/arch=armv7e-m/cpu=cortex-m4/float-abi=hard/fpu=fpv4-sp-d16
thumb/arch=armv7e-m/cpu=cortex-m4/float-abi=soft
thumb/arch=armv7e-m/cpu=cortex-m7/float-abi=hard/fpu=fpv5-d16
thumb/arch=armv7e-m/cpu=cortex-m7/float-abi=soft
thumb/arch=armv7ve+neon/tune=cortex-a7/float-abi=hard/fpu=auto
thumb/arch=armv7ve/cpu=cortex-a7/float-abi=hard/fpu=neon
thumb/arch=armv7ve/cpu=cortex-a7/float-abi=soft
thumb/arch=armv8-m.main+dsp+fp/tune=cortex-m33/float-abi=hard/fpu=auto
thumb/arch=armv8-m.main+dsp/cpu=cortex-m33/float-abi=hard/fpu=fpv5-sp-d16
thumb/arch=armv8-m.main+dsp/cpu=cortex-m33/float-abi=soft
thumb/arch=armv8.1-m.main+mve+pacbti/cpu=cortex-m85/float-abi=hard/fpu=fpv5-d16
thumb/arch=armv8.1-m.main+mve+pacbti/cpu=cortex-m85/float-abi=soft
thumb/arch=armv8.1-m.main+mve.fp+fp.dp/tune=cortex-m55/float-abi=hard/fpu=auto
thumb/arch=armv8.1-m.main+mve.fp+pacbti+fp.dp/tune=cortex-m85/float-abi=hard/fpu=auto
thumb/arch=armv8.1-m.main+mve/cpu=cortex-m55/float-abi=hard/fpu=fpv5-d16
thumb/arch=armv8.1-m.main+mve/cpu=cortex-m55/float-abi=soft
No test cases regress.
However, the following permutations still fail `check-function-bodies
stacktest1` in bfloat16_simd_[123]_2.c, even with the patch applied (the
assembly below is for bfloat16_simd_1_2.c; the other two bfloat16 tests
produce similar output):
thumb/arch=armv7-m/tune=cortex-m3/float-abi=soft/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
ldrh r0, [sp, #6] @ __bf16
add r3, sp, #6
add sp, sp, #8
bx lr
thumb/arch=armv7e-m+fp.dp/tune=cortex-m7/float-abi=hard/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
add r3, sp, #6
ldrh r0, [sp, #6] @ __bf16
add sp, sp, #8
bx lr
thumb/arch=armv7e-m+nofp/tune=cortex-m7/float-abi=soft/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
add r3, sp, #6
ldrh r0, [sp, #6] @ __bf16
add sp, sp, #8
bx lr
thumb/arch=armv8-m.main+dsp+fp/tune=cortex-m33/float-abi=hard/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
ldrh r0, [sp, #6] @ __bf16
add r3, sp, #6
add sp, sp, #8
bx lr
thumb/arch=armv8-m.main+dsp+nofp/tune=cortex-m33/float-abi=soft/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
ldrh r0, [sp, #6] @ __bf16
add r3, sp, #6
add sp, sp, #8
bx lr
thumb/arch=armv8.1-m.main+mve+nofp/tune=cortex-m55/float-abi=soft/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
ldrh r0, [sp, #6] @ __bf16
add r3, sp, #6
add sp, sp, #8
bx lr
thumb/arch=armv8.1-m.main+mve+pacbti+nofp/tune=cortex-m85/float-abi=soft/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
ldrh r0, [sp, #6] @ __bf16
add r3, sp, #6
add sp, sp, #8
bx lr
thumb/arch=armv8.1-m.main+mve.fp+fp.dp/tune=cortex-m55/float-abi=hard/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
ldrh r0, [sp, #6] @ __bf16
add r3, sp, #6
add sp, sp, #8
bx lr
thumb/arch=armv8.1-m.main+mve.fp+pacbti+fp.dp/tune=cortex-m85/float-abi=hard/fpu=auto
sub sp, sp, #8
strh r0, [sp, #6] @ __bf16
ldrh r0, [sp, #6] @ __bf16
add r3, sp, #6
add sp, sp, #8
bx lr
I've also checked r16-6992-gd7e5113e592c54, and it produces output similar to
the above for the failing tests.
The function body should match:
/*
**stacktest1:
** ...
** strh r[0-9]+, \[r[0-9]+\] @ __bf16
** ldrh r[0-9]+, \[sp, #[0-9]+\] @ __bf16
** ...
** bx lr
*/
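The sketch below, written in C with POSIX `<regex.h>`, shows why only the strh line fails. The expected strh line requires a register base (`[rN]`), but every listing above shows an sp-relative store (`[sp, #6]`). It is only an approximation. The testsuite's check-function-bodies uses Tcl regular expressions, handles the `...` wildcard lines itself, and matches against the real tab-separated assembler output, whereas here each line is matched alone with single spaces, as quoted above. `line_matches` and the pattern names are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE matches the POSIX extended regular expression PATTERN,
   0 otherwise.  Approximates matching one function-body line.  */
static int
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);  /* The patterns below are fixed literals.  */
  rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* The two expected stacktest1 lines, anchored at both ends.  '[' is
   escaped; ']' outside a bracket expression is already literal in a POSIX
   ERE, so it is left unescaped.  */
static const char strh_pattern[] = "^strh r[0-9]+, \\[r[0-9]+] @ __bf16$";
static const char ldrh_pattern[] = "^ldrh r[0-9]+, \\[sp, #[0-9]+] @ __bf16$";
```

With these definitions, `line_matches (strh_pattern, "strh r0, [sp, #6] @ __bf16")` returns 0 and `line_matches (ldrh_pattern, "ldrh r0, [sp, #6] @ __bf16")` returns 1. A register-based store such as `strh r0, [r3] @ __bf16` would satisfy the strh pattern.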
Is it acceptable for both the strh and the ldrh to address the stack slot
relative to sp (leaving the result of "add r3, sp, #6" unused), or is this a
real bug in the compiler?
Unless someone objects before the end of the week, I'll cherry-pick the patch
to releases/gcc-15 so that at least the tests run with the correct flags.
Kind regards,
Torbjörn
R.