On Fri, 21 Mar 2025 at 11:18, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
>
> On 20/03/2025 16:15, Christophe Lyon wrote:
> > Depending on if/how the testing flags are overridden, the first value
> > we try("") might not do what we want.
> >
> > For instance, if the whole testsuite is executed with
> > (A) -mthumb -march=armv7-m -mtune=cortex-m3 -mfloat-abi=softfp
> >
> > bf16_neon_ok is first compiled with
> > (A) (B)
> > where B = -mcpu=unset -march=armv8.2-a+bf16
> >
> > which is accepted, so a testcase like vld2q_lane_bf16_indices_1.c
> > is compiled with:
> > (A) (C) (B)
> > where C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a 
> > -mfpu=neon-fp16 -mfp16-format=ieee
> >
> > because advsimd-intrinsics.exp has set additional_flags to (C)
> > via arm_neon_fp16_ok
> >
> > So the testcase is compiled with
> > [...] -mfpu=neon-fp16 -mcpu=unset -march=armv8.2-a+bf16
> > (thus -mfpu=neon-fp16) and bf16 support is disabled.
> >
> > The patch replaces "" with -mfpu=auto which matches the intended
> > effect of -march=armv8.2-a+bf16 as added by bf16_neon_ok, and the
> > testcase is now compiled with
> > (A) (C) -mfpu=auto (B)
> >
> > However, since this effective-target is also used on aarch64 (which
> > does not support -mfpu=auto), we do this only on arm.
> >
> > This patch improves coverage, and makes
> > v{ld,st}[234]q_lane_bf16_indices_1.c pass when testsuite flags are
> > overridden as described above (e.g. for M-profile).
> >
> >       gcc/testsuite/
> >       * lib/target-supports.exp
> >       (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache):
> >       Conditionally use -mfpu=auto.
> > ---
> >  gcc/testsuite/lib/target-supports.exp | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/lib/target-supports.exp 
> > b/gcc/testsuite/lib/target-supports.exp
> > index e2622a445c5..09b16a14024 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -6871,12 +6871,19 @@ proc add_options_for_arm_fp16fml_neon { flags } {
> >  proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } {
> >      global et_arm_v8_2a_bf16_neon_flags
> >      set et_arm_v8_2a_bf16_neon_flags ""
> > +    set fpu_auto ""
> >
> >      if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> >       return 0;
> >      }
> >
> > -    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" 
> > "-mfloat-abi=hard -mfpu=neon-fp-armv8" } {
> > +    if { [istarget arm*-*-*] } {
> > +     set fpu_auto "-mfpu=auto"
> > +    }
> > +
> > +    foreach flags [list "$fpu_auto" \
>
> Shouldn't we try first with "", even on Arm?  Thus
>        foreach flags [list "" "$fpu_auto" \
> ...
>
I don't think so, that's why I tried to explain above.
"" is acceptable / accepted in arm_v8_2a_bf16_neon_ok
(this is (A) (B) above, where the important parts are:
-march=armv7-m -mcpu=unset -march=armv8.2-a+bf16
(so -mfpu is set to the toolchain's default)

but then the actual testcase is compiled with additional flags (C)
defined by the test driver using arm_neon_fp16_ok
C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a
-mfpu=neon-fp16 -mfp16-format=ieee

so the relevant parts of (A) (C) (B) are:
-march=armv7-m  -mfpu=neon -mcpu=unset -march=armv7-a -mfpu=neon-fp16
-mcpu=unset -march=armv8.2-a+bf16
which can be simplified into
-mfpu=neon-fp16 -march=armv8.2-a+bf16

I think we need to start with -mfpu=auto instead of "", so that when
-march=armv8.2-a+bf16 takes effect, we have cancelled any other -mfpu.



>
> > +                    "-mfloat-abi=softfp -mfpu=neon-fp-armv8" \
> > +                    "-mfloat-abi=hard -mfpu=neon-fp-armv8" ] {
> >       if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok 
> > object {
> >           #include <arm_neon.h>
> >           #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
>
> In fact, since aarch64 doesn't support -mfpu at all, only "" is ever going to 
> work here, so perhaps we can recast all this code, perhaps along the lines of 
> that in check_effective_target_arm_neon_h_ok_nocache so that we don't need to 
> actually try to include arm_neon.h at all while figuring out the options.
>

Maybe.

OTOH, actually including arm_neon.h is guaranteed to give the right
answer to the question "which flags do we need to be able to include
arm_neon.h?", rather than using several conditions that are supposed
to match what arm_neon.h does?

Thanks,

Christophe

> R.

Reply via email to