> On 22 Jul 2024, at 17:16, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>> Hi Claudio,
>> 
>>> On 22 Jul 2024, at 11:30, Claudio Bantaloukas <claudio.bantalou...@arm.com> 
>>> wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> The ACLE declares several helper types and functions to
>>> facilitate construction of `fpm` arguments.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>       * config/aarch64/arm_acle.h (fpm_t): New type representing fpmr 
>>> values.
>>>       (enum __ARM_FPM_FORMAT): New enum representing valid fp8 formats.
>>>       (enum __ARM_FPM_OVERFLOW): New enum representing how some fp8
>>>       calculations work.
>>>       (arm_fpm_init): New.
>>>       (arm_set_fpm_src1_format): Likewise.
>>>       (arm_set_fpm_src2_format): Likewise.
>>>       (arm_set_fpm_dst_format): Likewise.
>>>       (arm_set_fpm_overflow_cvt): Likewise.
>>>       (arm_set_fpm_overflow_mul): Likewise.
>>>       (arm_set_fpm_lscale): Likewise.
>>>       (arm_set_fpm_lscale2): Likewise.
>>>       (arm_set_fpm_nscale): Likewise.
>> 
>> According to the FP8 pull request in ACLE:
>> https://github.com/ARM-software/acle/pull/323/files
>> These functions should all be in the implementation namespace, that is they 
>> should begin with __arm*.
>> 
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>       * gcc.target/aarch64/acle/fp8-helpers.c: New test of fpmr helper 
>>> functions.
>>> ---
>>> gcc/config/aarch64/arm_acle.h                 | 37 +++++++++++++
>>> .../gcc.target/aarch64/acle/fp8-helpers.c     | 52 +++++++++++++++++++
>>> 2 files changed, 89 insertions(+)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c
>>> 
>>> diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
>>> index 2aa681090fa..4f0c600582a 100644
>>> --- a/gcc/config/aarch64/arm_acle.h
>>> +++ b/gcc/config/aarch64/arm_acle.h
>>> @@ -385,6 +385,43 @@ __rndrrs (uint64_t *__res)
>>> 
>>> #pragma GCC pop_options
>>> 
>>> +#ifdef __ARM_FEATURE_FP8
> 
> The definitions should be unconditional, so that they're still available
> if someone initially compiles without fp8 but enables it later in the
> translation unit.
> 
>>> +
>>> +typedef uint64_t fpm_t;
>>> +
>>> +enum __ARM_FPM_FORMAT
>>> +{
>>> +  __ARM_FPM_E5M2,
>>> +  __ARM_FPM_E4M3,
>>> +};
>>> +
>>> +enum __ARM_FPM_OVERFLOW
>>> +{
>>> +  __ARM_FPM_INFNAN,
>>> +  __ARM_FPM_SATURATE,
>>> +};
>>> +
>>> +#define arm_fpm_init() (0)
>>> +
>>> +#define arm_set_fpm_src1_format(__fpm, __format) \
>>> +  ((__fpm & ~(uint64_t)0x7) | (__format & (uint64_t)0x7))
>>> +#define arm_set_fpm_src2_format(__fpm, __format) \
>>> +  ((__fpm & ~((uint64_t)0x7 << 3)) | ((__format & (uint64_t)0x7) << 3))
>>> +#define arm_set_fpm_dst_format(__fpm, __format) \
>>> +  ((__fpm & ~((uint64_t)0x7 << 6)) | ((__format & (uint64_t)0x7) << 6))
>>> +#define arm_set_fpm_overflow_cvt(__fpm, __behaviour) \
>>> +  ((__fpm & ~((uint64_t)0x1 << 15)) | ((__behaviour & (uint64_t)0x1) << 
>>> 15))
>>> +#define arm_set_fpm_overflow_mul(__fpm, __behaviour) \
>>> +  ((__fpm & ~((uint64_t)0x1 << 14)) | ((__behaviour & (uint64_t)0x1) << 
>>> 14))
> 
> Nit: should use the international (= US) spelling "behavior".
> 
>>> +#define arm_set_fpm_lscale(__fpm, __scale) \
>>> +  ((__fpm & ~((uint64_t)0x7f << 16)) | ((__scale & (uint64_t)0x7f) << 16))
>>> +#define arm_set_fpm_lscale2(__fpm, __scale) \
>>> +  ((__fpm & ~((uint64_t)0x3f << 32)) | ((__scale & (uint64_t)0x3f) << 32))
>>> +#define arm_set_fpm_nscale(__fpm, __scale) \
>>> +  ((__fpm & ~((uint64_t)0xff << 24)) | ((__scale & (uint64_t)0xff) << 24))
>>> +
>> 
>> I’d feel more comfortable if these were implemented as proper intrinsics in 
>> aarch64-builtins.cc <http://aarch64-builtins.cc/> rather than open-coding 
>> them like that. For one it would guarantee tighter control over the code 
>> generation.
>> This version would likely fluctuate a lot based on optimization level, 
>> evolution of unrelated passes in the mined and other things.
>> Additionally, the ACLE mandates certain ranges for the scale parameter in 
>> these intrinsics. Having them as builtins would allow us to add validation 
>> code to help the user here.
> 
> These helpers don't map to specific FP8 instructions though.
> They're just an abstraction and an aid to the programmer, so that people
> don't have to write hard-coded masks or their own shift/insert code.
> There's no expectation that a given macro/function will produce a given
> code sequence.
> 
> Also, the parameters are allowed to be variable, so I don't think the
> ranges are something that the compiler should enforce directly.

I see, I guess it makes sense then, I withdraw my objection :)
Thanks,
Kyrill

> 
> Thanks,
> Richard
> 
>> 
>>> +#endif
>>> +
>>> #ifdef __cplusplus
>>> }
>>> #endif
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c 
>>> b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c
>>> new file mode 100644
>>> index 00000000000..e654326c387
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c
>>> @@ -0,0 +1,52 @@
>>> +/* Test the fp8 ACLE helper functions.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv9.4-a+fp8" } */
>>> +
>>> +#include <arm_acle.h>
>>> +
>>> +void
>>> +test_prepare_fpmr_sysreg ()
>>> +{
>>> +
>>> +#define _S_EQ(expr, expected)                                              
>>>     \
>>> +  _Static_assert (expr == expected, #expr " == " #expected)
>>> +
>>> +  _S_EQ (arm_fpm_init (), 0);
>>> +
>>> +  /* Bits [2:0] */
>>> +  _S_EQ (arm_set_fpm_src1_format (arm_fpm_init (), __ARM_FPM_E5M2), 0);
>>> +  _S_EQ (arm_set_fpm_src1_format (arm_fpm_init (), __ARM_FPM_E4M3), 0x1);
>>> +
>>> +  /* Bits [5:3] */
>>> +  _S_EQ (arm_set_fpm_src2_format (arm_fpm_init (), __ARM_FPM_E5M2), 0);
>>> +  _S_EQ (arm_set_fpm_src2_format (arm_fpm_init (), __ARM_FPM_E4M3), 0x8);
>>> +
>>> +  /* Bits [8:6] */
>>> +  _S_EQ (arm_set_fpm_dst_format (arm_fpm_init (), __ARM_FPM_E5M2), 0);
>>> +  _S_EQ (arm_set_fpm_dst_format (arm_fpm_init (), __ARM_FPM_E4M3), 0x40);
>>> +
>>> +  /* Bit 14 */
>>> +  _S_EQ (arm_set_fpm_overflow_mul (arm_fpm_init (), __ARM_FPM_INFNAN), 0);
>>> +  _S_EQ (arm_set_fpm_overflow_mul (arm_fpm_init (), __ARM_FPM_SATURATE),
>>> + 0x4000);
>>> +
>>> +  /* Bit 15 */
>>> +  _S_EQ (arm_set_fpm_overflow_cvt (arm_fpm_init (), __ARM_FPM_INFNAN), 0);
>>> +  _S_EQ (arm_set_fpm_overflow_cvt (arm_fpm_init (), __ARM_FPM_SATURATE),
>>> + 0x8000);
>>> +
>>> +  /* Bits [22:16] */
>>> +  _S_EQ (arm_set_fpm_lscale (arm_fpm_init (), 0), 0);
>>> +  _S_EQ (arm_set_fpm_lscale (arm_fpm_init (), 127), 0x7F0000);
>>> +
>>> +  /* Bits [37:32] */
>>> +  _S_EQ (arm_set_fpm_lscale2 (arm_fpm_init (), 0), 0);
>>> +  _S_EQ (arm_set_fpm_lscale2 (arm_fpm_init (), 63), 0x3F00000000);
>>> +
>>> +  /* Bits [31:24] */
>>> +  _S_EQ (arm_set_fpm_nscale (arm_fpm_init (), 0), 0);
>>> +  _S_EQ (arm_set_fpm_nscale (arm_fpm_init (), 127), 0x7F000000);
>>> +  _S_EQ (arm_set_fpm_nscale (arm_fpm_init (), -128), 0x80000000);
>>> +
>> 
>> Given my comment above, I’d expect to see some code generation tests as well 
>> to check that the FPM modification functions above generate tight code.
>> 
>> Thanks,
>> Kyrill
>> 
>> 
>>> +#undef _S_EQ
>>> +}

Reply via email to