*   Clang for AArch64 promotes each individual operation and rounds 
immediately afterwards. See https://godbolt.org/z/qzGfv6nvo and note the fcvts 
between the two fadd operations. It's implemented in the LLVM backend, where we 
can't see what was originally a single expression.

Yes, but this is not consistent with the Clang documentation. I think we should 
ask the Clang FE to do the promotion and truncation.

Thanks
Pengfei

From: llvm-dev <llvm-dev-boun...@lists.llvm.org> On Behalf Of Craig Topper via 
llvm-dev
Sent: Wednesday, July 14, 2021 11:32 PM
To: Hongtao Liu <crazy...@gmail.com>
Cc: Jakub Jelinek <ja...@redhat.com>; llvm-dev <llvm-...@lists.llvm.org>; Liu, 
Hongtao <hongtao....@intel.com>; gcc-patches@gcc.gnu.org; Joseph Myers 
<jos...@codesourcery.com>
Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev 
<llvm-...@lists.llvm.org> wrote:
> >
> Setting excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16, so that
> results are rounded after each operation, would keep the semantics right.
> And I'll document the behavior difference between soft-fp and
> AVX512FP16 instruction for exceptions.
I got some feedback from my colleague who's working on supporting
_Float16 for llvm.
The LLVM side wants to set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
soft-fp so that the code can be more efficient,
i.e.
_Float16 a, b, c, d;
d = a + b + c;

would be transformed to
float tmp, tmp1, a1, b1, c1;
a1 = (float) a;
b1 = (float) b;
c1 = (float) c;
tmp = a1 + b1;
tmp1 = tmp + c1;
d = (_Float16) tmp1;

so there's only 1 truncation in the end.
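
The effect of rounding once versus rounding after every operation can be sketched in plain C. This is only an analogue, under the assumption that _Float16 may not be available on the compiler at hand: float stands in for _Float16 and double stands in for the promoted float type. The function names sum_round_once and sum_round_each are hypothetical and are not taken from GCC or LLVM.

```c
/* Analogue only: float plays the role of _Float16 and double plays the
   role of the wider type that operands are promoted to.  */

/* Promote all operands, keep intermediates wide, truncate once at the end.  */
float sum_round_once(float a, float b, float c)
{
    double a1 = (double) a;
    double b1 = (double) b;
    double c1 = (double) c;
    double tmp = a1 + b1;
    double tmp1 = tmp + c1;
    return (float) tmp1;
}

/* Truncate back to the narrow type after every operation.  The volatile
   store forces the intermediate to be rounded to float.  */
float sum_round_each(float a, float b, float c)
{
    volatile float e = (float) ((double) a + (double) b);
    return (float) ((double) e + (double) c);
}
```

With a = 1.0f and b = c = 0x1p-24f, and assuming the default round-to-nearest-even mode, sum_round_once returns 1 + 2^-23 while sum_round_each returns 1.0, because each intermediate 1 + 2^-24 is a tie that rounds back down to 1.0.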

If users want to round back after every operation, the code should be
explicitly written as
_Float16 a, b, c, d, e;
e = a + b;
d = e + c;

That's what Clang does; quoting from [1]:
 _Float16 arithmetic will be performed using native half-precision
support when available on the target (e.g. on ARMv8.2a); otherwise it
will be performed at a higher precision (currently always float) and
then truncated down to _Float16. Note that C and C++ allow
intermediate floating-point operands of an expression to be computed
with greater precision than is expressible in their type, so Clang may
avoid intermediate truncations in certain cases; this may lead to
results that are inconsistent with native arithmetic.

Clang for AArch64 promotes each individual operation and rounds immediately 
afterwards. See https://godbolt.org/z/qzGfv6nvo and note the fcvts between the 
two fadd operations. It's implemented in the LLVM backend, where we can't see 
what was originally a single expression.


and so does Arm GCC; quoting from arm.c:

/* We can calculate either in 16-bit range and precision or
   32-bit range and precision.  Make that decision based on whether
   we have native support for the ARMv8.2-A 16-bit floating-point
   instructions or not.  */
return (TARGET_VFP_FP16INST
        ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
        : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
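
From user code, the evaluation method a compiler reports can be inspected through the standard FLT_EVAL_METHOD macro in <float.h>. The sketch below is an illustration rather than anything taken from arm.c: the helper name describe_flt_eval_method is hypothetical, and the value 16 is an assumption that the compiler follows the TS 18661-3 convention, which as far as I know is only exposed when __STDC_WANT_IEC_60559_TYPES_EXT__ is defined before the include.

```c
/* Request the TS 18661-3 view of FLT_EVAL_METHOD where the compiler
   supports it (assumption: ignored harmlessly elsewhere).  */
#define __STDC_WANT_IEC_60559_TYPES_EXT__ 1
#include <float.h>

/* Hypothetical helper: describe the evaluation method the compiler reports.  */
const char *describe_flt_eval_method(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "evaluate in the operand type (types narrower than float promote to float)";
    case 1:
        return "evaluate float and double operations as double";
    case 2:
        return "evaluate all operations as long double";
    case 16:
        return "evaluate in _Float16 range and precision";
    default:
        return "indeterminate or implementation-defined";
    }
}
```

On a target without native half-precision arithmetic one would expect the first case, and on a target with it the value 16, but that depends on the compiler and its flags.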


[1]https://clang.llvm.org/docs/LanguageExtensions.html
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao
_______________________________________________
LLVM Developers mailing list
llvm-...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
