On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
> Thank you very much for the in-depth discussion between Jakub Jelinek and 
> jeff.
> My knowledge is narrow, and I am not familiar with architectures other than 
> RISCV.
> At the same time, my understanding of libraries such as libc and libm is also 
> shallow.
> 
> I spent some time sorting out my thoughts, which resulted in slow email 
> replies. I am very sorry.

The important thing is that the current state of BF16 support on other
architectures is what we want there, not more.  So any changes done for
RISCV shouldn't affect the other architectures, which wasn't the case with
the patch you've posted.
E.g. on x86_64, for FP16 we have:
__divhc3@@GCC_12.0.0
__eqhf2@@GCC_12.0.0
__extendhfdf2@@GCC_12.0.0
__extendhfsf2@@GCC_12.0.0
__extendhftf2@@GCC_12.0.0
__extendhfxf2@@GCC_12.0.0
__fixhfti@@GCC_12.0.0
__fixunshfti@@GCC_12.0.0
__floatbitinthf@@GCC_14.0.0
__floattihf@@GCC_12.0.0
__floatuntihf@@GCC_12.0.0
__mulhc3@@GCC_12.0.0
__nehf2@@GCC_12.0.0
__truncdfhf2@@GCC_12.0.0
__trunchfbf2@@GCC_13.0.0
__truncsfhf2@@GCC_12.0.0
__trunctfhf2@@GCC_12.0.0
__truncxfhf2@@GCC_12.0.0
exported from libgcc, while for BF16 just:
__extendbfsf2@@GCC_13.0.0
__floatbitintbf@@GCC_14.0.0
__floattibf@@GCC_13.0.0
__floatuntibf@@GCC_13.0.0
__truncdfbf2@@GCC_13.0.0
__trunchfbf2@@GCC_13.0.0
__truncsfbf2@@GCC_13.0.0
__trunctfbf2@@GCC_13.0.0
__truncxfbf2@@GCC_13.0.0
More attention has been paid to what we actually need there, which is
primarily conversions to/from other types (but not even to all of them),
with some changes on the RTL expression lowering side to make sure we use
SFmode arithmetic as much as possible and only have the really required
stuff on the libgcc side.
We don't want to change that; if you really need __mulbc3/__divbc3 on RISCV,
then they should be added for that arch only.  Similarly, for the choice
of the builtins on the compiler side, the two builtins we have right now are
all we want on the other arches.  So further builtins would be either a
matter of RISCV specific builtins, or in generic code but guarded by some
target hook so that they aren't enabled on arches which don't want them.
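
To illustrate why conversions alone are enough for most BF16 operations,
here is a minimal sketch that emulates the pattern in portable C++.  It is
not the libgcc implementation: bf16_bits_to_float, float_to_bf16_bits and
bf16_mul are hypothetical names, the 16-bit value is held in a uint16_t
rather than in __bf16, and NaN handling is left out.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a widening conversion such as __extendbfsf2.
// A bfloat16 value has the same layout as the upper 16 bits of an IEEE
// binary32 float, so widening is exact and only needs a shift.
inline float bf16_bits_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical stand-in for a narrowing conversion such as __truncsfbf2.
// Rounds to nearest, ties to even.  NaN inputs are not handled here.
inline std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that survives
    bits += 0x7FFFu + lsb;                  // round half to even
    return static_cast<std::uint16_t>(bits >> 16);
}

// Multiplication with no dedicated 16-bit multiply routine:
// widen both operands, multiply in float, narrow the result.
inline std::uint16_t bf16_mul(std::uint16_t a, std::uint16_t b) {
    return float_to_bf16_bits(bf16_bits_to_float(a) * bf16_bits_to_float(b));
}
```

With these helpers, bf16_mul(0x3FC0, 0x4000), i.e. 1.5 * 2.0, yields 0x4040,
which is 3.0.  Only the two conversions would have to exist as library
routines; the arithmetic itself is ordinary float arithmetic.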
On the libstdc++ side, the current headers provide for std::bfloat16_t and
std::float16_t an implementation which uses SFmode calculations where
possible, so stuff like:
  constexpr _Float16
  acos(_Float16 __x)
  { return _Float16(__builtin_acosf(__x)); }
or
  constexpr __gnu_cxx::__bfloat16_t
  acos(__gnu_cxx::__bfloat16_t __x)
  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }
And for printing, note there is
_ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
which input and output _Float16 and __bf16, but in the parameter passing
they expect those types to be promoted to float, so that the ABIs aren't
dependent on when a particular arch enables those types.

For RISCV, the things to consider are, what is the _Float16 and __bf16
function argument passing/returning ABI?  Is the type enabled on all
variants of RISCV, or just some (e.g. regarding _Float16 and __bf16
on i686-linux, there is support for it only if the SSE2 ISA is available,
so e.g. the *[hb][fc]* functions in libgcc need to be compiled with
-msse2 extra flag)?  If it can be passed/returned the same in all ABIs,
what excess precision mode do you want to use on them?  I mean e.g. the
TARGET_C_EXCESS_PRECISION target hook.  On e.g. x86_64, the default
is to promote all _Float16 and __bf16 calculations to float, so if you have
__bf16 a, b, c, d, e;
...
a = b * c + d - e + c * d;
all variables are converted to SFmode temporaries, all the arithmetic is
done in SFmode, and only at the end is the result converted to HFmode
or BFmode.  One can request a different mode, -fexcess-precision=16,
in which such promotion isn't done, but as there is no hw support for
most of the operations, the actual multiplication, addition or subtraction
is still done in SFmode, just with a conversion to BFmode after each
operation (so slower, but more precise).
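
The two evaluation strategies above can give different results for the same
inputs.  The following is only a sketch that emulates them in portable C++
with plain float, so it runs without any __bf16 support; round_to_bf16,
eval_promoted and eval_rounded_each_step are hypothetical names, and NaN
handling is left out.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a float to the nearest bfloat16 value
// (ties to even) and return it as a float.  NaN inputs are not handled.
inline float round_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that survives
    bits += 0x7FFFu + lsb;                  // round half to even
    bits &= 0xFFFF0000u;                    // drop the low 16 bits
    float r;
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// Promotion to float: the whole expression is evaluated in float and
// rounded to bfloat16 once, at the end.
inline float eval_promoted(float b, float c, float d, float e) {
    return round_to_bf16(b * c + d - e + c * d);
}

// -fexcess-precision=16 style: each operation is still done in float,
// but its result is rounded to bfloat16 before the next operation.
inline float eval_rounded_each_step(float b, float c, float d, float e) {
    float bc = round_to_bf16(b * c);
    float t1 = round_to_bf16(bc + d);
    float t2 = round_to_bf16(t1 - e);
    float cd = round_to_bf16(c * d);
    return round_to_bf16(t2 + cd);
}
```

For b = c = 1.0234375, d = 0 and e = 1.046875 (all exactly representable
in bfloat16), eval_promoted returns 0.00054931640625, while
eval_rounded_each_step returns 0, because the intermediate rounding of
b * c discards the low bits that the final subtraction would have exposed.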
If you still want to export __divbc3 and __mulbc3, do you want to export
those just on some RISCV ABI variants or all of them?  Depending on that,
arrange for those to be compiled just for those variants; and, if they are
exported from libgcc_s.so.1, you also need to add a symbol version for them,
likely GCC_15.0.0.

For enabling just those 2 functions, I don't think you need any changes on
the builtins.def etc. side, since those aren't builtins but libcalls.

If you need other libgcc calls, similar questions to the above apply, but
please don't add them just because you can, but only if you really need them
(they can't be handled in hw instructions, promotion to SFmode and conversion
afterwards is undesirable, and you actually have code that proves the
compiler emits those calls).  Again, they should be only enabled on arches
(and/or sub-ABIs) which ask for them, and they need the symbol version stuff
resolved.

        Jakub
