On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
> Thank you very much for the in-depth discussion between Jakub Jelinek and
> jeff.
> My knowledge is narrow, and I am not familiar with architectures other than
> RISCV.
> At the same time, my understanding of libraries such as libc and libm is also
> shallow.
>
> I spent some time sorting out my thoughts, which resulted in slow email
> replies. I am very sorry.

The important thing is that the current state of BF16 support on other
architectures is what we want there, not more. So any changes done for
RISCV shouldn't affect the other architectures, which wasn't the case with
the patch you've posted.

E.g. on x86_64, for FP16 we have:
  __divhc3@@GCC_12.0.0
  __eqhf2@@GCC_12.0.0
  __extendhfdf2@@GCC_12.0.0
  __extendhfsf2@@GCC_12.0.0
  __extendhftf2@@GCC_12.0.0
  __extendhfxf2@@GCC_12.0.0
  __fixhfti@@GCC_12.0.0
  __fixunshfti@@GCC_12.0.0
  __floatbitinthf@@GCC_14.0.0
  __floattihf@@GCC_12.0.0
  __floatuntihf@@GCC_12.0.0
  __mulhc3@@GCC_12.0.0
  __nehf2@@GCC_12.0.0
  __truncdfhf2@@GCC_12.0.0
  __trunchfbf2@@GCC_13.0.0
  __truncsfhf2@@GCC_12.0.0
  __trunctfhf2@@GCC_12.0.0
  __truncxfhf2@@GCC_12.0.0
exported from libgcc, while for BF16 just:
  __extendbfsf2@@GCC_13.0.0
  __floatbitintbf@@GCC_14.0.0
  __floattibf@@GCC_13.0.0
  __floatuntibf@@GCC_13.0.0
  __truncdfbf2@@GCC_13.0.0
  __trunchfbf2@@GCC_13.0.0
  __truncsfbf2@@GCC_13.0.0
  __trunctfbf2@@GCC_13.0.0
  __truncxfbf2@@GCC_13.0.0

More attention has been paid to what we actually need there, which is
primarily conversions to/from other types (but not even to all of them),
with some changes on the RTL expression lowering side to make sure we use
SFmode arithmetic as much as possible and only have the really required
stuff on the libgcc side.

We don't want to change that; if you really need __mulbc3/__divbc3 on
RISCV, then they should be added for that arch only.

The same goes for the choice of builtins on the compiler side: the two
builtins we have right now are all we want on the other arches. So further
builtins would be either a matter of RISCV specific builtins, or in generic
code but guarded by some target hook so that they aren't enabled on arches
which don't want them.

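To illustrate why conversions are the main thing needed from the library, here is a rough sketch of what widening and narrowing between BF16 and float amount to. This is not the libgcc implementation: the helper names are made up for illustration, BF16 values are held as raw 16-bit patterns so the code builds without __bf16 support, and round-to-nearest-even is assumed as the rounding mode.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit IEEE binary32 float");

// Hypothetical helpers; these are NOT the libgcc entry points.

// Widen: bf16 is the upper 16 bits of a binary32 value, so this is exact.
inline float bf16_bits_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Narrow: drop the low 16 bits, rounding to nearest with ties to even
// (assumed rounding mode).
inline std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: keep sign and upper payload bits, force the quiet bit.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Adding 0x7FFF plus the lowest kept bit makes exact ties round to even.
    std::uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + bias) >> 16);
}

// Small usage example.
inline void bf16_self_check() {
    assert(float_to_bf16_bits(1.0f) == 0x3F80);
    assert(float_to_bf16_bits(1.00390625f) == 0x3F80);  // 1 + 2^-8 is a tie
    assert(bf16_bits_to_float(0x3F80) == 1.0f);
}
```

Since widening is exact and narrowing is one rounding step, arithmetic can be done in SFmode, leaving only the conversions for libgcc to supply.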
On the libstdc++ side, the current headers provide for std::bfloat16_t and
std::float16_t an implementation which uses SFmode calculations where
possible, so stuff like:
  constexpr _Float16
  acos(_Float16 __x)
  { return _Float16(__builtin_acosf(__x)); }
or
  constexpr __gnu_cxx::__bfloat16_t
  acos(__gnu_cxx::__bfloat16_t __x)
  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }

And for printing, note there are
  _ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
  _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
  _ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
  _ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
which input and output _Float16 and __bf16, but in the parameter passing
they expect those types to be promoted to float, so that the ABIs aren't
dependent on when a particular arch enables those types.

For RISCV, the things to consider are: what is the _Float16 and __bf16
function argument passing/returning ABI? Is the type enabled on all
variants of RISCV, or just some (e.g. regarding _Float16 and __bf16 on
i686-linux, there is support for it only if the SSE2 ISA is available, so
e.g. the *[hb][fc]* functions in libgcc need to be compiled with the extra
-msse2 flag)? If it can be passed/returned the same in all ABIs, what
excess precision mode do you want to use on them? I mean e.g. the
TARGET_C_EXCESS_PRECISION target hook.

On e.g. x86_64, the default is to promote all _Float16 and __bf16
calculations to float, so if you have
  __bf16 a, b, c, d, e;
  ...
  a = b * c + d - e + c * d;
all variables are converted to SFmode temporaries, all the arithmetic is
done in SFmode, and only at the very end is the result converted to HFmode
or BFmode.

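The effect of narrowing only once at the end can be sketched in portable C++ without __bf16 support. Everything here is illustrative: round_to_bf16 is a made-up helper that rounds a float to BF16 precision (ties to even, NaNs ignored), and the second function narrows after every operation purely for contrast. The inputs are chosen so that every float operation is exact, which means FMA contraction cannot change the outcome.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a float to the nearest bf16-representable
// value (ties to even) and return it as a float. NaN handling is omitted.
inline float round_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // round to nearest, ties to even
    bits &= 0xFFFF0000u;                    // keep only the bf16 bits
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Promotion as described above: all arithmetic in float, one final narrowing.
inline float eval_promoted(float b, float c, float d, float e) {
    return round_to_bf16(b * c + d - e + c * d);
}

// Contrast: arithmetic still in float, but narrowed after every operation.
inline float eval_rounded_each_step(float b, float c, float d, float e) {
    float bc = round_to_bf16(b * c);
    float t1 = round_to_bf16(bc + d);
    float t2 = round_to_bf16(t1 - e);
    float cd = round_to_bf16(c * d);
    return round_to_bf16(t2 + cd);
}

// Usage example with bf16-representable inputs.
inline void excess_precision_demo() {
    const float b = 1.0078125f, c = 1.0078125f, d = 0.5f, e = 1.015625f;
    assert(eval_promoted(b, c, d, e) == 1.0078125f);
    assert(eval_rounded_each_step(b, c, d, e) == 1.0f);
}
```

With these inputs the two strategies give different BF16 results (1.0078125 versus 1.0), which is why the choice of excess precision mode is something the target has to decide deliberately.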
One can request a different mode, -fexcess-precision=16, in which such
promotion isn't done, but as there is no hw support for most of the
operations, the actual multiplication, addition or subtraction is still
done in SFmode, just with a conversion to BFmode after each operation (so
slower, but more precise).

If you still want to export __divbc3 and __mulbc3, do you want to export
those just on some RISCV ABI variants or on all of them? Depending on that,
arrange for them to be compiled just for those variants; and, if they are
exported from libgcc_s.so.1, you also need to add a symbol version for
them, likely GCC_15.0.0.

For enabling just those 2 functions, I don't think you need any changes on
the builtins.def etc. side; those aren't builtins but libcalls.

If you need other libgcc calls, similar questions to the above apply, but
please don't add them just because you can, only if you really need them
(they can't be handled by hw instructions, promotion to SFmode and
conversion afterwards is undesirable, and you actually have code that
proves the compiler emits those calls). Again, they should be enabled only
on arches (and/or sub-ABIs) which ask for them, and they need the symbol
versioning resolved.

	Jakub