pengfei added inline comments.
================
Comment at: clang/docs/LanguageExtensions.rst:852
``double`` when passed to ``printf``, so the programmer must explicitly cast it to ``double`` before using it with an ``%f`` or similar specifier.
----------------
rjmccall wrote:
> codemzs wrote:
> > pengfei wrote:
> > > rjmccall wrote:
> > > > pengfei wrote:
> > > > > rjmccall wrote:
> > > > > > Suggested rework:
> > > > > >
> > > > > > ```
> > > > > > Clang supports three half-precision (16-bit) floating point types: ``__fp16``,
> > > > > > ``_Float16`` and ``__bf16``. These types are supported in all language
> > > > > > modes, but not on all targets:
> > > > > >
> > > > > > - ``__fp16`` is supported on every target.
> > > > > >
> > > > > > - ``_Float16`` is currently supported on the following targets:
> > > > > >   * 32-bit ARM (natively on some architecture versions)
> > > > > >   * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
> > > > > >   * AMDGPU (natively)
> > > > > >   * SPIR (natively)
> > > > > >   * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)
> > > > > >
> > > > > > - ``__bf16`` is currently supported on the following targets:
> > > > > >   * 32-bit ARM
> > > > > >   * 64-bit ARM (AArch64)
> > > > > >   * X86 (when SSE2 is available)
> > > > > >
> > > > > > (For X86, SSE2 is available on 64-bit and all recent 32-bit processors.)
> > > > > >
> > > > > > ``__fp16`` and ``_Float16`` both use the binary16 format from IEEE
> > > > > > 754-2008, which provides a 5-bit exponent and an 11-bit significand
> > > > > > (counting the implicit leading 1). ``__bf16`` uses the `bfloat16
> > > > > > <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_ format,
> > > > > > which provides an 8-bit exponent and an 8-bit significand; this is the same
> > > > > > exponent range as ``float``, just with greatly reduced precision.
> > > > > >
> > > > > > ``_Float16`` and ``__bf16`` follow the usual rules for arithmetic
> > > > > > floating-point types. Most importantly, this means that arithmetic operations
> > > > > > on operands of these types are formally performed in the type and produce
> > > > > > values of the type. ``__fp16`` does not follow those rules: most operations
> > > > > > immediately promote operands of type ``__fp16`` to ``float``, and so
> > > > > > arithmetic operations are defined to be performed in ``float`` and so result in
> > > > > > a value of type ``float`` (unless further promoted because of other operands).
> > > > > > See below for more information on the exact specifications of these types.
> > > > > >
> > > > > > Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
> > > > > > native hardware support for arithmetic in their corresponding formats.
> > > > > > The exact conditions are described in the lists above. When compiling for a
> > > > > > processor without native support, Clang will perform the arithmetic in
> > > > > > ``float``, inserting extensions and truncations as necessary. This can be
> > > > > > done in a way that exactly emulates the behavior of hardware support for
> > > > > > arithmetic, but it can require many extra operations. By default, Clang takes
> > > > > > advantage of the C standard's allowances for excess precision in intermediate
> > > > > > operands in order to eliminate intermediate truncations within statements.
> > > > > > This is generally much faster but can generate different results from strict
> > > > > > operation-by-operation emulation.
> > > > > >
> > > > > > The use of excess precision can be independently controlled for these two
> > > > > > types with the ``-ffloat16-excess-precision=`` and
> > > > > > ``-fbfloat16-excess-precision=`` options. Valid values include:
> > > > > > - ``none`` (meaning to perform strict operation-by-operation emulation)
> > > > > > - ``standard`` (meaning that excess precision is permitted under the rules
> > > > > >   described in the standard, i.e. never across explicit casts or statements)
> > > > > > - ``fast`` (meaning that excess precision is permitted whenever the
> > > > > >   optimizer sees an opportunity to avoid truncations; currently this has no
> > > > > >   effect beyond ``standard``)
> > > > > >
> > > > > > The ``_Float16`` type is an interchange floating type specified in
> > > > > > ISO/IEC TS 18661-3:2015 ("Floating-point extensions for C"). It will
> > > > > > be supported on more targets as they define ABIs for it.
> > > > > >
> > > > > > The ``__bf16`` type is a non-standard extension, but it generally follows
> > > > > > the rules for arithmetic interchange floating types from ISO/IEC TS
> > > > > > 18661-3:2015. In previous versions of Clang, it was a storage-only type
> > > > > > that forbade arithmetic operations. It will be supported on more targets
> > > > > > as they define ABIs for it.
> > > > > >
> > > > > > The ``__fp16`` type was originally an ARM extension and is specified
> > > > > > by the `ARM C Language Extensions
> > > > > > <https://github.com/ARM-software/acle/releases>`_.
> > > > > > Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``,
> > > > > > not the ARM alternative format. Operators that expect arithmetic operands
> > > > > > immediately promote ``__fp16`` operands to ``float``.
> > > > > >
> > > > > > It is recommended that portable code use ``_Float16`` instead of ``__fp16``,
> > > > > > as it has been defined by the C standards committee and has behavior that is
> > > > > > more familiar to most programmers.
> > > > > >
> > > > > > Because ``__fp16`` operands are always immediately promoted to ``float``, the
> > > > > > common real type of ``__fp16`` and ``_Float16`` for the purposes of the usual
> > > > > > arithmetic conversions is ``float``.
> > > > > >
> > > > > > A literal can be given ``_Float16`` type using the suffix ``f16``. For example,
> > > > > > ``3.14f16``.
> > > > > >
> > > > > > Because default argument promotion only applies to the standard floating-point
> > > > > > types, ``_Float16`` values are not promoted to ``double`` when passed as variadic
> > > > > > or untyped arguments. As a consequence, some caution must be taken when using
> > > > > > certain library facilities with ``_Float16``; for example, there is no ``printf`` format
> > > > > > specifier for ``_Float16``, and (unlike ``float``) it will not be implicitly promoted to
> > > > > > ``double`` when passed to ``printf``, so the programmer must explicitly cast it to
> > > > > > ``double`` before using it with an ``%f`` or similar specifier.
> > > > > > ```
> > > > > ```
> > > > > Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
> > > > > native hardware support for arithmetic in their corresponding formats.
> > > > > ```
> > > > >
> > > > > Do you mean ``_Float16``?
> > > > >
> > > > > ```
> > > > > The exact conditions are described in the lists above. When compiling for a
> > > > > processor without native support, Clang will perform the arithmetic in
> > > > > ``float``, inserting extensions and truncations as necessary.
> > > > > ```
> > > > >
> > > > > It conflicts a bit with `These types are supported in all language modes, but not on all targets`.
> > > > > Why do we need to emulate a type that isn't necessarily supported on all targets?
> > > > >
> > > > > My understanding is that inserting extensions and truncations is used for 2 purposes:
> > > > > 1. Supporting a type that is designed to be available on all targets. For now, this only applies to __fp16.
> > > > > 2. Supporting excess-precision=`standard`. This applies to both _Float16 and __bf16.
> > > > >
> > > > > Do you mean `_Float16`?
> > > >
> > > > Yes, thank you. I knew I'd screw that up somewhere.
> > > >
> > > > > Why do we need to emulate a type that isn't necessarily supported on all targets?
> > > >
> > > > Would this be clearer?
> > > >
> > > > ```
> > > > Arithmetic on ``_Float16`` and ``__bf16`` is enabled on some targets that don't
> > > > provide native architectural support for arithmetic on these formats. These
> > > > targets are noted in the lists of supported targets above. On these targets,
> > > > Clang will perform the arithmetic in ``float``, inserting extensions and truncations
> > > > as necessary.
> > > > ```
> > > >
> > > > > My understanding is that inserting extensions and truncations is used for 2 purposes:
> > > >
> > > > No, I believe we always insert extensions and truncations. The cases you're
> > > > describing are places we insert extensions and truncations in the *frontend*,
> > > > so that the backend doesn't see operations on `half` / `bfloat` at all. But
> > > > when these operations do make it to the backend, and there's no direct
> > > > architectural support for them on the target, the backend still just inserts
> > > > extensions and truncations so it can do the arithmetic in `float`. This is
> > > > clearest in the ARM codegen (https://godbolt.org/z/q9KoGEYqb) because the
> > > > conversions are just instructions, but you can also see it in the X86 codegen
> > > > (https://godbolt.org/z/ejdd4P65W): all the runtime functions are just
> > > > extensions/truncations, and the actual arithmetic is done with `mulss` and
> > > > `addss`. This frontend/backend distinction is not something that matters to
> > > > users, so the documentation glosses over the difference.
> > > >
> > > > I haven't done an exhaustive investigation, so it's possible that there are
> > > > types and targets where we emit a compiler-rt call to do each operation
> > > > instead, but those compiler-rt functions almost certainly just do an extension
> > > > to float in the same way, so I don't think the documentation as written would
> > > > be misleading for those targets, either.
> > >
> > > Thanks for the explanation! Sorry, I failed to make the distinction between
> > > "support" and "natively support"; I guess users may be confused at first too.
> > >
> > > I agree the documentation should explain the whole behavior of the compiler to
> > > users. I think there are 3 aspects we want to tell users about:
> > >
> > > 1. Whether a type is an arithmetic type or not, and whether it is (natively)
> > >    supported by all targets or just a few;
> > > 2. The result of a type may not be consistent across different targets and/or
> > >    excess-precision values;
> > > 3. The excess-precision control doesn't take effect if a type is natively
> > >    supported by the target;
> > >
> > > It would be clearer if we could give such a summary before the detailed
> > > explanation.
> >
> > Does adding the below to the top of the description make it clearer?
> >
> > Half-Precision Floating Point
> > =============================
> >
> > Clang supports three half-precision (16-bit) floating point types:
> > ``__fp16``, ``_Float16`` and ``__bf16``.
> > These types are supported in all language modes, but their support differs
> > across targets. Here, it's important to understand the difference between
> > "support" and "natively support":
> >
> > - A type is "supported" if the compiler can handle code using that type,
> >   which might involve translating operations into equivalent code that the
> >   target hardware understands.
> > - A type is "natively supported" if the hardware itself understands the
> >   type and can perform operations on it directly. This typically yields
> >   better performance and more accurate results.
> >
> > Another crucial aspect to note is the consistency of results for a type
> > across different targets and excess-precision values. Different hardware
> > (targets) might produce slightly different results due to the level of
> > precision they support and how they handle excess-precision values. This
> > means the same code can yield different results when compiled for different
> > hardware.
> >
> > Finally, note that the control of excess precision does not take effect if
> > a type is natively supported by the target. If the hardware supports the type
> > directly, the compiler does not need to (and cannot) use excess precision
> > to potentially speed up the operations.
> >
> > Given these points, here is the detailed support for each type:
> >
> > - ``__fp16`` is supported on every target.
> >
> > - ``_Float16`` is currently supported on the following targets:
> >   * 32-bit ARM (natively on some architecture versions)
> >   * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
> >   * AMDGPU (natively)
> >   * SPIR (natively)
> >   * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)
> >
> > - ``__bf16`` is currently supported on the following targets:
> >   * 32-bit ARM
> >   * 64-bit ARM (AArch64)
> >   * X86 (when SSE2 is available)
> >
> > ...
> > ...
>
> I think that's a good basic idea, but it's okay to leave some of the detail
> for later. How about this:
>
> ```
> Clang supports three half-precision (16-bit) floating point types:
> ``__fp16``, ``_Float16`` and ``__bf16``. These types are supported in all
> language modes, but their support differs between targets. A target is said
> to have "native support" for a type if the target processor offers
> instructions for directly performing basic arithmetic on that type. In the
> absence of native support, a type can still be supported if the compiler can
> emulate arithmetic on the type by promoting to ``float``; see below for more
> information on this emulation.
>
> * ``__fp16`` is supported on all targets. The special semantics of this type
>   mean that no arithmetic is ever performed directly on ``__fp16`` values; see
>   below.
>
> * ``_Float16`` is supported on the following targets: (...)
>
> * ``__bf16`` is supported on the following targets (currently never
>   natively): (...)
> ```
>
> And then below we can adjust the paragraph about emulation:
>
> ```
> When compiling arithmetic on ``_Float16`` and ``__bf16`` for a target without
> native support, Clang will perform the arithmetic in ``float``, inserting
> extensions and truncations as necessary. This can be done in a way that
> exactly matches the operation-by-operation behavior of native support, but
> that can require many extra truncations and extensions. By default, when
> emulating ``_Float16`` and ``__bf16`` arithmetic using ``float``, Clang does
> not truncate intermediate operands back to their true type unless the operand
> is the result of an explicit cast or assignment. This is generally much faster
> but can generate different results from strict operation-by-operation
> emulation. (Usually the results are more precise.) This is permitted by the C
> and C++ standards under the rules for excess precision in intermediate
> operands; see the discussion of evaluation formats in the C standard and
> [expr.pre] in the C++ standard.
> ```

This revision looks better.
The contents are rather clear to me. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150913/new/

https://reviews.llvm.org/D150913

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits