andrew.w.kaylor added a comment.
Herald added a project: All.

In D120395#3346591 <https://reviews.llvm.org/D120395#3346591>, @scanon wrote:

> There's a lot of churn around proposed "solutions" on this and related PR, 
> but not a very clear analysis of what the problem we're trying to solve is.

I thought the problem the patch was originally trying to solve was that 
`__bfloat16` variables (using the `__bfloat16` type defined in the 
avx512bf16intrin.h header) could be used in arithmetic operations. For example, 
clang currently compiles this code without any diagnostics if the target 
processor has the required features (avx512bf16 and avx512vl).

  #include <immintrin.h>
  float f(float x, float y) {
    __bfloat16 x16 = _mm_cvtness_sbh(x);
    __bfloat16 y16 = _mm_cvtness_sbh(y);
    __bfloat16 z16 = x16 + y16; // accepted with no diagnostic
    return _mm_cvtsbh_ss(z16);
  }

https://godbolt.org/z/vcbcGsPPx

The problem is that the instructions generated for that code are completely 
wrong, because `__bfloat16` is defined as `unsigned short`. The header relies on 
the user knowing that they shouldn't use this type in arithmetic operations.
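
To make "completely wrong" concrete (the arithmetic below is my own 
illustration, not something from the patch): because the `+` above is an integer 
add of the BF16 encodings, adding 1.0 and 1.0 through this path does not produce 
2.0.

  #include <immintrin.h>
  #include <stdio.h>

  // Needs avx512bf16 and avx512vl. bfloat16(1.0f) encodes as 0x3F80; the integer
  // add gives 0x3F80 + 0x3F80 = 0x7F00, which decodes as 2^127.
  int main(void) {
    __bfloat16 one = _mm_cvtness_sbh(1.0f); // 0x3F80
    __bfloat16 sum = one + one;             // 0x7F00, not the encoding of 2.0f
    printf("%g\n", _mm_cvtsbh_ss(sum));     // prints roughly 1.70141e+38
    return 0;
  }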

Like I said, I //thought// that was the original intention of this patch. 
However, the latest version of the patch doesn't prevent this at all. In fact, 
it makes the problem worse by asking the user to define their BF16 variables as 
`unsigned short` in their own code. Getting correct behavior from that point 
still depends entirely on the user knowing not to perform arithmetic on those 
values.

@pengfei Please correct me if I misunderstood the purpose of this patch.

In D120395#3346591 <https://reviews.llvm.org/D120395#3346591>, @scanon wrote:

> Concretely, what are the semantics that we want for the BF16 types and 
> intrinsics? Unlike the other floating-point types, there's no standard to 
> guide this, so it's even more important to clearly specify how these types 
> are to be used, instead of having an ad-hoc semantics of whatever someone 
> happens to implement.

The binary representation of a BF16 value (such as the value returned by 
`_mm_cvtness_sbh`) is, as Phoebe mentioned, the "brain floating-point" format 
described here: https://en.wikichip.org/wiki/brain_floating-point_format
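
For illustration only (a minimal sketch of the encoding, not how the intrinsics 
are implemented, and ignoring the rounding and NaN handling that 
`_mm_cvtness_sbh` performs): a BF16 value is the upper 16 bits of the 
corresponding IEEE-754 single-precision encoding, i.e. the sign bit, the same 
8-bit exponent, and the top 7 mantissa bits.

  #include <stdint.h>
  #include <string.h>

  // Truncating float -> BF16: keep the top 16 bits of the single-precision encoding.
  static uint16_t float_to_bf16_truncate(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
  }

  // BF16 -> float is exact: reattach 16 zero bits as the low mantissa bits.
  static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
  }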

Unfortunately, what you can do with it seems to depend on the target 
architecture. On very recent x86 processors, you can convert vectors of this 
type to and from single-precision floating point and you can do a SIMD 
dot-product-and-accumulate operation (VDPBF16PS), but the only way to do either 
is with intrinsics. Some ARM processors support other operations, but I think 
with similar restrictions (i.e., only accessible through intrinsics). Apart from 
the intrinsics, it is treated as a storage-only type.
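
As a sketch of what the intrinsics-only path looks like on x86 (assuming 
avx512bf16 and avx512vl are available; the intrinsic names below are the ones 
documented in the Intel intrinsics guide, not anything added by this patch):

  #include <immintrin.h>

  // Dot-product-and-accumulate through VDPBF16PS: each float lane of the result
  // accumulates the sum of products of a pair of adjacent BF16 elements from a
  // and b into the corresponding lane of acc.
  __m128 bf16_dot_accumulate(__m128 acc, __m128 a, __m128 b) {
    __m128bh a16 = _mm_cvtneps_pbh(a); // 4 floats -> 4 BF16 (upper lanes zeroed)
    __m128bh b16 = _mm_cvtneps_pbh(b);
    return _mm_dpbf16_ps(acc, a16, b16);
  }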


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395
