YonahGoldberg wrote:
Let me give a little more context here. As you pointed out, the small FP types
in CUDA are currently just integer wrappers: `__half` is essentially
`struct { unsigned short }` and `__half2` is `struct { unsigned short; unsigned short }`.
Today, `cuda_fp16.hpp` implements arithmetic on these types (mostly) by casting
to short/unsigned int and then calling inline PTX.
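To make the "integer wrapper plus unwrap-compute-rewrap" pattern concrete, here is a minimal, hypothetical C++ sketch (not the real `cuda_fp16.hpp`). `HalfWrapper`, `half_bits_to_float`, `float_to_half_bits`, and `half_add` are all names invented for illustration, and a plain float round-trip stands in for the inline-PTX `add.f16` step, since portable C++ has no native fp16:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for CUDA's __half: just an integer wrapper.
struct HalfWrapper { std::uint16_t bits; };

// Decode fp16 bits to float (normals and zero only, for brevity).
float half_bits_to_float(HalfWrapper h) {
    unsigned sign = (h.bits >> 15) & 1;
    unsigned exp  = (h.bits >> 10) & 0x1F;
    unsigned man  = h.bits & 0x3FF;
    if (exp == 0 && man == 0) return sign ? -0.0f : 0.0f;
    // normal value = (-1)^sign * 2^(exp - 15) * (1 + man/1024)
    float v = std::ldexp(1.0f + man / 1024.0f, (int)exp - 15);
    return sign ? -v : v;
}

// Encode a positive float that is exactly representable as a normal fp16.
HalfWrapper float_to_half_bits(float f) {
    int e;
    float m = std::frexp(f, &e);            // f = m * 2^e, m in [0.5, 1)
    unsigned exp = (unsigned)(e - 1 + 15);  // rebias exponent for fp16
    unsigned man = (unsigned)std::lround((2.0f * m - 1.0f) * 1024.0f);
    return HalfWrapper{(std::uint16_t)((exp << 10) | man)};
}

// "Arithmetic on the wrapper": unwrap the bits, compute in another
// representation (inline PTX in real CUDA; float here), rewrap.
HalfWrapper half_add(HalfWrapper a, HalfWrapper b) {
    return float_to_half_bits(half_bits_to_float(a) + half_bits_to_float(b));
}
```

The point of the sketch is just that every operation has to leave the wrapper type to do any real work, which is the mess the libdevice move is trying to clean up.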
There are a few reasons for moving the implementation of this arithmetic to
libdevice:
1. float and double arithmetic already lives there so it makes it more uniform
2. MLIR code targeting LLVM can call into the fp16 functions
3. We'd like to move the implementations away from inline PTX in the future,
toward native LLVM `half` plus intrinsics. Unfortunately, as you pointed out,
CUDA doesn't have any native fp16 support, but we can link against libdevice,
which can provide native fp16 implementations.
As you can see, the builtins take `unsigned short`/`unsigned int` to stay
compatible with the CUDA layer, but cast to `half`/`<2 x half>` because we want
the libdevice API to operate on those types.
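A hedged sketch of that ABI boundary, in plain C++ with invented names (`NativeHalf`, `libdevice_hneg`, `builtin_hneg` are all hypothetical, and `NativeHalf` only models the bit pattern since portable C++ has no fp16 type). Negation is used because on IEEE-754 bits it is exactly a sign-bit flip, so the sketch stays correct without a full fp16 emulator:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(unsigned short) == 2, "sketch assumes 16-bit unsigned short");

// Hypothetical stand-in for LLVM's native 'half' type.
struct NativeHalf { std::uint16_t storage; };

// libdevice-side implementation: operates on the half type itself.
// IEEE-754 negation is exactly a sign-bit flip.
NativeHalf libdevice_hneg(NativeHalf h) {
    h.storage ^= 0x8000u;
    return h;
}

// Builtin-facing wrapper: the ABI uses unsigned short for compatibility
// with the CUDA layer, bit-casting to the half type at the boundary.
unsigned short builtin_hneg(unsigned short bits) {
    NativeHalf h;
    std::memcpy(&h.storage, &bits, sizeof h);   // unsigned short -> half
    h = libdevice_hneg(h);
    unsigned short out;
    std::memcpy(&out, &h.storage, sizeof out);  // half -> unsigned short
    return out;
}
```

The design choice being illustrated: the integer types appear only at the builtin boundary for CUDA-layer compatibility, while the implementation underneath sees the half type it actually wants.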
> It would be great if we could just build libdevice from source (or
> incorporate those sources into clang headers, if they were released under
> acceptable license). That would be an improvement over the current binary IR
> blob + handwritten wrappers/declarations mess. We've had typos, we've had
> missing functions, we've had missed optimizations, the list goes on...
Yeah, I think I agree with this. IP-wise, I don't think there's anything
stopping us from distributing the source; I can ask about that.
https://github.com/llvm/llvm-project/pull/174005
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits