YonahGoldberg wrote:
Let me give a little more context here. As you pointed out, the small FP types
in CUDA are currently just integer wrappers: `__half` is essentially
`struct { unsigned short }` and `__half2` is `struct { unsigned short; unsigned short }`.
Today, `cuda_fp16.hpp` implements arithmetic on these types (mostly) by casting
to short/unsigned int and then calling inline PTX.
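To make the "integer wrapper plus unwrap-compute-rewrap" pattern concrete, here is a minimal, hypothetical C++ sketch (not the real `cuda_fp16.hpp`). `HalfWrapper`, `half_bits_to_float`, `float_to_half_bits`, and `half_add` are all names invented for illustration, and a plain float round-trip stands in for the inline-PTX `add.f16` step, since portable C++ has no native fp16:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for CUDA's __half: just an integer wrapper.
struct HalfWrapper { std::uint16_t bits; };

// Decode fp16 bits to float (normals and zero only, for brevity).
float half_bits_to_float(HalfWrapper h) {
    unsigned sign = (h.bits >> 15) & 1;
    unsigned exp  = (h.bits >> 10) & 0x1F;
    unsigned man  = h.bits & 0x3FF;
    if (exp == 0 && man == 0) return sign ? -0.0f : 0.0f;
    // normal value = (-1)^sign * 2^(exp - 15) * (1 + man/1024)
    float v = std::ldexp(1.0f + man / 1024.0f, (int)exp - 15);
    return sign ? -v : v;
}

// Encode a positive float that is exactly representable as a normal fp16.
HalfWrapper float_to_half_bits(float f) {
    int e;
    float m = std::frexp(f, &e);            // f = m * 2^e, m in [0.5, 1)
    unsigned exp = (unsigned)(e - 1 + 15);  // rebias exponent for fp16
    unsigned man = (unsigned)std::lround((2.0f * m - 1.0f) * 1024.0f);
    return HalfWrapper{(std::uint16_t)((exp << 10) | man)};
}

// "Arithmetic on the wrapper": unwrap the bits, compute in another
// representation (inline PTX in real CUDA; float here), rewrap.
HalfWrapper half_add(HalfWrapper a, HalfWrapper b) {
    return float_to_half_bits(half_bits_to_float(a) + half_bits_to_float(b));
}
```

The point of the sketch is just that every operation has to leave the wrapper type to do any real work, which is the mess the libdevice move is trying to clean up.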
There are a few reasons for moving the implementation of this arithmetic to
libdevice:
1. float and double arithmetic already lives there so it makes it more uniform
2. MLIR code targeting LLVM can call into the fp16 functions
3. We'd like to move the implementations away from inline PTX in the future,
toward native LLVM `half` plus intrinsics. Unfortunately, as you pointed out,
CUDA doesn't have any native fp16 support, but we can link against libdevice,
which can provide native fp16 implementations.
As you can see, the builtins take `unsigned short`/`unsigned int` to stay
compatible with the CUDA layer, but cast to `half`/`<2 x half>` because we want
the libdevice API to operate on those types.
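A hedged sketch of that ABI boundary, in plain C++ with invented names (`NativeHalf`, `libdevice_hneg`, `builtin_hneg` are all hypothetical, and `NativeHalf` only models the bit pattern since portable C++ has no fp16 type). Negation is used because on IEEE-754 bits it is exactly a sign-bit flip, so the sketch stays correct without a full fp16 emulator:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(unsigned short) == 2, "sketch assumes 16-bit unsigned short");

// Hypothetical stand-in for LLVM's native 'half' type.
struct NativeHalf { std::uint16_t storage; };

// libdevice-side implementation: operates on the half type itself.
// IEEE-754 negation is exactly a sign-bit flip.
NativeHalf libdevice_hneg(NativeHalf h) {
    h.storage ^= 0x8000u;
    return h;
}

// Builtin-facing wrapper: the ABI uses unsigned short for compatibility
// with the CUDA layer, bit-casting to the half type at the boundary.
unsigned short builtin_hneg(unsigned short bits) {
    NativeHalf h;
    std::memcpy(&h.storage, &bits, sizeof h);   // unsigned short -> half
    h = libdevice_hneg(h);
    unsigned short out;
    std::memcpy(&out, &h.storage, sizeof out);  // half -> unsigned short
    return out;
}
```

The design choice being illustrated: the integer types appear only at the builtin boundary for CUDA-layer compatibility, while the implementation underneath sees the half type it actually wants.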
> It would be great if we could just build libdevice from source (or
> incorporate those sources into clang headers, if they were released under
> acceptable license). That would be an improvement over the current binary IR
> blob + handwritten wrappers/declarations mess. We've had typos, we've had
> missing functions, we've had missed optimizations, the list goes on...
Yeah, I think I agree with this. IP-wise, I don't think there's anything
stopping us from distributing the source; I can ask about that.
https://github.com/llvm/llvm-project/pull/174005
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits