fma intrinsics (PR #170079)

Alex MacLean via cfe-commits Tue, 02 Dec 2025 09:22:45 -0800

AlexMaclean wrote:

In general, I think PTX has lots of instructions which are essentially 
syntactic sugar or can easily be represented by a couple of existing 
instructions. While they may make hand-writing PTX easier, we should probably 
not represent these as distinct intrinsics in LLVM IR, as doing so will make 
adding peephole optimizations for them more difficult and make it harder to get 
a canonical form. We haven't been good about this in the past, but I think it's 
probably smart to be more judicious about adding new intrinsics going forward.


> That may work, though the problem with overloaded intrinsics is that the set 
> of types we overload on is somewhat awkward to control if we need intrinsics 
> only for a subset of types. Then we need to deal with the overloads that are 
> nominally accepted, but can't be lowered.

With regard to overloaded intrinsics, I think that whenever we have a case 
where an intrinsic supports multiple types, we should use an overloaded 
intrinsic. It's true this allows frontends to generate malformed IR that the 
verifier won't complain about but we cannot actually lower. However, I think 
there are already many, many cases where LLVM IR is technically valid but the 
NVPTX backend cannot lower it due to an unsupported SM or type. I think the 
implicit understanding already is that creators of IR for the NVPTX backend 
need to be careful about what they generate and confirm it can be selected. 
Using overloaded intrinsics when some types are not supported seems fine within 
the current status quo. It would be nice if we could specify a supported set of 
types for overloaded intrinsics, though; perhaps the intrinsic records could be 
extended to support something like this in the future.
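
To make the "supported set of types" idea concrete, here is a rough sketch of 
the kind of check a frontend would have to carry itself today. It encodes only 
the constraints mentioned in the quoted comment below (only `rn` for the half 
types, no `sat` or `ftz` for bf16). `HalfType`, `FmaModifiers` and 
`isLowerable` are hypothetical names for illustration, not LLVM or NVPTX 
backend API.

```cpp
#include <string>

// Hypothetical types for illustration only; none of this is LLVM API.
enum class HalfType { F16, V2F16, BF16, V2BF16 };

struct FmaModifiers {
  std::string rounding; // PTX rounding-mode spelling, e.g. "rn" or "rz"
  bool sat;             // .sat modifier requested
  bool ftz;             // .ftz modifier requested
};

// Returns true if the combination is allowed by the constraints stated in
// this thread. The real rules live in the PTX ISA and may be stricter.
bool isLowerable(HalfType ty, const FmaModifiers &m) {
  // Only "rn" is described as available for f16(x2) and bf16(x2).
  if (m.rounding != "rn")
    return false;

  const bool isBF16 = ty == HalfType::BF16 || ty == HalfType::V2BF16;

  // sat and ftz are described as unavailable for bf16(x2).
  if (isBF16 && (m.sat || m.ftz))
    return false;

  return true;
}
```

With an overloaded intrinsic the verifier would accept all of these type 
combinations; a check like this only decides whether selection is expected to 
succeed.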

> As for overloading, I'm not entirely sure about it since it looks like 
> overlapping variants with same modifiers for the different floating point 
> types is kind of sparse. For example, any rounding mode other than `rn` can't 
> be overloaded for `f16(x2)` and `bf16(x2)` and anything with `sat` or `ftz` 
> can't be overloaded for `bf16(x2)`.
> But on the other hand, there are also some variants like `fma.rn.oob` which I 
> think _could_ be overloaded since it supports all fp16 types. Is it okay to 
> have only some of the intrinsic variants be overloaded and with generic names 
> while we have other similar ones tied to a single type (which could be 
> renamed to remove the type in the intrinsic name if we want uniformity in the 
> naming)?

I think we should either use an overloaded intrinsic (in which case the type 
will be automatically added as a suffix) or, if only one type is supported, add 
a type suffix that is consistent with the suffixes used for overloading (i.e. 
`v2f16`, not `f16x2`). This way, if future hardware supports more variants, we 
can switch to an overloaded intrinsic without needing to auto-upgrade. For this 
PR I think only `fma.rn.oob` should be overloaded but the rest should use `v2` 
suffixes.
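
As a small illustration of the suffix spelling, the sketch below builds names 
the way overloaded intrinsics are spelled: scalars use the bare element name 
and vectors get a `v<N>` prefix. The helpers `overloadSuffix` and 
`intrinsicName` are hypothetical and not part of LLVM; the generic `llvm.fma` 
intrinsic is used as the base only because its overloaded spelling 
(`llvm.fma.v2f16`) is well known.

```cpp
#include <string>

// Hypothetical helper mirroring the overload-suffix spelling; not LLVM API.
// elt is the element spelling ("f16", "bf16", "f32"), lanes the vector width.
std::string overloadSuffix(const std::string &elt, unsigned lanes) {
  // Scalars use the bare element name.
  if (lanes <= 1)
    return elt;
  // Vectors are prefixed with v<N>, giving "v2f16" rather than "f16x2".
  return "v" + std::to_string(lanes) + elt;
}

// Joins a base intrinsic name and the type suffix with a dot.
std::string intrinsicName(const std::string &base, const std::string &elt,
                          unsigned lanes) {
  return base + "." + overloadSuffix(elt, lanes);
}
```

A single-type intrinsic named with this spelling has the same name the 
overloaded form would produce for that type, which is why no auto-upgrade would 
be needed later.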

https://github.com/llvm/llvm-project/pull/170079

[clang] [llvm] [clang][NVPTX] Add missing half-precision add/mul/fma intrinsics (PR #170079)