AlexMaclean wrote:

In general, I think PTX has lots of instructions which are essentially syntactic sugar or can easily be represented by a couple of existing instructions. While they may make hand-writing PTX easier, we should probably not represent these as distinct intrinsics in LLVM IR, as doing so makes it harder to add peephole optimizations for them and harder to reach a canonical form. We haven't been good about this in the past, but I think it's smart to be more judicious about adding new intrinsics going forward.

> That may work, though the problem with overloaded intrinsics is that the set
> of types we overload on is somewhat awkward to control if we need intrinsics
> only for a subset of types. Then we need to deal with the overloads that
> nominally accepted, but can't be lowered.

With regard to overloaded intrinsics, I think that whenever an intrinsic supports multiple types, we should use an overloaded intrinsic. It's true that this allows frontends to generate malformed IR that the verifier won't complain about but that we cannot actually lower. However, there are already many, many cases where LLVM IR is technically valid but the NVPTX backend cannot lower it due to an unsupported SM or type. I think the implicit understanding is already that creators of IR for the NVPTX backend need to be careful about what they generate and confirm it can be selected. Using overloaded intrinsics when some types are not supported seems fine within that status quo.

It would be nice if we could specify a supported set of types for overloaded intrinsics, though. Perhaps the intrinsic records could be extended to support something like this in the future.

> As for overloading, I'm not entirely sure about it since it looks like
> overlapping variants with same modifiers for the different floating point
> types is kind of sparse. For example, any rounding mode other than `rn` can't
> be overloaded for `f16(x2)` and `bf16(x2)` and anything with `sat` or `ftz`
> can't be overloaded for `bf16(x2)`.
> But on the other hand, there are also some variants like `fma.rn.oob` which I
> think _could_ be overloaded since it supports all fp16 types. Is it okay to
> have only some of the intrinsic variants be overloaded and with generic names
> while we have other similar ones tied to a single type (which could be
> renamed to remove the type in the intrinsic name if we want uniformity in the
> naming)?

I think we should either use an overloaded intrinsic (in which case the type will automatically be added as a suffix) or, if only one type is supported, add a type suffix that is consistent with the suffixes used for overloading (i.e. `v2f16`, not `f16x2`). This way, if future hardware supports more variants, we can switch to an overloaded intrinsic without needing to auto-upgrade.

For this PR I think only `fma.rn.oob` should be overloaded, but the rest should use `v2` suffixes.

https://github.com/llvm/llvm-project/pull/170079
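
To make the naming point concrete, here is a rough Python sketch (not code from the PR) of why the LLVM-style suffix matters. The base names `llvm.nvvm.fma.rn.oob` and `llvm.nvvm.example.op` are assumptions used only for illustration, and the helper `intrinsic_name` is hypothetical. The suffix table reflects how LLVM mangles these four types for overloaded intrinsics.

```python
# LLVM-style mangled suffixes for the fp16 types discussed in this thread.
LLVM_SUFFIX = {
    "half": "f16",
    "bfloat": "bf16",
    "<2 x half>": "v2f16",
    "<2 x bfloat>": "v2bf16",
}


def intrinsic_name(base: str, ir_type: str) -> str:
    """Hypothetical helper: append the overload-style suffix to a base name."""
    return f"{base}.{LLVM_SUFFIX[ir_type]}"


# An overloaded intrinsic gets one name per type it is instantiated with.
# The base name here is an assumption, not necessarily what the PR uses.
overloaded = [intrinsic_name("llvm.nvvm.fma.rn.oob", t) for t in LLVM_SUFFIX]

# A single-type intrinsic named by hand with the same convention
# ("example.op" is a placeholder, not a real intrinsic).
hand_named_today = "llvm.nvvm.example.op.v2f16"

# If the intrinsic later becomes overloaded, the <2 x half> instance
# gets exactly the same name, so existing IR would not need an auto-upgrade.
overloaded_later = intrinsic_name("llvm.nvvm.example.op", "<2 x half>")
same_name = hand_named_today == overloaded_later

# A PTX-style suffix such as "f16x2" would not match and would need renaming.
ptx_style = "llvm.nvvm.example.op.f16x2"
needs_upgrade = ptx_style != overloaded_later
```

With the `v2f16` spelling the hand-written name and the later overloaded name coincide, whereas the `f16x2` spelling would force a rename if the intrinsic ever became overloaded.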
