================
@@ -1705,16 +1705,21 @@ class CIRGenFunction : public CIRGenTypeCache {
 
   void instantiateIndirectGotoBlock();
 
-  /// Emit a simple LLVM intrinsic that takes N scalar arguments and whose
-  /// return type matches the type of the first argument. The intrinsic name is
-  /// used verbatim; any overload mangling (e.g. `.f32`, `.p1`) must be baked
-  /// into \p intrinName by the caller.
+  /// Emit a simple LLVM intrinsic that takes N scalar arguments.  The intrinsic
+  /// name is used verbatim; any overload mangling (e.g. `.f32`, `.p1`) must be
+  /// baked into \p intrinName by the caller.  The result type defaults to the
+  /// type of the first argument; pass \p resultType for intrinsics whose result
+  /// differs from the operand, such as a vector reduction that returns the
+  /// element type.  Unlike classic CodeGen, CIR has no intrinsic registry to
+  /// derive the result type from the operand, so it must be supplied here.
   template <unsigned N>
   [[maybe_unused]] RValue
   emitBuiltinWithOneOverloadedType(const CallExpr *e,
-                                   llvm::StringRef intrinName) {
+                                   llvm::StringRef intrinName,
+                                   mlir::Type resultType = {}) {
----------------
adams381 wrote:

> calls to this function above?

---

The AMDGPU callers above default the result type because those intrinsics 
return the same type as their operand (`readfirstlane`, `ds.swizzle`, 
`readlane`, `div.fmas` are all result == arg0), so `convertType(arg0)` is 
correct for them. `vector.reduce.*` is the case where it isn't: the result is 
the element type, not the vector operand.

Classic CodeGen doesn't pass a result type because 
`getIntrinsic(Intrinsic::vector_reduce_xor, {VecTy})` reads the return type off 
the intrinsic signature. CIR's `LLVMIntrinsicCallOp` lowering uses the op's 
declared result type directly (`LowerToLLVM.cpp`, 
`convertType(op->getResultTypes()[0])`) with no registry to derive it, so the 
element type has to be supplied at the call.

Defaulting reduce to `convertType(arg0)` (the vector) emits `<4 x i32> 
@llvm.vector.reduce.xor.v4i32(<4 x i32>)`, and lowering aborts:

```
error: call intrinsic signature <4 x i32> (<4 x i32>) to overloaded intrinsic 
"llvm.vector.reduce.xor" does not match any of the overloads: intrinsic return 
type (vector element of overload type 0) expected i32 (overload type 0 is <4 x 
i32>), but got <4 x i32>
fatal error: error in backend: Lowering from LLVMIR dialect to llvm IR failed!
```
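
To make the mismatch concrete, here is a minimal, self-contained sketch of the defaulting rule and of the check that rejects the vector result. It is only a toy model: `ToyType`, `pickResultType`, and `reduceSignatureMatches` are hypothetical names invented for illustration and are not part of CIR, MLIR, or LLVM.

```cpp
#include <optional>
#include <string>

// Toy stand-in for a type; this is NOT mlir::Type or any real CIR API.
struct ToyType {
  std::string name;     // printed form, e.g. "i32" or "<4 x i32>"
  std::string element;  // element type name for vectors; empty for scalars
};

// Hypothetical model of the helper's rule: use the caller-supplied result
// type when present, otherwise fall back to the type of the first argument.
inline ToyType pickResultType(
    const ToyType &arg0,
    const std::optional<ToyType> &resultType = std::nullopt) {
  return resultType.has_value() ? *resultType : arg0;
}

// Hypothetical model of the overload check for a vector reduction: the
// result must be the element type of the vector operand.
inline bool reduceSignatureMatches(const ToyType &result,
                                   const ToyType &operand) {
  return !operand.element.empty() && result.name == operand.element;
}
```

In this model, calling `pickResultType` on a `<4 x i32>` operand without a second argument yields the vector type, which `reduceSignatureMatches` rejects. Passing the `i32` element type explicitly produces a result that passes the check.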

If you'd rather not give the shared helper an optional result type, the 
alternative is to emit these three intrinsic calls inline with the element type 
and leave the helper unchanged. @andykaylor asked to route through the helper, 
which is why I extended it instead; either path has to specify the element 
result type.

https://github.com/llvm/llvm-project/pull/201164
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
