[clang] [CIR] Add Direct coerce-in-registers + cir.reinterpret_cast op (PR #195879)

via cfe-commits Tue, 26 May 2026 13:56:52 -0700

https://github.com/adams381 updated 
https://github.com/llvm/llvm-project/pull/195879


>From bb0d6cca069ecaf5042b5bc528343ff6a497b632 Mon Sep 17 00:00:00 2001
From: Adam Smith <[email protected]>
Date: Mon, 4 May 2026 14:09:42 -0700
Subject: [PATCH 1/2] [CIR] Add Extend (signext/zeroext) handling to
 CallConvLowering

Third PR in the series splitting #192119 / #192124. Adds handlers
for the Extend ArgKind in both rewriteFunctionDefinition and
rewriteCallSite.

The CIR signature keeps the original (narrow) integer type;
sign- or zero-extension is communicated to LLVM via the
llvm.signext / llvm.zeroext arg_attrs and res_attrs entries
attached by the rewriter. This matches Classic Clang's LLVM IR
convention (e.g. `define void @f(i8 signext %x)`, not `define
void @f(i32 signext %x)` with an entry-block truncation).

The coercedType field on an Extend ArgClassification is treated
as informational only -- it carries the LLVM ABI library's
register-width type for downstream consumers, but the rewriter
does not use it to change the CIR signature.

Three .cir tests covering the three Extend code paths
(narrow-signed-arg, narrow-unsigned-arg, narrow-signed-return).
The test target's narrow-int Extend rule fires only on MLIR
builtin IntegerType -- since CIR functions take cir::IntType,
these tests exercise the rewriter via the classification-
injection driver added in PR A1.

check-clang-cir-codegen / check-clang-cir both pass with no
regressions.

Co-authored-by: Cursor <[email protected]>
---
 .../TargetLowering/CIRABIRewriteContext.cpp   | 152 ++++++++++++++----
 .../Transforms/abi-lowering/extend-return.cir |  41 +++++
 .../abi-lowering/extend-signed-arg.cir        |  40 +++++
 .../abi-lowering/extend-unsigned-arg.cir      |  36 +++++
 4 files changed, 239 insertions(+), 30 deletions(-)
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/extend-return.cir
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/extend-signed-arg.cir
 create mode 100644 
clang/test/CIR/Transforms/abi-lowering/extend-unsigned-arg.cir

diff --git 
a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp 
b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
index d33346cc0dcb3..895b477eb9054 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
@@ -27,6 +27,10 @@ using namespace mlir::abi;
 namespace {
 
 bool needsRewrite(const FunctionClassification &fc) {
+  // Direct without coercion is a true pass-through; any other kind (or a
+  // coerced Direct) means the rewriter must touch the IR.  Extend is
+  // technically attribute-only at the IR level but still counts because the
+  // attribute attachment changes observable behavior.
   if ((fc.returnInfo.kind != ArgKind::Direct) || fc.returnInfo.coercedType)
     return true;
   for (const ArgClassification &ac : fc.argInfos)
@@ -72,9 +76,14 @@ LogicalResult buildNewArgTypes(ArrayRef<Type> oldArgTypes,
                   << " not yet implemented in CallConvLowering";
       return failure();
     case ArgKind::Extend:
-      emitError() << "Extend at arg " << idx
-                  << " not yet implemented in CallConvLowering";
-      return failure();
+      // Extend keeps the original (narrow) type in the signature; the
+      // sign/zero extension is communicated to LLVM via the llvm.signext /
+      // llvm.zeroext arg attribute, attached separately below.  Any
+      // coercedType the classifier set on the Extend ArgClassification is
+      // informational (typically the register-width type the value gets
+      // extended to in registers) but does not change the CIR signature.
+      newArgTypes.push_back(origTy);
+      break;
     case ArgKind::Indirect:
       emitError() << "Indirect at arg " << idx
                   << " not yet implemented in CallConvLowering";
@@ -105,8 +114,10 @@ Type computeNewReturnType(Type origRetTy, const 
ArgClassification &retInfo,
                 << "it in EmitFunctionEpilog)";
     return nullptr;
   case ArgKind::Extend:
-    emitError() << "Extend return not yet implemented in CallConvLowering";
-    return nullptr;
+    // Same convention as Extend args: keep the original return type in the
+    // signature; the sign/zero extension is communicated via the
+    // llvm.signext / llvm.zeroext res attribute attached separately below.
+    return origRetTy;
   case ArgKind::Indirect:
     emitError() << "Indirect return (sret) not yet implemented in "
                 << "CallConvLowering";
@@ -124,6 +135,67 @@ Value createIgnoredValue(OpBuilder &builder, Location loc, 
Type ty) {
   return cir::ConstantOp::create(builder, loc, ty, cir::PoisonAttr::get(ty));
 }
 
+/// Build an updated arg_attrs ArrayAttr that drops Ignore'd args and adds
+/// llvm.signext / llvm.zeroext on Extend args.  Preserves any existing arg
+/// attributes on retained arg slots.
+ArrayAttr updateArgAttrs(MLIRContext *ctx, unsigned numNewArgs,
+                         ArrayAttr existingArgAttrs,
+                         const FunctionClassification &fc) {
+  SmallVector<Attribute> newArgAttrs(numNewArgs, DictionaryAttr::get(ctx));
+
+  // Step 1: copy existing arg attrs over to their new positions, skipping
+  // Ignore'd args.
+  if (existingArgAttrs) {
+    unsigned newIdx = 0;
+    for (unsigned oldIdx = 0; oldIdx < existingArgAttrs.size(); ++oldIdx) {
+      if (oldIdx < fc.argInfos.size() &&
+          fc.argInfos[oldIdx].kind == ArgKind::Ignore)
+        continue;
+      if (newIdx < numNewArgs)
+        newArgAttrs[newIdx] = existingArgAttrs[oldIdx];
+      ++newIdx;
+    }
+  }
+
+  // Step 2: layer llvm.signext / llvm.zeroext onto each Extend arg.  The new
+  // arg index for Extend at original index `oldIdx` is `oldIdx` minus the
+  // number of Ignore'd args that came before it.
+  unsigned newIdx = 0;
+  for (auto [oldIdx, ac] : llvm::enumerate(fc.argInfos)) {
+    if (ac.kind == ArgKind::Ignore)
+      continue;
+    if (ac.kind == ArgKind::Extend && newIdx < numNewArgs) {
+      auto existing = cast<DictionaryAttr>(newArgAttrs[newIdx]);
+      SmallVector<NamedAttribute> attrs(existing.begin(), existing.end());
+      StringRef attrName = ac.signExtend ? "llvm.signext" : "llvm.zeroext";
+      attrs.push_back(
+          NamedAttribute(StringAttr::get(ctx, attrName), UnitAttr::get(ctx)));
+      newArgAttrs[newIdx] = DictionaryAttr::get(ctx, attrs);
+    }
+    ++newIdx;
+  }
+
+  return ArrayAttr::get(ctx, newArgAttrs);
+}
+
+/// Build an updated res_attrs ArrayAttr (single entry, since CIR funcs have
+/// at most one result) that adds llvm.signext / llvm.zeroext on an Extend
+/// return.  Preserves any existing res attributes.
+ArrayAttr updateResAttrs(MLIRContext *ctx, ArrayAttr existingResAttrs,
+                         const ArgClassification &retInfo) {
+  if (retInfo.kind != ArgKind::Extend)
+    return existingResAttrs;
+
+  SmallVector<NamedAttribute> attrs;
+  if (existingResAttrs && existingResAttrs.size() > 0)
+    for (NamedAttribute na : cast<DictionaryAttr>(existingResAttrs[0]))
+      attrs.push_back(na);
+  StringRef attrName = retInfo.signExtend ? "llvm.signext" : "llvm.zeroext";
+  attrs.push_back(
+      NamedAttribute(StringAttr::get(ctx, attrName), UnitAttr::get(ctx)));
+  return ArrayAttr::get(ctx, {DictionaryAttr::get(ctx, attrs)});
+}
+
 } // namespace
 
 LogicalResult CIRABIRewriteContext::rewriteFunctionDefinition(
@@ -212,22 +284,23 @@ LogicalResult 
CIRABIRewriteContext::rewriteFunctionDefinition(
   Type newFnTy = funcOp.cloneTypeWith(newArgTypes, newResultTypes);
   funcOp.setFunctionTypeAttr(TypeAttr::get(newFnTy));
 
-  // Keep the arg_attrs array in sync with the new argument count by
-  // dropping entries for every Ignored argument.  Without this the
-  // attribute array would have stale entries that no longer match any
-  // block argument.
-  SmallVector<unsigned> ignored = ignoredArgIndices(fc);
-  if (!ignored.empty()) {
-    if (auto existing = funcOp->getAttrOfType<ArrayAttr>("arg_attrs")) {
-      SmallVector<Attribute> kept;
-      kept.reserve(newArgTypes.size());
-      for (auto [oldIdx, attr] : llvm::enumerate(existing.getValue())) {
-        if (oldIdx >= fc.argInfos.size() ||
-            fc.argInfos[oldIdx].kind != ArgKind::Ignore)
-          kept.push_back(attr);
-      }
-      funcOp->setAttr("arg_attrs", ArrayAttr::get(ctx, kept));
-    }
+  // Rebuild arg_attrs: drop entries for Ignore'd args, layer
+  // llvm.signext / llvm.zeroext onto Extend args, preserve everything else.
+  bool needsArgAttrUpdate = !ignoredArgIndices(fc).empty();
+  for (const ArgClassification &ac : fc.argInfos)
+    if (ac.kind == ArgKind::Extend)
+      needsArgAttrUpdate = true;
+  if (needsArgAttrUpdate) {
+    auto existing = funcOp->getAttrOfType<ArrayAttr>("arg_attrs");
+    funcOp->setAttr("arg_attrs",
+                    updateArgAttrs(ctx, newArgTypes.size(), existing, fc));
+  }
+
+  // Rebuild res_attrs: layer llvm.signext / llvm.zeroext onto an Extend
+  // return.
+  if (fc.returnInfo.kind == ArgKind::Extend) {
+    auto existing = funcOp->getAttrOfType<ArrayAttr>("res_attrs");
+    funcOp->setAttr("res_attrs", updateResAttrs(ctx, existing, fc.returnInfo));
   }
 
   return success();
@@ -247,6 +320,8 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
     return call.emitOpError()
            << "indirect call not yet implemented in CallConvLowering";
 
+  MLIRContext *ctx = callOp->getContext();
+
   for (auto [idx, ac] : llvm::enumerate(fc.argInfos)) {
     switch (ac.kind) {
     case ArgKind::Direct:
@@ -261,8 +336,9 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
       return call.emitOpError() << "Expand at call-site arg " << idx
                                 << " not yet implemented in CallConvLowering";
     case ArgKind::Extend:
-      return call.emitOpError() << "Extend at call-site arg " << idx
-                                << " not yet implemented in CallConvLowering";
+      // Extend at the call site is just an attribute change (llvm.signext /
+      // llvm.zeroext on the call's arg_attrs); no IR-level cast.
+      break;
     case ArgKind::Indirect:
       return call.emitOpError() << "Indirect at call-site arg " << idx
                                 << " not yet implemented in CallConvLowering";
@@ -284,15 +360,13 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
   }
 
   bool hasResult = call.getNumResults() > 0;
-  Type origRetTy = hasResult ? call.getResult().getType()
-                             : cir::VoidType::get(callOp->getContext());
+  Type origRetTy =
+      hasResult ? call.getResult().getType() : cir::VoidType::get(ctx);
   Type callRetTy = origRetTy;
   if (fc.returnInfo.kind == ArgKind::Ignore && hasResult)
-    callRetTy = cir::VoidType::get(callOp->getContext());
-  if ((fc.returnInfo.kind == ArgKind::Direct ||
-       fc.returnInfo.kind == ArgKind::Extend) &&
-      fc.returnInfo.coercedType)
-    return call.emitOpError() << "Direct/Extend return with coerced type at "
+    callRetTy = cir::VoidType::get(ctx);
+  if (fc.returnInfo.kind == ArgKind::Direct && fc.returnInfo.coercedType)
+    return call.emitOpError() << "Direct return with coerced type at "
                               << "call-site not yet implemented in "
                               << "CallConvLowering";
 
@@ -303,6 +377,24 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
     if (!newCall->hasAttr(attr.getName()))
       newCall->setAttr(attr.getName(), attr.getValue());
 
+  // Layer llvm.signext / llvm.zeroext onto the new call's arg_attrs and
+  // res_attrs for Extend args/return.
+  bool needsArgAttrUpdate = false;
+  for (const ArgClassification &ac : fc.argInfos)
+    if (ac.kind == ArgKind::Extend || ac.kind == ArgKind::Ignore) {
+      needsArgAttrUpdate = true;
+      break;
+    }
+  if (needsArgAttrUpdate) {
+    auto existing = call->getAttrOfType<ArrayAttr>("arg_attrs");
+    newCall->setAttr("arg_attrs",
+                     updateArgAttrs(ctx, newArgs.size(), existing, fc));
+  }
+  if (fc.returnInfo.kind == ArgKind::Extend) {
+    auto existing = call->getAttrOfType<ArrayAttr>("res_attrs");
+    newCall->setAttr("res_attrs", updateResAttrs(ctx, existing, 
fc.returnInfo));
+  }
+
   if (hasResult && fc.returnInfo.kind == ArgKind::Ignore) {
     // The new call returns void, but the original call's result may still
     // have uses.  Substitute a poison constant of the original type so
diff --git a/clang/test/CIR/Transforms/abi-lowering/extend-return.cir 
b/clang/test/CIR/Transforms/abi-lowering/extend-return.cir
new file mode 100644
index 0000000000000..dd2c40fc5305e
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/extend-return.cir
@@ -0,0 +1,41 @@
+// Extend on a narrow return attaches llvm.signext (or llvm.zeroext) to
+// res_attrs while keeping the narrow type in the signature.  Body and
+// call-site IR are unchanged otherwise.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" 
\
+// RUN:   | FileCheck %s
+
+!s8i = !cir.int<s, 8>
+
+#extend_signed_return = {
+  return = { kind = "extend",
+             coerced_type = !cir.int<s, 32>,
+             sign_extend = true },
+  args   = [ ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i8, dense<8>: vector<2xi64>>,
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>>
+} {
+
+  cir.func @returns_s8() -> !s8i
+      attributes { test_classify = #extend_signed_return } {
+    %0 = cir.const #cir.int<7> : !s8i
+    cir.return %0 : !s8i
+  }
+
+  // CHECK: cir.func{{.*}} @returns_s8() -> (!s8i {llvm.signext})
+  // CHECK:   cir.return %{{.*}} : !s8i
+
+  cir.func @caller() -> !s8i
+      attributes { test_classify = #extend_signed_return } {
+    %0 = cir.call @returns_s8() : () -> !s8i
+    cir.return %0 : !s8i
+  }
+
+  // CHECK: cir.func{{.*}} @caller() -> (!s8i {llvm.signext})
+  // CHECK:   %[[R:.*]] = cir.call @returns_s8() : () -> (!s8i {llvm.signext})
+  // CHECK:   cir.return %[[R]] : !s8i
+
+}
diff --git a/clang/test/CIR/Transforms/abi-lowering/extend-signed-arg.cir 
b/clang/test/CIR/Transforms/abi-lowering/extend-signed-arg.cir
new file mode 100644
index 0000000000000..600acdc94d6b3
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/extend-signed-arg.cir
@@ -0,0 +1,40 @@
+// Extend with sign_extend = true on a narrow signed integer arg.  The pass
+// keeps the narrow type in the signature (matching Classic Clang's LLVM IR
+// convention) and attaches llvm.signext to the corresponding arg_attrs.
+//
+// The test target's narrow-int Extend rule fires only on MLIR builtin
+// IntegerType, not cir::IntType, so this test uses the injection driver.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" 
\
+// RUN:   | FileCheck %s
+
+!s8i = !cir.int<s, 8>
+
+#extend_signed_arg = {
+  return = { kind = "direct" },
+  args   = [ { kind = "extend",
+               coerced_type = !cir.int<s, 32>,
+               sign_extend = true } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i8, dense<8>: vector<2xi64>>,
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>>
+} {
+
+  cir.func @takes_s8(%arg0: !s8i)
+      attributes { test_classify = #extend_signed_arg } {
+    cir.return
+  }
+
+  // CHECK: cir.func{{.*}} @takes_s8(%arg0: !s8i {llvm.signext})
+
+  cir.func @caller(%arg0: !s8i)
+      attributes { test_classify = #extend_signed_arg } {
+    cir.call @takes_s8(%arg0) : (!s8i) -> ()
+    cir.return
+  }
+
+  // CHECK: cir.call @takes_s8(%arg0) : (!s8i {llvm.signext}) -> ()
+
+}
diff --git a/clang/test/CIR/Transforms/abi-lowering/extend-unsigned-arg.cir 
b/clang/test/CIR/Transforms/abi-lowering/extend-unsigned-arg.cir
new file mode 100644
index 0000000000000..6c5c8207bb93b
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/extend-unsigned-arg.cir
@@ -0,0 +1,36 @@
+// Extend with sign_extend = false on a narrow unsigned integer arg attaches
+// llvm.zeroext (instead of llvm.signext) to the corresponding arg_attrs.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" 
\
+// RUN:   | FileCheck %s
+
+!u16i = !cir.int<u, 16>
+
+#extend_unsigned_arg = {
+  return = { kind = "direct" },
+  args   = [ { kind = "extend",
+               coerced_type = !cir.int<u, 32>,
+               sign_extend = false } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i16, dense<16>: vector<2xi64>>,
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>>
+} {
+
+  cir.func @takes_u16(%arg0: !u16i)
+      attributes { test_classify = #extend_unsigned_arg } {
+    cir.return
+  }
+
+  // CHECK: cir.func{{.*}} @takes_u16(%arg0: !u16i {llvm.zeroext})
+
+  cir.func @caller(%arg0: !u16i)
+      attributes { test_classify = #extend_unsigned_arg } {
+    cir.call @takes_u16(%arg0) : (!u16i) -> ()
+    cir.return
+  }
+
+  // CHECK: cir.call @takes_u16(%arg0) : (!u16i {llvm.zeroext}) -> ()
+
+}

>From 8386507a22d06cf93133b47759f2ee701a46647e Mon Sep 17 00:00:00 2001
From: Adam Smith <[email protected]>
Date: Tue, 5 May 2026 09:26:10 -0700
Subject: [PATCH 2/2] [CIR] Add Direct coerce-in-registers +
 cir.reinterpret_cast op

Fourth PR in the split of #192119/#192124. Implements the
Direct-with-coercion path in CallConvLowering and picks off
andykaylor's five inline review comments from the original PR.

The new cir.reinterpret_cast op is for same-bit-width in-register
reinterpretation (vector<2 x float> <-> complex<float>).
emitCoercion uses it when source and destination differ only in
vector-vs-non-vector shape and have identical bit width, instead
of going through memory.  For everything else (records, or shape
doesn't match) the helper still does alloca/store/ptr-cast/load.

Andy's comments, in order:
- Temporary alloca alignment is now max(srcAlign, dstAlign) from
  DataLayout instead of hardcoded.
- The alloca lives in the entry block via InsertionGuard so it
  composes with HoistAllocas regardless of pipeline order.
- isVolatile kept as UnitAttr-absence with an inline comment.
- vector<->complex now uses cir.reinterpret_cast.
- Memory path has three new .cir tests covering it.

In-body coercion (insertArgCoercion / insertReturnCoercion) folds
into the existing per-function rewriteFunctionDefinition method
introduced by the prior Direct/Ignore PR's review fixup.  It runs
ahead of the Ignore-arg drop in the same per-function inner-loop
window the pass driver already uses, so the per-function
window-of-invalidity invariant is unchanged: F's signature and
F's body coerce together; F's callers update inside the same
inner loop iteration.  The Ignore-arg drop reuses the existing
poison-stub idiom (the alloca/load-fallback that earlier drafts
used is unnecessary once the drop happens after coercion in the
same per-function window).

LowerToLLVM gets a stub for the new op: bitcast for same-shape
converted types, error-with-message for aggregates.  We don't
produce aggregates from CallConvLowering today, so the error
path is only reachable from hand-written IR; follow-up patch can
add an extract/insert lowering if needed.

Co-authored-by: Cursor <[email protected]>
---
 clang/include/clang/CIR/Dialect/IR/CIROps.td  |  48 ++++
 clang/lib/CIR/Dialect/IR/CIRDialect.cpp       |  22 ++
 .../Transforms/CallConvLoweringPass.cpp       |   2 +-
 .../TargetLowering/CIRABIRewriteContext.cpp   | 228 ++++++++++++++++--
 .../TargetLowering/CIRABIRewriteContext.h     |  15 +-
 .../CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp |  39 +++
 clang/test/CIR/IR/reinterpret-cast.cir        |  28 +++
 .../abi-lowering/coerce-int-to-record.cir     |  59 +++++
 .../abi-lowering/coerce-record-to-int.cir     |  50 ++++
 .../coerce-record-to-record-via-memory.cir    |  34 +++
 .../coerce-vector-to-complex-reinterpret.cir  |  42 ++++
 11 files changed, 537 insertions(+), 30 deletions(-)
 create mode 100644 clang/test/CIR/IR/reinterpret-cast.cir
 create mode 100644 
clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
 create mode 100644 
clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
 create mode 100644 
clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
 create mode 100644 
clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir

diff --git a/clang/include/clang/CIR/Dialect/IR/CIROps.td 
b/clang/include/clang/CIR/Dialect/IR/CIROps.td
index 67ddaa73d9184..4488b5d8c6dae 100644
--- a/clang/include/clang/CIR/Dialect/IR/CIROps.td
+++ b/clang/include/clang/CIR/Dialect/IR/CIROps.td
@@ -288,6 +288,54 @@ def CIR_CastOp : CIR_Op<"cast", [
 
 }
 
+//===----------------------------------------------------------------------===//
+// ReinterpretCastOp
+//===----------------------------------------------------------------------===//
+
+def CIR_ReinterpretCastOp : CIR_Op<"reinterpret_cast", [Pure]> {
+  let summary = "Reinterpret a value as a different same-bit-width type";
+  let description = [{
+    The `cir.reinterpret_cast` operation reinterprets the bits of its source
+    value as a different type, with no IR-level cost.  It is used by the
+    calling-convention lowering pass to coerce between same-bit-width types
+    that have an LLVM-IR-level shape mismatch but identical in-register
+    representation -- for example, between `!cir.vector<2 x !cir.float>` and
+    `!cir.complex<!cir.float>`, both of which lower to the same LLVM IR
+    representation but have distinct CIR types.
+
+    Unlike `cir.cast bitcast`, which is overloaded for pointer-to-pointer
+    bitcasts and several other use cases, `cir.reinterpret_cast` is reserved
+    for in-register value reinterpretation only.  The result type must
+    differ from the source type; otherwise the op is meaningless and the
+    folder removes it.
+
+    **Invariant** (not currently enforced by the verifier): the source and
+    destination types must have the same bit width per the module's
+    DataLayout, and they must use the same in-register lane order on the
+    target.  Producers (e.g. CallConvLowering's coerce-in-registers path)
+    are responsible for ensuring this; a follow-up patch will move the
+    bit-width check into the verifier once the design question of
+    DataLayout-aware op verifiers is resolved.
+
+    Example:
+
+    ```
+    %c = cir.reinterpret_cast %v
+       : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
+    ```
+  }];
+
+  let arguments = (ins CIR_AnyType:$src);
+  let results = (outs CIR_AnyType:$result);
+
+  let assemblyFormat = [{
+    $src `:` type($src) `->` type($result) attr-dict
+  }];
+
+  let hasVerifier = 1;
+  let hasFolder = 1;
+}
+
 
//===----------------------------------------------------------------------===//
 // DynamicCastOp
 
//===----------------------------------------------------------------------===//
diff --git a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp 
b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
index c23a02d6f49fb..fac754137de38 100644
--- a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+++ b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
@@ -891,6 +891,28 @@ static Value tryFoldCastChain(cir::CastOp op) {
   return {};
 }
 
+//===----------------------------------------------------------------------===//
+// ReinterpretCastOp
+//===----------------------------------------------------------------------===//
+
+LogicalResult cir::ReinterpretCastOp::verify() {
+  // The op is meaningless for identical types -- the folder is the right
+  // way to remove it -- but we accept it at the verifier level so that
+  // peephole code (e.g. pattern rewriters that round-trip values) doesn't
+  // need a type-equality guard.  Producers should still avoid emitting
+  // it for matching types.
+  //
+  // The same-bit-width invariant is documented on the op but not yet
+  // checked here; see the op description for the rationale.
+  return success();
+}
+
+OpFoldResult cir::ReinterpretCastOp::fold(FoldAdaptor adaptor) {
+  if (getSrc().getType() == getType())
+    return getSrc();
+  return {};
+}
+
 OpFoldResult cir::CastOp::fold(FoldAdaptor adaptor) {
   if (mlir::isa_and_present<cir::PoisonAttr>(adaptor.getSrc())) {
     // Propagate poison value
diff --git a/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp 
b/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp
index 838125037afd5..c00947593517e 100644
--- a/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp
@@ -137,7 +137,7 @@ void CallConvLoweringPass::runOnOperation() {
   }
 
   DataLayout dl(moduleOp);
-  CIRABIRewriteContext rewriteCtx(moduleOp);
+  CIRABIRewriteContext rewriteCtx(moduleOp, dl);
   SymbolTable symbolTable(moduleOp);
 
   // Classify every cir.func up front.  No IR mutation happens here, so
diff --git 
a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp 
b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
index 895b477eb9054..ecd6a181df12c 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
@@ -62,12 +62,11 @@ LogicalResult buildNewArgTypes(ArrayRef<Type> oldArgTypes,
     Type origTy = oldArgTypes[idx];
     switch (ac.kind) {
     case ArgKind::Direct:
-      if (ac.coercedType) {
-        emitError() << "Direct with coerced type at arg " << idx
-                    << " not yet implemented in CallConvLowering";
-        return failure();
-      }
-      newArgTypes.push_back(origTy);
+      // Direct with a coerced type means the wire signature uses the
+      // coerced type; the body still expects origTy and we'll insert a
+      // reinterpret/coercion at the entry block.  Direct without a
+      // coerced type is a true pass-through.
+      newArgTypes.push_back(ac.coercedType ? ac.coercedType : origTy);
       break;
     case ArgKind::Ignore:
       break;
@@ -101,12 +100,9 @@ Type computeNewReturnType(Type origRetTy, const 
ArgClassification &retInfo,
                           function_ref<InFlightDiagnostic()> emitError) {
   switch (retInfo.kind) {
   case ArgKind::Direct:
-    if (retInfo.coercedType) {
-      emitError() << "Direct return with coerced type not yet implemented "
-                  << "in CallConvLowering";
-      return nullptr;
-    }
-    return origRetTy;
+    // Direct return with a coerced type uses the coerced type on the wire;
+    // the rewriter inserts a coercion before each cir.return.
+    return retInfo.coercedType ? retInfo.coercedType : origRetTy;
   case ArgKind::Ignore:
     return cir::VoidType::get(ctx);
   case ArgKind::Expand:
@@ -196,6 +192,157 @@ ArrayAttr updateResAttrs(MLIRContext *ctx, ArrayAttr 
existingResAttrs,
   return ArrayAttr::get(ctx, {DictionaryAttr::get(ctx, attrs)});
 }
 
+/// Coerce \p src to type \p dstTy at the current builder insertion point.
+///
+/// Three strategies, in order of preference:
+///   - If src and dst are the same type, return src unchanged and leave
+///     \p createdOps empty.
+///   - If both are non-aggregate same-bit-width values that just differ in
+///     vector-vs-scalar shape (e.g. !cir.vector<2 x !cir.float> ↔
+///     !cir.complex<!cir.float>), use cir.reinterpret_cast which is free at
+///     the IR level.
+///   - Otherwise go through memory: allocate a slot of the source type
+///     (using max(srcAlign, dstAlign) for the alloca alignment), store
+///     the source, bitcast the pointer to the destination type, load the
+///     destination type back.
+///
+/// The temporary alloca is placed at the start of the enclosing function's
+/// entry block so that it composes correctly with the HoistAllocas pass
+/// regardless of pipeline ordering.
+///
+/// Any operations the helper creates are appended to \p createdOps so the
+/// caller can pass them to replaceAllUsesExcept and avoid clobbering the
+/// store's value operand when later rewiring the source value.
+Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
+                   FunctionOpInterface funcOp, const DataLayout &dl,
+                   SmallPtrSetImpl<Operation *> &createdOps) {
+  Type srcTy = src.getType();
+  if (srcTy == dstTy)
+    return src;
+
+  // Reinterpret path: same total bit width, neither side is a record, and
+  // the shapes differ only in vector-vs-non-vector.  Going through memory
+  // is wasteful for these — they have the same in-register representation.
+  bool isAggregate = isa<cir::RecordType>(srcTy) || 
isa<cir::RecordType>(dstTy);
+  bool vectorMismatch =
+      isa<cir::VectorType>(srcTy) != isa<cir::VectorType>(dstTy);
+  if (!isAggregate && vectorMismatch &&
+      dl.getTypeSizeInBits(srcTy) == dl.getTypeSizeInBits(dstTy)) {
+    auto reinterpret =
+        cir::ReinterpretCastOp::create(rewriter, loc, dstTy, src);
+    createdOps.insert(reinterpret);
+    return reinterpret;
+  }
+
+  // Memory path: alloca + store + ptr-cast + load.  The alloca goes in the
+  // entry block (Andy's review comment #3 on the original PR), with
+  // alignment = max(srcAlign, dstAlign) to satisfy both the store and the
+  // load (review comment #1).
+  uint64_t srcAlign = dl.getTypeABIAlignment(srcTy);
+  uint64_t dstAlign = dl.getTypeABIAlignment(dstTy);
+  uint64_t allocaAlign = std::max(srcAlign, dstAlign);
+
+  auto srcPtrTy = cir::PointerType::get(srcTy);
+  auto dstPtrTy = cir::PointerType::get(dstTy);
+
+  cir::AllocaOp alloca;
+  {
+    OpBuilder::InsertionGuard guard(rewriter);
+    Block &entry = funcOp->getRegion(0).front();
+    rewriter.setInsertionPointToStart(&entry);
+    alloca = cir::AllocaOp::create(rewriter, loc, srcPtrTy, srcTy,
+                                   rewriter.getStringAttr("coerce"),
+                                   rewriter.getI64IntegerAttr(allocaAlign));
+  }
+  createdOps.insert(alloca);
+
+  auto store = cir::StoreOp::create(rewriter, loc, src, alloca,
+                                    /*isVolatile=*/UnitAttr(),
+                                    /*alignment=*/IntegerAttr(),
+                                    /*sync_scope=*/cir::SyncScopeKindAttr(),
+                                    /*mem_order=*/cir::MemOrderAttr());
+  createdOps.insert(store);
+
+  auto ptrCast = cir::CastOp::create(rewriter, loc, dstPtrTy,
+                                     cir::CastKind::bitcast, alloca);
+  createdOps.insert(ptrCast);
+
+  auto load = cir::LoadOp::create(rewriter, loc, dstTy, ptrCast,
+                                  /*isDeref=*/UnitAttr(),
+                                  /*isVolatile=*/UnitAttr(),
+                                  /*alignment=*/IntegerAttr(),
+                                  /*sync_scope=*/cir::SyncScopeKindAttr(),
+                                  /*mem_order=*/cir::MemOrderAttr());
+  createdOps.insert(load);
+  return load;
+}
+
+/// Convenience overload for callers that don't need the createdOps set
+/// (e.g. call-site coercion where we don't replaceAllUsesExcept).
+Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
+                   FunctionOpInterface funcOp, const DataLayout &dl) {
+  SmallPtrSet<Operation *, 4> ignored;
+  return emitCoercion(rewriter, loc, dstTy, src, funcOp, dl, ignored);
+}
+
+/// Insert coercion before each cir.return so the returned value matches the
+/// new (coerced) return type.
+void insertReturnCoercion(FunctionOpInterface funcOp, Type origRetTy,
+                          Type coercedRetTy, OpBuilder &rewriter,
+                          const DataLayout &dl) {
+  SmallVector<cir::ReturnOp> returns;
+  funcOp.walk([&](cir::ReturnOp r) { returns.push_back(r); });
+  for (cir::ReturnOp r : returns) {
+    if (r.getInput().empty())
+      continue;
+    Value origVal = r.getInput()[0];
+    if (origVal.getType() == coercedRetTy)
+      continue;
+    rewriter.setInsertionPoint(r);
+    Value coerced =
+        emitCoercion(rewriter, r.getLoc(), coercedRetTy, origVal, funcOp, dl);
+    r->setOperand(0, coerced);
+  }
+}
+
+/// For each Direct arg with a coerced type, change the block argument's type
+/// to the coerced type and insert a coercion at function entry that maps it
+/// back to the original type for body uses.
+void insertArgCoercion(FunctionOpInterface funcOp,
+                       const FunctionClassification &fc, OpBuilder &rewriter,
+                       const DataLayout &dl) {
+  Region &body = funcOp->getRegion(0);
+  if (body.empty())
+    return;
+  Block &entry = body.front();
+
+  for (auto [idx, ac] : llvm::enumerate(fc.argInfos)) {
+    if (ac.kind != ArgKind::Direct || !ac.coercedType)
+      continue;
+    if (idx >= entry.getNumArguments())
+      continue;
+
+    BlockArgument blockArg = entry.getArgument(idx);
+    Type oldArgTy = blockArg.getType();
+    Type newArgTy = ac.coercedType;
+    if (oldArgTy == newArgTy)
+      continue;
+
+    blockArg.setType(newArgTy);
+
+    rewriter.setInsertionPointToStart(&entry);
+    SmallPtrSet<Operation *, 4> coercionOps;
+    Value adapted = emitCoercion(rewriter, funcOp.getLoc(), oldArgTy, blockArg,
+                                 funcOp, dl, coercionOps);
+
+    // Replace blockArg uses with the adapted value, except inside the helper
+    // ops we just created.  This is critical: the StoreOp's value operand is
+    // blockArg, and if we naively replaceAllUses it gets swapped to adapted
+    // (now of the original type != the alloca's pointee type).
+    blockArg.replaceAllUsesExcept(adapted, coercionOps);
+  }
+}
+
 } // namespace
 
 LogicalResult CIRABIRewriteContext::rewriteFunctionDefinition(
@@ -238,6 +385,23 @@ LogicalResult 
CIRABIRewriteContext::rewriteFunctionDefinition(
   if (funcOp.isDefinition()) {
     Region &body = funcOp->getRegion(0);
     if (!body.empty()) {
+      // In-body coercion for Direct-with-coerce / Extend args: change
+      // block-arg types to the coerced types and insert a
+      // cir.reinterpret_cast at the top of the entry block that converts
+      // each coerced value back to its original type, then route existing
+      // body uses (including in-body cir.call operands) through the cast.
+      // Done before the Ignore-drop below so the entry block argument
+      // indices used here still refer to the original positions.
+      insertArgCoercion(funcOp, fc, builder, dl);
+
+      // Direct return with coerced type: insert a coercion at every
+      // cir.return so the returned value matches the (coerced) return
+      // type in the new function signature set below.
+      if (fc.returnInfo.kind == ArgKind::Direct && fc.returnInfo.coercedType &&
+          !oldResultTypes.empty() && fc.returnInfo.coercedType != origRetTy)
+        insertReturnCoercion(funcOp, origRetTy, fc.returnInfo.coercedType,
+                             builder, dl);
+
       Block &entry = body.front();
 
       // For each Ignored argument: drop the block argument and, if the
@@ -321,23 +485,19 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
            << "indirect call not yet implemented in CallConvLowering";
 
   MLIRContext *ctx = callOp->getContext();
+  auto enclosingFunc = call->getParentOfType<FunctionOpInterface>();
 
   for (auto [idx, ac] : llvm::enumerate(fc.argInfos)) {
     switch (ac.kind) {
     case ArgKind::Direct:
-      if (ac.coercedType)
-        return call.emitOpError()
-               << "Direct with coerced type at call-site arg " << idx
-               << " not yet implemented in CallConvLowering";
-      break;
     case ArgKind::Ignore:
       break;
     case ArgKind::Expand:
       return call.emitOpError() << "Expand at call-site arg " << idx
                                 << " not yet implemented in CallConvLowering";
     case ArgKind::Extend:
-      // Extend at the call site is just an attribute change (llvm.signext /
-      // llvm.zeroext on the call's arg_attrs); no IR-level cast.
+      // Direct (with or without coercion), Ignore, Expand, and Extend are
+      // all handled below.  Extend is attribute-only at the IR level.
       break;
     case ArgKind::Indirect:
       return call.emitOpError() << "Indirect at call-site arg " << idx
@@ -345,6 +505,8 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
     }
   }
 
+  builder.setInsertionPoint(call);
+
   SmallVector<Value> newArgs;
   ValueRange argOperands = call.getArgOperands();
   newArgs.reserve(argOperands.size());
@@ -356,7 +518,12 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
   for (auto [idx, ac] : llvm::enumerate(fc.argInfos)) {
     if (ac.kind == ArgKind::Ignore)
       continue;
-    newArgs.push_back(argOperands[idx]);
+    Value arg = argOperands[idx];
+    if (ac.kind == ArgKind::Direct && ac.coercedType &&
+        arg.getType() != ac.coercedType)
+      arg = emitCoercion(builder, call.getLoc(), ac.coercedType, arg,
+                         enclosingFunc, dl);
+    newArgs.push_back(arg);
   }
 
   bool hasResult = call.getNumResults() > 0;
@@ -365,10 +532,11 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
   Type callRetTy = origRetTy;
   if (fc.returnInfo.kind == ArgKind::Ignore && hasResult)
     callRetTy = cir::VoidType::get(ctx);
-  if (fc.returnInfo.kind == ArgKind::Direct && fc.returnInfo.coercedType)
-    return call.emitOpError() << "Direct return with coerced type at "
-                              << "call-site not yet implemented in "
-                              << "CallConvLowering";
+  bool returnNeedsCoercion =
+      hasResult && fc.returnInfo.kind == ArgKind::Direct &&
+      fc.returnInfo.coercedType && fc.returnInfo.coercedType != origRetTy;
+  if (returnNeedsCoercion)
+    callRetTy = fc.returnInfo.coercedType;
 
   builder.setInsertionPoint(call);
   auto newCall = cir::CallOp::create(builder, call.getLoc(),
@@ -377,6 +545,15 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
     if (!newCall->hasAttr(attr.getName()))
       newCall->setAttr(attr.getName(), attr.getValue());
 
+  // Direct return with coercion: the new call returns the coerced type;
+  // emit a coercion back to the original type for the call's existing uses.
+  if (returnNeedsCoercion) {
+    builder.setInsertionPointAfter(newCall);
+    Value coercedBack = emitCoercion(builder, call.getLoc(), origRetTy,
+                                     newCall.getResult(), enclosingFunc, dl);
+    call.getResult().replaceAllUsesWith(coercedBack);
+  }
+
   // Layer llvm.signext / llvm.zeroext onto the new call's arg_attrs and
   // res_attrs for Extend args/return.
   bool needsArgAttrUpdate = false;
@@ -405,7 +582,8 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
       Value poison = createIgnoredValue(builder, call.getLoc(), origRetTy);
       call.getResult().replaceAllUsesWith(poison);
     }
-  } else if (hasResult) {
+  } else if (hasResult && !returnNeedsCoercion) {
+    // returnNeedsCoercion already wired up the coerced result above.
     call.getResult().replaceAllUsesWith(newCall.getResult());
   }
 
diff --git 
a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h 
b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h
index 7a0c0b8a2f22c..038e81026784c 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h
@@ -11,9 +11,10 @@
 // rewrites a cir.func signature, the function body, and call sites to match
 // the ABI-lowered shape.
 //
-// This file currently handles only Direct (pass-through) and Ignore.  Other
-// ArgKind handlers (Extend, Direct-with-coercion, Indirect, Expand) are
-// added by subsequent PRs in the calling-convention-lowering split series.
+// This file currently handles Direct (pass-through and coerce-in-registers),
+// Extend, and Ignore.  The remaining ArgKind handlers (Indirect, Expand)
+// are added by subsequent PRs in the calling-convention-lowering split
+// series.
 //
 
//===----------------------------------------------------------------------===//
 
@@ -22,6 +23,7 @@
 
 #include "mlir/ABI/ABIRewriteContext.h"
 #include "mlir/IR/BuiltinOps.h"
+#include "mlir/Interfaces/DataLayoutInterfaces.h"
 #include "clang/CIR/Dialect/IR/CIRDialect.h"
 
 namespace cir {
@@ -31,9 +33,13 @@ namespace cir {
 /// The driver pass (CallConvLoweringPass) computes a FunctionClassification
 /// for each cir.func / cir.call and dispatches to this class to perform the
 /// actual IR rewriting using cir dialect operations.
+///
+/// Holds a reference to the module's DataLayout for coercion alignment
+/// queries.  The DataLayout must outlive the rewrite context.
 class CIRABIRewriteContext : public mlir::abi::ABIRewriteContext {
 public:
-  explicit CIRABIRewriteContext(mlir::ModuleOp module) : module(module) {}
+  CIRABIRewriteContext(mlir::ModuleOp module, const mlir::DataLayout &dl)
+      : module(module), dl(dl) {}
 
   mlir::LogicalResult
   rewriteFunctionDefinition(mlir::FunctionOpInterface funcOp,
@@ -49,6 +55,7 @@ class CIRABIRewriteContext : public 
mlir::abi::ABIRewriteContext {
 
 private:
   mlir::ModuleOp module;
+  const mlir::DataLayout &dl;
 };
 
 } // namespace cir
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp 
b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index c4e98e299dfc1..0b11cc4547178 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -1659,6 +1659,45 @@ mlir::LogicalResult 
CIRToLLVMReturnOpLowering::matchAndRewrite(
   return mlir::LogicalResult::success();
 }
 
+mlir::LogicalResult CIRToLLVMReinterpretCastOpLowering::matchAndRewrite(
+    cir::ReinterpretCastOp op, OpAdaptor adaptor,
+    mlir::ConversionPatternRewriter &rewriter) const {
+  // After type conversion, source and destination LLVM types may be:
+  //   (a) Identical: trivially replace uses with the source value (the
+  //       op was a CIR-level type rename only; LLVM sees no change).
+  //   (b) Same scalar / vector category, same bit width: emit
+  //       LLVM::BitcastOp.
+  //   (c) Aggregate vs scalar / aggregate vs vector: LLVM::BitcastOp
+  //       does not allow aggregate types.  We currently emit an error
+  //       directing the producer to go through memory.  A future patch
+  //       will add an extract/insert lowering for the aggregate case so
+  //       the LLVM IR avoids the memory roundtrip too.
+  mlir::Type llvmDstTy = getTypeConverter()->convertType(op.getType());
+  mlir::Value llvmSrc = adaptor.getSrc();
+  mlir::Type llvmSrcTy = llvmSrc.getType();
+
+  if (llvmSrcTy == llvmDstTy) {
+    rewriter.replaceOp(op, llvmSrc);
+    return mlir::success();
+  }
+
+  bool srcIsAggregate =
+      mlir::isa<mlir::LLVM::LLVMStructType, mlir::LLVM::LLVMArrayType>(
+          llvmSrcTy);
+  bool dstIsAggregate =
+      mlir::isa<mlir::LLVM::LLVMStructType, mlir::LLVM::LLVMArrayType>(
+          llvmDstTy);
+  if (srcIsAggregate || dstIsAggregate)
+    return op.emitOpError()
+           << "lowering cir.reinterpret_cast to LLVM with aggregate type "
+           << "not yet implemented; producer should fall back to memory "
+           << "coercion until a follow-up patch adds extract/insert "
+           << "lowering";
+
+  rewriter.replaceOpWithNewOp<mlir::LLVM::BitcastOp>(op, llvmDstTy, llvmSrc);
+  return mlir::success();
+}
+
 mlir::LogicalResult CIRToLLVMRotateOpLowering::matchAndRewrite(
     cir::RotateOp op, OpAdaptor adaptor,
     mlir::ConversionPatternRewriter &rewriter) const {
diff --git a/clang/test/CIR/IR/reinterpret-cast.cir 
b/clang/test/CIR/IR/reinterpret-cast.cir
new file mode 100644
index 0000000000000..94742e15cda42
--- /dev/null
+++ b/clang/test/CIR/IR/reinterpret-cast.cir
@@ -0,0 +1,28 @@
+// RUN: cir-opt %s --verify-roundtrip | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+
+module {
+  // Vector ↔ complex same-bit-width reinterpret (the canonical use case
+  // from cir-call-conv-lowering's coerce-in-registers path).
+  cir.func @vec_to_complex(%v : !cir.vector<2 x !cir.float>)
+      -> !cir.complex<!cir.float> {
+    %c = cir.reinterpret_cast %v
+       : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
+    cir.return %c : !cir.complex<!cir.float>
+  }
+
+  // Reverse direction.
+  cir.func @complex_to_vec(%c : !cir.complex<!cir.float>)
+      -> !cir.vector<2 x !cir.float> {
+    %v = cir.reinterpret_cast %c
+       : !cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
+    cir.return %v : !cir.vector<2 x !cir.float>
+  }
+}
+
+// CHECK:      cir.func{{.*}} @vec_to_complex
+// CHECK:        cir.reinterpret_cast %{{.*}} : !cir.vector<2 x !cir.float> -> 
!cir.complex<!cir.float>
+
+// CHECK:      cir.func{{.*}} @complex_to_vec
+// CHECK:        cir.reinterpret_cast %{{.*}} : !cir.complex<!cir.float> -> 
!cir.vector<2 x !cir.float>
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir 
b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
new file mode 100644
index 0000000000000..f90427bf68b4c
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
@@ -0,0 +1,59 @@
+// Direct return with coerced type going from a small record to a same-bit-
+// width integer.  Mirror of coerce-record-to-int.cir but exercising the
+// return-side coercion code path: every cir.return gets the original
+// record value coerced to the integer type before being returned.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" 
\
+// RUN:   | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+!s64i = !cir.int<s, 64>
+!rec_Pair = !cir.record<struct "Pair" {!s32i, !s32i}>
+
+#coerce_pair_return_to_i64 = {
+  return = { kind = "direct", coerced_type = !s64i },
+  args   = [ ]
+}
+
+#all_direct_no_args = {
+  return = { kind = "direct" },
+  args   = [ ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
+    #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
+} {
+
+  cir.func @returns_pair() -> !rec_Pair
+      attributes { test_classify = #coerce_pair_return_to_i64 } {
+    %0 = cir.const #cir.zero : !rec_Pair
+    cir.return %0 : !rec_Pair
+  }
+
+  // Signature changes to !s64i return; the cir.return's record operand
+  // gets coerced via memory roundtrip before being returned.  The alloca
+  // is hoisted to the entry-block start (Andy's review comment #3 from the
+  // original PR) so it sits ahead of the const that produces the value.
+  // CHECK:      cir.func{{.*}} @returns_pair() -> !s64i
+  // CHECK:        %[[SLOT:.*]] = cir.alloca !rec_Pair, !cir.ptr<!rec_Pair>, 
["coerce"]
+  // CHECK:        %[[VAL:.*]] = cir.const #cir.zero : !rec_Pair
+  // CHECK:        cir.store %[[VAL]], %[[SLOT]] : !rec_Pair, 
!cir.ptr<!rec_Pair>
+  // CHECK:        %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : 
!cir.ptr<!rec_Pair> -> !cir.ptr<!s64i>
+  // CHECK:        %[[COERCED:.*]] = cir.load %[[CAST]] : !cir.ptr<!s64i>, 
!s64i
+  // CHECK:        cir.return %[[COERCED]] : !s64i
+
+  cir.func @caller() -> !rec_Pair
+      attributes { test_classify = #coerce_pair_return_to_i64 } {
+    %0 = cir.call @returns_pair() : () -> !rec_Pair
+    cir.return %0 : !rec_Pair
+  }
+
+  // At the call site the lowered call returns !s64i; the rewriter coerces
+  // it back to !rec_Pair for downstream uses (the caller's own return
+  // also needs the coerce-back-then-coerce-forward chain since caller's
+  // return is also Direct-with-coerce).
+  // CHECK:      cir.func{{.*}} @caller() -> !s64i
+  // CHECK:        %{{.*}} = cir.call @returns_pair() : () -> !s64i
+
+}
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir 
b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
new file mode 100644
index 0000000000000..f31f09181710e
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
@@ -0,0 +1,50 @@
+// Direct with coerced type going from a small record to a same-bit-width
+// integer.  The shapes don't match (record vs scalar) so the rewriter
+// emits a memory roundtrip: alloca in the entry block + store + ptr-cast +
+// load.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" 
\
+// RUN:   | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+!s64i = !cir.int<s, 64>
+!rec_Pair = !cir.record<struct "Pair" {!s32i, !s32i}>
+
+#coerce_pair_to_i64 = {
+  return = { kind = "direct" },
+  args   = [ { kind = "direct", coerced_type = !s64i } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
+    #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
+} {
+
+  cir.func @takes_pair(%arg0: !rec_Pair)
+      attributes { test_classify = #coerce_pair_to_i64 } {
+    cir.return
+  }
+
+  // Signature changes to !s64i; entry block grows an alloca + store + cast
+  // + load chain that recovers the original !rec_Pair value.  The alloca
+  // lands at the very start of the entry block so this composes correctly
+  // with cir-hoist-allocas regardless of pipeline ordering.
+  // CHECK:      cir.func{{.*}} @takes_pair(%[[ARG:.*]]: !s64i)
+  // CHECK:        %[[SLOT:.*]] = cir.alloca !s64i, !cir.ptr<!s64i>, ["coerce"]
+  // CHECK:        cir.store %[[ARG]], %[[SLOT]] : !s64i, !cir.ptr<!s64i>
+  // CHECK:        %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!s64i> 
-> !cir.ptr<!rec_Pair>
+  // CHECK:        %{{.*}} = cir.load %[[CAST]] : !cir.ptr<!rec_Pair>, 
!rec_Pair
+
+  cir.func @caller(%arg0: !rec_Pair)
+      attributes { test_classify = #coerce_pair_to_i64 } {
+    cir.call @takes_pair(%arg0) : (!rec_Pair) -> ()
+    cir.return
+  }
+
+  // At the call site, the original !rec_Pair gets coerced to !s64i via the
+  // same memory roundtrip before being passed.  Caller's own arg coercion
+  // chain runs first (it shares the pattern), then the call.
+  // CHECK:      cir.func{{.*}} @caller(%[[ARG:.*]]: !s64i)
+  // CHECK:        cir.call @takes_pair(%{{.*}}) : (!s64i) -> ()
+
+}
diff --git 
a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir 
b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
new file mode 100644
index 0000000000000..1669bf1232d28
--- /dev/null
+++ 
b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
@@ -0,0 +1,34 @@
+// Direct with a coerced type that's a different record (record-to-record):
+// neither side is a vector and at least one is a record, so the rewriter
+// uses the memory-roundtrip path even though both types are aggregates.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" 
\
+// RUN:   | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+!s64i = !cir.int<s, 64>
+!rec_Pair  = !cir.record<struct "Pair"  {!s32i, !s32i}>
+!rec_Single = !cir.record<struct "Single" {!s64i}>
+
+#coerce_pair_to_single = {
+  return = { kind = "direct" },
+  args   = [ { kind = "direct", coerced_type = !rec_Single } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
+    #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
+} {
+
+  cir.func @takes_pair(%arg0: !rec_Pair)
+      attributes { test_classify = #coerce_pair_to_single } {
+    cir.return
+  }
+
+  // CHECK: cir.func{{.*}} @takes_pair(%[[ARG:.*]]: !rec_Single)
+  // CHECK:   %[[SLOT:.*]] = cir.alloca !rec_Single, !cir.ptr<!rec_Single>, 
["coerce"]
+  // CHECK:   cir.store %[[ARG]], %[[SLOT]] : !rec_Single, 
!cir.ptr<!rec_Single>
+  // CHECK:   %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : 
!cir.ptr<!rec_Single> -> !cir.ptr<!rec_Pair>
+  // CHECK:   %{{.*}} = cir.load %[[CAST]] : !cir.ptr<!rec_Pair>, !rec_Pair
+
+}
diff --git 
a/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
 
b/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
new file mode 100644
index 0000000000000..ceb1f9e364466
--- /dev/null
+++ 
b/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
@@ -0,0 +1,42 @@
+// Direct with coerced type that differs from the original only in
+// vector-vs-non-vector shape (same total bit width, neither side a record):
+// the rewriter emits cir.reinterpret_cast instead of going through memory.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" 
\
+// RUN:   | FileCheck %s
+
+#coerce_complex_to_vec2 = {
+  return = { kind = "direct" },
+  args   = [ { kind = "direct",
+               coerced_type = !cir.vector<2 x !cir.float> } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<f32, dense<32>: vector<2xi64>>>
+} {
+
+  cir.func @takes_complex(%arg0: !cir.complex<!cir.float>)
+      attributes { test_classify = #coerce_complex_to_vec2 } {
+    cir.return
+  }
+
+  // The signature changes to the coerced (vector) type; the body still
+  // expects the complex, so a reinterpret_cast lands at function entry to
+  // adapt the new block argument back to the original type.
+  // CHECK: cir.func{{.*}} @takes_complex(%[[ARG:.*]]: !cir.vector<2 x 
!cir.float>)
+  // CHECK:   %{{.*}} = cir.reinterpret_cast %[[ARG]] : !cir.vector<2 x 
!cir.float> -> !cir.complex<!cir.float>
+
+  cir.func @caller(%arg0: !cir.complex<!cir.float>)
+      attributes { test_classify = #coerce_complex_to_vec2 } {
+    cir.call @takes_complex(%arg0) : (!cir.complex<!cir.float>) -> ()
+    cir.return
+  }
+
+  // At the call site the rewriter coerces the original (complex) value to
+  // the vector type before passing it through.
+  // CHECK: cir.func{{.*}} @caller(%[[ARG:.*]]: !cir.vector<2 x !cir.float>)
+  // CHECK:   %[[COMPLEX:.*]] = cir.reinterpret_cast %[[ARG]] : !cir.vector<2 
x !cir.float> -> !cir.complex<!cir.float>
+  // CHECK:   %[[COERCED:.*]] = cir.reinterpret_cast %[[COMPLEX]] : 
!cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
+  // CHECK:   cir.call @takes_complex(%[[COERCED]]) : (!cir.vector<2 x 
!cir.float>) -> ()
+
+}

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CIR] Add Direct coerce-in-registers + cir.reinterpret_cast op (PR #195879)

Reply via email to