https://github.com/kumarak created 
https://github.com/llvm/llvm-project/pull/204360

This PR adds 32-bit ARM support to ClangIR. Before this, CIR assumed 64-bit 
pointers and the generic Itanium C++ ABI, so 32-bit ARM targets crashed. 

The changes are built on two foundational fixes that are also useful on their 
own (nvptx/spirv32 and other 32-bit targets); they are sent up separately, and 
this work stacks on top of them (#203942 and #204185)

PR changes include:
  - CXXABI dispatch: handle GenericARM in the transform-pass CXXABI dispatch, 
selecting the ARM method-pointer ABI and matching non-virtual member-pointer 
layout.
  - Array cookie: use the two-word `{element-size, element-count}` cookie for 
`new[]`/`delete[]` of types with a non-trivial
  destructor.
  - ABI matching(constructor/destructor): return this from constructors and 
non-deleting destructors, and mark the this argument with the returned 
parameter attribute, mirroring CodeGenModule::SetFunctionAttributes.
  - Target-dependent widths: drive the `size_t` width of 
`__cxa_allocate_exception` and the `llvm.memcpy` length emitted for `cir.copy` 
from the data layout instead of hardcoding `i64`, so 32-bit targets emit `i32`.
  - NEON lane handling: implement the `vget_lane`/`vgetq_lane family` as a 
vector element extraction (they lower to `__builtin_neon_*` on ARM, unlike 
AArch64); other ARM builtins still report NYI rather than miscompiling.
  - Added new unit tests covering the supported 32-bit ARM ABI features and 
associated code generation behavior.

>From 81e726b77b53e89ff209a70deccb24a32dff5b57 Mon Sep 17 00:00:00 2001
From: AkshayK <[email protected]>
Date: Wed, 17 Jun 2026 10:16:06 -0400
Subject: [PATCH 1/3] [CIR] Use the AST result type for sizeof/alignof
 constants

Port of llvm/llvm-project#203942. Form the IntAttr for sizeof / __alignof /
__builtin_vectorelements from size_t (cgm.sizeTy) instead of a hardcoded
64-bit type, so the constant width matches the APInt returned by
EvaluateKnownConstInt on targets where size_t is narrower than 64 bits.
---
 clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp    |  8 ++--
 .../unary-expr-or-type-trait-32bit.cpp        | 38 +++++++++++++++++++
 2 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 clang/test/CIR/CodeGen/unary-expr-or-type-trait-32bit.cpp

diff --git a/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp 
b/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
index c7e19c38dbba1..5e8bb9df83ab3 100644
--- a/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
@@ -2773,16 +2773,18 @@ mlir::Value 
ScalarExprEmitter::VisitUnaryExprOrTypeTraitExpr(
           e->getSourceRange(),
           "VisitUnaryExprOrTypeTraitExpr: sizeOf scalable vector");
       return builder.getConstant(
-          loc, cir::IntAttr::get(cgf.cgm.uInt64Ty,
+          loc, cir::IntAttr::get(cgf.cgm.sizeTy,
                                  e->EvaluateKnownConstInt(cgf.getContext())));
     }
 
     return builder.getConstant(
-        loc, cir::IntAttr::get(cgf.cgm.uInt64Ty, vecTy.getSize()));
+        loc, cir::IntAttr::get(cgf.cgm.sizeTy, vecTy.getSize()));
   }
 
+  // The result type is size_t (target-dependent width); use it so the IntAttr
+  // width matches the APInt from EvaluateKnownConstInt.
   return builder.getConstant(
-      loc, cir::IntAttr::get(cgf.cgm.uInt64Ty,
+      loc, cir::IntAttr::get(cgf.cgm.sizeTy,
                              e->EvaluateKnownConstInt(cgf.getContext())));
 }
 
diff --git a/clang/test/CIR/CodeGen/unary-expr-or-type-trait-32bit.cpp 
b/clang/test/CIR/CodeGen/unary-expr-or-type-trait-32bit.cpp
new file mode 100644
index 0000000000000..e24278c09e622
--- /dev/null
+++ b/clang/test/CIR/CodeGen/unary-expr-or-type-trait-32bit.cpp
@@ -0,0 +1,38 @@
+// RUN: %clang_cc1 -std=c++20 -triple i686-unknown-linux-gnu -fclangir 
-emit-cir %s -o %t.cir
+// RUN: FileCheck --input-file=%t.cir %s --check-prefix=CIR
+// RUN: %clang_cc1 -std=c++20 -triple i686-unknown-linux-gnu -fclangir 
-emit-llvm %s -o %t-cir.ll
+// RUN: FileCheck --input-file=%t-cir.ll %s --check-prefix=LLVM
+// RUN: %clang_cc1 -std=c++20 -triple i686-unknown-linux-gnu -emit-llvm %s -o 
%t.ll
+// RUN: FileCheck --input-file=%t.ll %s --check-prefix=LLVM
+
+// The result of sizeof/alignof/__builtin_vectorelements is size_t, which is
+// 32 bits wide on this target. The emitted constants must use that width
+// rather than a hardcoded 64-bit type.
+
+using size_t = decltype(sizeof(int));
+
+size_t size_of_int() { return sizeof(int); }
+
+// CIR-LABEL: cir.func {{.*}}@_Z11size_of_intv() -> {{.*}}!u32i
+// CIR:         cir.const #cir.int<4> : !u32i
+
+// LLVM-LABEL: define {{.*}}i32 @_Z11size_of_intv()
+// LLVM:         {{.*}}i32 4
+
+size_t align_of_double() { return alignof(double); }
+
+// CIR-LABEL: cir.func {{.*}}@_Z15align_of_doublev() -> {{.*}}!u32i
+// CIR:         cir.const #cir.int<4> : !u32i
+
+// LLVM-LABEL: define {{.*}}i32 @_Z15align_of_doublev()
+// LLVM:         {{.*}}i32 4
+
+typedef int vi4 __attribute__((vector_size(16)));
+
+size_t vector_elements(vi4 v) { return __builtin_vectorelements(v); }
+
+// CIR-LABEL: cir.func {{.*}}@_Z15vector_elementsDv4_i({{.*}}) -> {{.*}}!u32i
+// CIR:         cir.const #cir.int<4> : !u32i
+
+// LLVM-LABEL: define {{.*}}i32 @_Z15vector_elementsDv4_i
+// LLVM:         {{.*}}i32 4

>From 5b7cee08a0d669fe8e894e3226f24ddb00f1743d Mon Sep 17 00:00:00 2001
From: AkshayK <[email protected]>
Date: Wed, 17 Jun 2026 10:16:06 -0400
Subject: [PATCH 2/3] [CIR] Pointer and vptr width from a CIR-native
 data-layout entry

Port of llvm/llvm-project#204185. Attach a cir.ptr data-layout entry at module
setup storing {size, abi-align} read from the target DataLayout; PointerType and
VPtrType read their width from it (falling back to 64/8 when absent), and the
CIR->LLVM lowering strips the entry. Fixes record-layout crashes on targets with
32-bit pointers (e.g. nvptx, spirv32).
---
 clang/lib/CIR/CodeGen/CIRGenerator.cpp        | 25 ++++++++-
 clang/lib/CIR/Dialect/IR/CIRTypes.cpp         | 56 +++++++++++++++++--
 .../CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp | 17 ++++++
 .../test/CIR/CodeGen/pointer-width-32bit.cpp  | 44 +++++++++++++++
 4 files changed, 135 insertions(+), 7 deletions(-)
 create mode 100644 clang/test/CIR/CodeGen/pointer-width-32bit.cpp

diff --git a/clang/lib/CIR/CodeGen/CIRGenerator.cpp 
b/clang/lib/CIR/CodeGen/CIRGenerator.cpp
index d4fcbb6e42f3e..61efaebad9b82 100644
--- a/clang/lib/CIR/CodeGen/CIRGenerator.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenerator.cpp
@@ -21,6 +21,7 @@
 #include "clang/AST/DeclGroup.h"
 #include "clang/CIR/CIRGenerator.h"
 #include "clang/CIR/InitAllDialects.h"
+#include "clang/CIR/MissingFeatures.h"
 #include "llvm/IR/DataLayout.h"
 
 using namespace cir;
@@ -42,7 +43,29 @@ static void setMLIRDataLayout(mlir::ModuleOp &mod, const 
llvm::DataLayout &dl) {
   mlir::MLIRContext *mlirContext = mod.getContext();
   mlir::DataLayoutSpecInterface dlSpec =
       mlir::translateDataLayout(dl, mlirContext);
-  mod->setAttr(mlir::DLTIDialect::kDataLayoutAttrName, dlSpec);
+
+  // Add a CIR-native pointer data-layout entry so cir.ptr / cir.vptr size and
+  // alignment are driven by the data layout rather than hardcoded.
+  // The value stores {size-in-bits, abi-align-in-bits} keyed on cir.ptr.
+  //
+  // TODO(cir): Only the default address space is recorded and
+  // address-space-dependent pointer sizes are not modeled yet. Emit
+  // per-address-space entries.
+  assert(!cir::MissingFeatures::dataLayoutPtrHandlingBasedOnLangAS());
+  constexpr unsigned kBitsInByte = 8;
+  unsigned ptrSizeBits = dl.getPointerSizeInBits(/*AS=*/0);
+  unsigned ptrAlignBits =
+      dl.getPointerABIAlignment(/*AS=*/0).value() * kBitsInByte;
+  auto ptrKey = cir::PointerType::get(cir::VoidType::get(mlirContext));
+  auto ptrVal = mlir::DenseI32ArrayAttr::get(
+      mlirContext,
+      {static_cast<int32_t>(ptrSizeBits), static_cast<int32_t>(ptrAlignBits)});
+  llvm::SmallVector<mlir::DataLayoutEntryInterface> entries(
+      dlSpec.getEntries().begin(), dlSpec.getEntries().end());
+  entries.push_back(mlir::DataLayoutEntryAttr::get(ptrKey, ptrVal));
+
+  mod->setAttr(mlir::DLTIDialect::kDataLayoutAttrName,
+               mlir::DataLayoutSpecAttr::get(mlirContext, entries));
 }
 
 void CIRGenerator::Initialize(ASTContext &astContext) {
diff --git a/clang/lib/CIR/Dialect/IR/CIRTypes.cpp 
b/clang/lib/CIR/Dialect/IR/CIRTypes.cpp
index 9c2a40e3681aa..04da6c85bbbb1 100644
--- a/clang/lib/CIR/Dialect/IR/CIRTypes.cpp
+++ b/clang/lib/CIR/Dialect/IR/CIRTypes.cpp
@@ -580,20 +580,62 @@ void RecordType::removeABIConversionNamePrefix() {
 // Data Layout information for types
 
//===----------------------------------------------------------------------===//
 
+// A CIR-native pointer data-layout entry stores {size-in-bits,
+// abi-align-in-bits} as a dense i32 array keyed on a cir.ptr type (see
+// setMLIRDataLayout in CIRGenerator).
+namespace {
+constexpr static uint64_t kBitsInByte = 8;
+
+// Defaults used only when the module carries no cir.ptr data-layout entry
+// (e.g. CIR parsed from text without a data layout). These mirror the MLIR 
LLVM
+// dialect's pointer defaults.
+constexpr static uint64_t kDefaultPointerSizeBits = 64;
+constexpr static uint64_t kDefaultPointerAlignment = 8;
+
+enum class CIRPtrDLPos { Size = 0, AbiAlign = 1 };
+
+// Returns the requested field of the cir.ptr data-layout entry.
+std::optional<uint64_t> getPointerSpecValue(mlir::DataLayoutEntryListRef 
params,
+                                            CIRPtrDLPos pos) {
+  for (mlir::DataLayoutEntryInterface entry : params) {
+    if (!entry.isTypeEntry())
+      continue;
+    auto spec = mlir::dyn_cast<mlir::DenseI32ArrayAttr>(entry.getValue());
+    assert(spec && spec.size() == 2 &&
+           "malformed cir.ptr data layout entry: expected a pair of i32 "
+           "{size-in-bits, abi-align-in-bits}");
+    return static_cast<uint64_t>(spec[static_cast<int>(pos)]);
+  }
+  return std::nullopt;
+}
+} // namespace
+
 llvm::TypeSize
 PointerType::getTypeSizeInBits(const ::mlir::DataLayout &dataLayout,
                                ::mlir::DataLayoutEntryListRef params) const {
+  // The pointer width comes from the CIR-native data-layout entry keyed on
+  // cir.ptr, which records the width for the default address space; fall back
+  // to 64 bits if the module carries no such entry.
   // FIXME: improve this in face of address spaces
   assert(!cir::MissingFeatures::dataLayoutPtrHandlingBasedOnLangAS());
-  return llvm::TypeSize::getFixed(64);
+  if (std::optional<uint64_t> size =
+          getPointerSpecValue(params, CIRPtrDLPos::Size))
+    return llvm::TypeSize::getFixed(*size);
+  return llvm::TypeSize::getFixed(kDefaultPointerSizeBits);
 }
 
 uint64_t
 PointerType::getABIAlignment(const ::mlir::DataLayout &dataLayout,
                              ::mlir::DataLayoutEntryListRef params) const {
+  // As with the size, the alignment is taken from the default-address-space
+  // cir.ptr data-layout entry. Address-space-dependent alignments are not yet
+  // modeled.
   // FIXME: improve this in face of address spaces
   assert(!cir::MissingFeatures::dataLayoutPtrHandlingBasedOnLangAS());
-  return 8;
+  if (std::optional<uint64_t> align =
+          getPointerSpecValue(params, CIRPtrDLPos::AbiAlign))
+    return *align / kBitsInByte;
+  return kDefaultPointerAlignment;
 }
 
 llvm::TypeSize
@@ -1112,14 +1154,16 @@ DataMemberType::getABIAlignment(const 
::mlir::DataLayout &dataLayout,
 llvm::TypeSize
 VPtrType::getTypeSizeInBits(const mlir::DataLayout &dataLayout,
                             mlir::DataLayoutEntryListRef params) const {
-  // FIXME: consider size differences under different ABIs
-  return llvm::TypeSize::getFixed(64);
+  // The vtable pointer is an ordinary data pointer; route the query through a
+  // cir.ptr so it picks up the same data-layout-driven width.
+  return dataLayout.getTypeSizeInBits(
+      cir::PointerType::get(cir::VoidType::get(getContext())));
 }
 
 uint64_t VPtrType::getABIAlignment(const mlir::DataLayout &dataLayout,
                                    mlir::DataLayoutEntryListRef params) const {
-  // FIXME: consider alignment differences under different ABIs
-  return 8;
+  return dataLayout.getTypeABIAlignment(
+      cir::PointerType::get(cir::VoidType::get(getContext())));
 }
 
 
//===----------------------------------------------------------------------===//
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp 
b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index c844375a000e0..1cc7f986de1c9 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -3703,6 +3703,23 @@ void ConvertCIRToLLVMPass::runOnOperation() {
   if (failed(applyPartialConversion(ops, target, std::move(patterns))))
     signalPassFailure();
 
+  // The CIR-native pointer data-layout entry (keyed on cir.ptr) drives pointer
+  // widths during CIR codegen and lowering, but cir.ptr has no meaning once 
the
+  // module is translated to LLVM IR. Drop it so the resulting data layout only
+  // references LLVM types.
+  if (auto dlSpec = mlir::dyn_cast_or_null<mlir::DataLayoutSpecAttr>(
+          module->getAttr(mlir::DLTIDialect::kDataLayoutAttrName))) {
+    llvm::SmallVector<mlir::DataLayoutEntryInterface> kept;
+    for (mlir::DataLayoutEntryInterface entry : dlSpec.getEntries()) {
+      if (entry.isTypeEntry() &&
+          mlir::isa<cir::PointerType>(mlir::cast<mlir::Type>(entry.getKey())))
+        continue;
+      kept.push_back(entry);
+    }
+    module->setAttr(mlir::DLTIDialect::kDataLayoutAttrName,
+                    mlir::DataLayoutSpecAttr::get(module.getContext(), kept));
+  }
+
   // Emit the llvm.global_ctors array.
   buildCtorDtorList(module, cir::CIRDialect::getGlobalCtorsAttrName(),
                     "llvm.global_ctors", [](mlir::Attribute attr) {
diff --git a/clang/test/CIR/CodeGen/pointer-width-32bit.cpp 
b/clang/test/CIR/CodeGen/pointer-width-32bit.cpp
new file mode 100644
index 0000000000000..15f3ec92d7e16
--- /dev/null
+++ b/clang/test/CIR/CodeGen/pointer-width-32bit.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -std=c++20 -triple nvptx-nvidia-cuda -fclangir -emit-cir %s 
-o %t.cir
+// RUN: FileCheck --check-prefix=CIR --input-file=%t.cir %s
+// RUN: %clang_cc1 -std=c++20 -triple nvptx-nvidia-cuda -fclangir -emit-llvm 
%s -o %t-cir.ll
+// RUN: FileCheck --check-prefix=LLVM --input-file=%t-cir.ll %s
+// RUN: %clang_cc1 -std=c++20 -triple nvptx-nvidia-cuda -emit-llvm %s -o %t.ll
+// RUN: FileCheck --check-prefix=OGCG --input-file=%t.ll %s
+
+// On a target with 32-bit pointers (e.g. nvptx) both a data pointer (!cir.ptr)
+// and the vtable pointer (!cir.vptr) are 4 bytes wide. The pointer width is
+// carried by a CIR-native data-layout entry keyed on cir.ptr, so the field
+// following a pointer lands at the AST-mandated offset. Sizing pointers as a
+// hardcoded 64 bits previously tripped the record layout builder 
(insertPadding:
+// assertion `offset >= size`) on every record containing a pointer.
+
+struct S {
+  int *p;
+  int x;
+};
+
+S s;
+
+class A {
+public:
+  virtual void f();
+  int x;
+};
+
+void A::f() {}
+
+// The module carries a CIR-native pointer data-layout entry ({size, abi-align}
+// in bits) that drives both cir.ptr and cir.vptr widths. The 4-byte pointer is
+// immediately followed by 'x' at offset 4 with no padding, and each record is
+// 4-byte aligned.
+// CIR-DAG: !rec_S = !cir.struct<"S" {!cir.ptr<!s32i>, !s32i}>
+// CIR-DAG: !rec_A = !cir.struct<class "A" {!cir.vptr, !s32i}>
+// CIR-DAG: !cir.ptr<!cir.void> = array<i32: 32, 32>
+// CIR: cir.global external @s = #cir.zero : !rec_S {alignment = 4 : i64}
+// CIR: cir.global{{.*}}@_ZTV1A = #cir.vtable<{{.*}}{alignment = 4 : i64}
+
+// LLVM: @s = global %struct.S zeroinitializer, align 4
+// LLVM: @_ZTV1A = global { [3 x ptr] } {{.*}}, align 4
+
+// OGCG: @s = global %struct.S zeroinitializer, align 4
+// OGCG: @_ZTV1A = {{.*}}constant { [3 x ptr] } {{.*}}, align 4

>From 99f4d39d7570afc3da8febd85a058fa82cd0b387 Mon Sep 17 00:00:00 2001
From: AkshayK <[email protected]>
Date: Wed, 17 Jun 2026 10:21:59 -0400
Subject: [PATCH 3/3] [CIR] Add 32-bit ARM (GenericARM) codegen and lowering
 support

- Handle GenericARM in the transform-pass CXXABI dispatch, selecting the ARM
  method-pointer ABI and matching non-virtual member-pointer layout.
- Use the two-word array cookie for new[]/delete[] on 32-bit ARM.
- Return 'this' from constructors and non-deleting destructors, and add the
  'returned' attribute on the 'this' argument, mirroring classic CodeGen.
- Fix a verification crash when a static-storage object has a non-trivial
  destructor by building the destructor call against the structor's real
  ('this'-returning) signature.
- Drive the size_t width of __cxa_allocate_exception and the cir.copy memcpy
  length from the data layout instead of hardcoding i64.
- Implement the NEON vget_lane/vgetq_lane intrinsics as a vector element
  extraction; other ARM builtins still report NYI.

Adds CIR/CodeGen tests for the array cookie, this-return, global destructor,
exception allocation size, aggregate copy size, NEON lane reads and 32-bit
ARM record layout.
---
 clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp       | 37 ++++++++++++-
 clang/lib/CIR/CodeGen/CIRGenCXX.cpp           |  9 ++-
 clang/lib/CIR/CodeGen/CIRGenCXXABI.cpp        |  4 ++
 clang/lib/CIR/CodeGen/CIRGenCXXABI.h          |  4 ++
 clang/lib/CIR/CodeGen/CIRGenCall.cpp          | 10 +---
 clang/lib/CIR/CodeGen/CIRGenFunction.cpp      |  8 ++-
 clang/lib/CIR/CodeGen/CIRGenFunction.h        |  4 ++
 clang/lib/CIR/CodeGen/CIRGenItaniumCXXABI.cpp | 55 ++++++++++++++++---
 clang/lib/CIR/CodeGen/CIRGenModule.cpp        | 12 ++++
 .../CIR/Dialect/Transforms/CXXABILowering.cpp | 11 +++-
 .../TargetLowering/LowerItaniumCXXABI.cpp     | 28 ++++++++--
 .../CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp | 24 +++++---
 .../CIR/CodeGen/arm-aggregate-copy-size.cpp   | 16 ++++++
 clang/test/CIR/CodeGen/arm-array-cookie.cpp   | 32 +++++++++++
 clang/test/CIR/CodeGen/arm-global-dtor.cpp    | 29 ++++++++++
 clang/test/CIR/CodeGen/arm-neon-vget-lane.c   | 20 +++++++
 clang/test/CIR/CodeGen/arm-record-layout.cpp  | 39 +++++++++++++
 clang/test/CIR/CodeGen/arm-this-return.cpp    | 28 ++++++++++
 .../test/CIR/CodeGen/arm-throw-alloc-size.cpp | 16 ++++++
 19 files changed, 352 insertions(+), 34 deletions(-)
 create mode 100644 clang/test/CIR/CodeGen/arm-aggregate-copy-size.cpp
 create mode 100644 clang/test/CIR/CodeGen/arm-array-cookie.cpp
 create mode 100644 clang/test/CIR/CodeGen/arm-global-dtor.cpp
 create mode 100644 clang/test/CIR/CodeGen/arm-neon-vget-lane.c
 create mode 100644 clang/test/CIR/CodeGen/arm-record-layout.cpp
 create mode 100644 clang/test/CIR/CodeGen/arm-this-return.cpp
 create mode 100644 clang/test/CIR/CodeGen/arm-throw-alloc-size.cpp

diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp 
b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
index a483eb635f0e2..fe5003d6d953c 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
@@ -24,6 +24,7 @@
 #include "clang/Basic/Builtins.h"
 #include "clang/Basic/DiagnosticFrontend.h"
 #include "clang/Basic/OperatorKinds.h"
+#include "clang/Basic/TargetBuiltins.h"
 #include "clang/CIR/Dialect/IR/CIRTypes.h"
 #include "clang/CIR/MissingFeatures.h"
 #include "llvm/IR/Intrinsics.h"
@@ -2647,6 +2648,38 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl 
&gd, unsigned builtinID,
   return getUndefRValue(e->getType());
 }
 
+std::optional<mlir::Value>
+CIRGenFunction::emitARMBuiltinExpr(unsigned builtinID, const CallExpr *expr,
+                                  ReturnValueSlot returnValue,
+                                  llvm::Triple::ArchType arch) {
+  // Only the NEON lane-read intrinsics are implemented for 32-bit ARM so far;
+  // they lower to a vector element extraction, matching classic CodeGen
+  // (clang/lib/CodeGen/TargetBuiltins/ARM.cpp) and the AArch64 path.
+  switch (builtinID) {
+  default:
+    return std::nullopt;
+  case NEON::BI__builtin_neon_vget_lane_i8:
+  case NEON::BI__builtin_neon_vget_lane_i16:
+  case NEON::BI__builtin_neon_vget_lane_i32:
+  case NEON::BI__builtin_neon_vget_lane_i64:
+  case NEON::BI__builtin_neon_vget_lane_bf16:
+  case NEON::BI__builtin_neon_vget_lane_f32:
+  case NEON::BI__builtin_neon_vgetq_lane_i8:
+  case NEON::BI__builtin_neon_vgetq_lane_i16:
+  case NEON::BI__builtin_neon_vgetq_lane_i32:
+  case NEON::BI__builtin_neon_vgetq_lane_i64:
+  case NEON::BI__builtin_neon_vgetq_lane_bf16:
+  case NEON::BI__builtin_neon_vgetq_lane_f32:
+  case NEON::BI__builtin_neon_vduph_lane_bf16:
+  case NEON::BI__builtin_neon_vduph_laneq_bf16: {
+    mlir::Location loc = getLoc(expr->getExprLoc());
+    mlir::Value vec = emitScalarExpr(expr->getArg(0));
+    mlir::Value index = emitScalarExpr(expr->getArg(1));
+    return cir::VecExtractOp::create(builder, loc, vec, index);
+  }
+  }
+}
+
 static std::optional<mlir::Value>
 emitTargetArchBuiltinExpr(CIRGenFunction *cgf, unsigned builtinID,
                           const CallExpr *e, ReturnValueSlot &returnValue,
@@ -2666,9 +2699,7 @@ emitTargetArchBuiltinExpr(CIRGenFunction *cgf, unsigned 
builtinID,
   case llvm::Triple::armeb:
   case llvm::Triple::thumb:
   case llvm::Triple::thumbeb:
-    // These are actually NYI, but that will be reported by emitBuiltinExpr.
-    // At this point, we don't even know that the builtin is target-specific.
-    return std::nullopt;
+    return cgf->emitARMBuiltinExpr(builtinID, e, returnValue, arch);
   case llvm::Triple::aarch64:
   case llvm::Triple::aarch64_32:
   case llvm::Triple::aarch64_be:
diff --git a/clang/lib/CIR/CodeGen/CIRGenCXX.cpp 
b/clang/lib/CIR/CodeGen/CIRGenCXX.cpp
index 9c008acbc6f68..54659c03da092 100644
--- a/clang/lib/CIR/CodeGen/CIRGenCXX.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenCXX.cpp
@@ -176,8 +176,13 @@ static void emitDeclDestroy(CIRGenFunction &cgf, const 
VarDecl *vd,
         mlir::cast<cir::PointerType>(thisAddr.getType()).getAddrSpace());
     if (realPtrTy != thisAddr.getType())
       thisAddr = builder.createBitcast(thisAddr.getLoc(), thisAddr, realPtrTy);
-    builder.createCallOp(cgf.getLoc(vd->getSourceRange()),
-                         mlir::FlatSymbolRefAttr::get(fnOp.getSymNameAttr()),
+    // Build the call against the destructor's actual signature rather than
+    // forcing a void result.  Under the ARM C++ ABI structors return `this`,
+    // so a void-typed call would have fewer results than the callee and fail
+    // verification. The FuncOp overload derives the result type from the
+    // destructor itself; the returned `this` is unused here and is discarded
+    // when LoweringPrepare registers the destructor with __cxa_atexit.
+    builder.createCallOp(cgf.getLoc(vd->getSourceRange()), fnOp,
                          mlir::ValueRange{thisAddr});
     assert(fnOp && "expected cir.func");
     // TODO(cir): This doesn't do anything but check for unhandled conditions.
diff --git a/clang/lib/CIR/CodeGen/CIRGenCXXABI.cpp 
b/clang/lib/CIR/CodeGen/CIRGenCXXABI.cpp
index 83062c3906edf..ad493668ab06d 100644
--- a/clang/lib/CIR/CodeGen/CIRGenCXXABI.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenCXXABI.cpp
@@ -80,6 +80,10 @@ void CIRGenCXXABI::setCXXABIThisValue(CIRGenFunction &cgf,
   cgf.cxxabiThisValue = thisPtr;
 }
 
+mlir::Value CIRGenCXXABI::getThisValue(CIRGenFunction &cgf) {
+  return cgf.cxxabiThisValue;
+}
+
 CharUnits CIRGenCXXABI::getArrayCookieSize(const CXXNewExpr *e) {
   if (!requiresArrayCookie(e))
     return CharUnits::Zero();
diff --git a/clang/lib/CIR/CodeGen/CIRGenCXXABI.h 
b/clang/lib/CIR/CodeGen/CIRGenCXXABI.h
index bb100f7afd929..c36fc4517f961 100644
--- a/clang/lib/CIR/CodeGen/CIRGenCXXABI.h
+++ b/clang/lib/CIR/CodeGen/CIRGenCXXABI.h
@@ -166,6 +166,10 @@ class CIRGenCXXABI {
   /// Loads the incoming C++ this pointer as it was passed by the caller.
   mlir::Value loadIncomingCXXThis(CIRGenFunction &cgf);
 
+  /// Returns the C++ this pointer as set by setCXXABIThisValue, before any
+  /// adjustment.
+  mlir::Value getThisValue(CIRGenFunction &cgf);
+
   virtual CatchTypeInfo
   getAddrOfCXXCatchHandlerType(mlir::Location loc, QualType ty,
                                QualType catchHandlerType) = 0;
diff --git a/clang/lib/CIR/CodeGen/CIRGenCall.cpp 
b/clang/lib/CIR/CodeGen/CIRGenCall.cpp
index f648eff375a77..83665b5ec3947 100644
--- a/clang/lib/CIR/CodeGen/CIRGenCall.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenCall.cpp
@@ -839,9 +839,6 @@ CIRGenTypes::arrangeCXXStructorDeclaration(GlobalDecl gd) {
                                ? astContext.VoidPtrTy
                                : astContext.VoidTy;
 
-  assert(!theCXXABI.hasThisReturn(gd) &&
-         "Please send PR with a test and remove this");
-
   assert(!cir::MissingFeatures::opCallCIRGenFuncInfoExtParamInfo());
   assert(!cir::MissingFeatures::opCallFnInfoOpts());
 
@@ -973,13 +970,12 @@ const CIRGenFunctionInfo 
&CIRGenTypes::arrangeCXXConstructorCall(
                               : RequiredArgs::All;
 
   GlobalDecl gd(d, ctorKind);
-  if (theCXXABI.hasThisReturn(gd))
-    cgm.errorNYI(d->getSourceRange(),
-                 "arrangeCXXConstructorCall: hasThisReturn");
   if (theCXXABI.hasMostDerivedReturn(gd))
     cgm.errorNYI(d->getSourceRange(),
                  "arrangeCXXConstructorCall: hasMostDerivedReturn");
-  CanQualType resultType = astContext.VoidTy;
+  // args[0] is the implicit 'this'; ABIs that return 'this' use its type.
+  CanQualType resultType =
+      theCXXABI.hasThisReturn(gd) ? argTypes.front() : astContext.VoidTy;
 
   assert(!cir::MissingFeatures::opCallFnInfoOpts());
   assert(!cir::MissingFeatures::opCallCIRGenFuncInfoExtParamInfo());
diff --git a/clang/lib/CIR/CodeGen/CIRGenFunction.cpp 
b/clang/lib/CIR/CodeGen/CIRGenFunction.cpp
index 4b020c96964a7..856bcb530974c 100644
--- a/clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenFunction.cpp
@@ -1031,11 +1031,13 @@ clang::QualType 
CIRGenFunction::buildFunctionArgList(clang::GlobalDecl gd,
   // object as a regular parameter that fd->parameters() already enumerates.
   const auto *md = dyn_cast<CXXMethodDecl>(fd);
   if (md && md->isImplicitObjectMemberFunction()) {
-    if (cgm.getCXXABI().hasThisReturn(gd))
-      cgm.errorNYI(fd->getSourceRange(), "this return");
-    else if (cgm.getCXXABI().hasMostDerivedReturn(gd))
+    if (cgm.getCXXABI().hasMostDerivedReturn(gd))
       cgm.errorNYI(fd->getSourceRange(), "most derived return");
     cgm.getCXXABI().buildThisParam(*this, args);
+    // ABIs that return 'this' make the function's return type the 'this'
+    // pointer, so a return slot is allocated and the prolog can store into it.
+    if (cgm.getCXXABI().hasThisReturn(gd))
+      retTy = args.front()->getType();
   }
 
   bool passedParams = true;
diff --git a/clang/lib/CIR/CodeGen/CIRGenFunction.h 
b/clang/lib/CIR/CodeGen/CIRGenFunction.h
index 317151c8d61c6..696b2d168324f 100644
--- a/clang/lib/CIR/CodeGen/CIRGenFunction.h
+++ b/clang/lib/CIR/CodeGen/CIRGenFunction.h
@@ -1537,6 +1537,10 @@ class CIRGenFunction : public CIRGenTypeCache {
   emitAArch64BuiltinExpr(unsigned builtinID, const CallExpr *expr,
                          ReturnValueSlot returnValue,
                          llvm::Triple::ArchType arch);
+  std::optional<mlir::Value>
+  emitARMBuiltinExpr(unsigned builtinID, const CallExpr *expr,
+                     ReturnValueSlot returnValue,
+                     llvm::Triple::ArchType arch);
   std::optional<mlir::Value> emitAArch64SMEBuiltinExpr(unsigned builtinID,
                                                        const CallExpr *expr);
   std::optional<mlir::Value> emitAArch64SVEBuiltinExpr(unsigned builtinID,
diff --git a/clang/lib/CIR/CodeGen/CIRGenItaniumCXXABI.cpp 
b/clang/lib/CIR/CodeGen/CIRGenItaniumCXXABI.cpp
index 552d73966e97b..3953e9badf604 100644
--- a/clang/lib/CIR/CodeGen/CIRGenItaniumCXXABI.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenItaniumCXXABI.cpp
@@ -37,12 +37,31 @@ class CIRGenItaniumCXXABI : public CIRGenCXXABI {
   /// All the vtables which have been defined.
   llvm::DenseMap<const CXXRecordDecl *, cir::GlobalOp> vtables;
 
+  /// 32-bit ARM uses a two-word array cookie ({element_size, element_count})
+  /// rather than the generic single-size_t cookie.
+  bool useARMArrayCookieABI;
+
+  /// 32-bit ARM returns 'this' from constructors and non-deleting destructors.
+  bool useARMThisReturnABI;
+
 public:
-  CIRGenItaniumCXXABI(CIRGenModule &cgm) : CIRGenCXXABI(cgm) {
+  CIRGenItaniumCXXABI(CIRGenModule &cgm, bool useARMArrayCookieABI = false,
+                      bool useARMThisReturnABI = false)
+      : CIRGenCXXABI(cgm), useARMArrayCookieABI(useARMArrayCookieABI),
+        useARMThisReturnABI(useARMThisReturnABI) {
     assert(!cir::MissingFeatures::cxxabiUseARMMethodPtrABI());
     assert(!cir::MissingFeatures::cxxabiUseARMGuardVarABI());
   }
 
+  bool hasThisReturn(clang::GlobalDecl gd) const override {
+    if (!useARMThisReturnABI)
+      return false;
+    // Constructors and non-deleting destructors return 'this'.
+    return isa<CXXConstructorDecl>(gd.getDecl()) ||
+           (isa<CXXDestructorDecl>(gd.getDecl()) &&
+            gd.getDtorType() != Dtor_Deleting);
+  }
+
   AddedStructorArgs getImplicitConstructorArgs(CIRGenFunction &cgf,
                                                const CXXConstructorDecl *d,
                                                CXXCtorType type,
@@ -262,8 +281,10 @@ void 
CIRGenItaniumCXXABI::emitInstanceFunctionProlog(SourceLocation loc,
   /// 2) in theory, an ABI could implement 'this' returns some other way;
   ///    HasThisReturn only specifies a contract, not the implementation
   if (hasThisReturn(cgf.curGD)) {
-    cgf.cgm.errorNYI(cgf.curFuncDecl->getLocation(),
-                     "emitInstanceFunctionProlog: hasThisReturn");
+    // Store 'this' into the return slot at function entry; the epilogue
+    // returns whatever is in that slot.
+    cgf.getBuilder().createStore(cgf.getLoc(loc), getThisValue(cgf),
+                                 cgf.returnValue);
   }
 }
 
@@ -1872,9 +1893,14 @@ CIRGenCXXABI 
*clang::CIRGen::CreateCIRGenItaniumCXXABI(CIRGenModule &cgm) {
   switch (cgm.getASTContext().getCXXABIKind()) {
   case TargetCXXABI::GenericItanium:
   case TargetCXXABI::GenericAArch64:
-  case TargetCXXABI::GenericARM:
     return new CIRGenItaniumCXXABI(cgm);
 
+  case TargetCXXABI::GenericARM:
+    // 32-bit ARM uses the two-word array cookie and returns 'this' from
+    // constructors and non-deleting destructors.
+    return new CIRGenItaniumCXXABI(cgm, /*useARMArrayCookieABI=*/true,
+                                   /*useARMThisReturnABI=*/true);
+
   case TargetCXXABI::AppleARM64:
     // The general Itanium ABI will do until we implement something that
     // requires special handling.
@@ -2420,6 +2446,12 @@ void CIRGenItaniumCXXABI::emitVirtualObjectDelete(
 /************************** Array allocation cookies 
**************************/
 
 CharUnits CIRGenItaniumCXXABI::getArrayCookieSizeImpl(QualType elementType) {
+  if (useARMArrayCookieABI) {
+    // On 32-bit ARM the cookie is always two size_t words:
+    //   struct array_cookie { size_t element_size; size_t element_count; };
+    return cgm.getSizeSize() * 2;
+  }
+
   // The array cookie is a size_t; pad that up to the element alignment.
   // The cookie is actually right-justified in that space.
   return std::max(
@@ -2444,9 +2476,7 @@ Address 
CIRGenItaniumCXXABI::initializeArrayCookie(CIRGenFunction &cgf,
   mlir::Location loc = cgf.getLoc(e->getSourceRange());
 
   // The size of the cookie.
-  CharUnits cookieSize =
-      std::max(sizeSize, ctx.getPreferredTypeAlignInChars(elementType));
-  assert(cookieSize == getArrayCookieSizeImpl(elementType));
+  CharUnits cookieSize = getArrayCookieSizeImpl(elementType);
 
   mlir::Type u8Ty = cgf.getBuilder().getUInt8Ty();
   cir::PointerType u8PtrTy = cgf.getBuilder().getUInt8PtrTy();
@@ -2472,6 +2502,17 @@ Address 
CIRGenItaniumCXXABI::initializeArrayCookie(CIRGenFunction &cgf,
       cookiePtr.withElementType(cgf.getBuilder(), cgf.sizeTy);
   cgf.getBuilder().createStore(loc, numElements, numElementsPtr);
 
+  // On 32-bit ARM the cookie's first word holds the element size.
+  if (useARMArrayCookieABI) {
+    CharUnits eltSize = ctx.getTypeSizeInChars(elementType);
+    Address eltSizePtr(baseBytePtr, u8Ty, baseAlignment);
+    Address eltSizeSlot =
+        eltSizePtr.withElementType(cgf.getBuilder(), cgf.sizeTy);
+    mlir::Value eltSizeVal =
+        cgf.getBuilder().getConstInt(loc, cgf.sizeTy, eltSize.getQuantity());
+    cgf.getBuilder().createStore(loc, eltSizeVal, eltSizeSlot);
+  }
+
   // Finally, compute a pointer to the actual data buffer by skipping
   // over the cookie completely.
   mlir::Value dataOffset =
diff --git a/clang/lib/CIR/CodeGen/CIRGenModule.cpp 
b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
index b377f84e8d370..c01c1a46d29b0 100644
--- a/clang/lib/CIR/CodeGen/CIRGenModule.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
@@ -16,6 +16,7 @@
 #include "CIRGenConstantEmitter.h"
 #include "CIRGenFunction.h"
 
+#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/Dialect/OpenMP/OpenMPOffloadUtils.h"
 #include "mlir/IR/SymbolTable.h"
 #include "clang/AST/ASTContext.h"
@@ -3016,6 +3017,17 @@ void CIRGenModule::setFunctionAttributes(GlobalDecl 
globalDecl,
                              getTypes().arrangeGlobalDeclaration(globalDecl),
                              func, isThunk);
 
+  // Add the `returned` attribute for "this" on constructors/destructors that
+  // return it (e.g. under the ARM C++ ABI), except for iOS 5 and earlier where
+  // substantial code, including the libstdc++ dylib, was compiled with GCC and
+  // does not actually return "this".
+  if (!isThunk && getCXXABI().hasThisReturn(globalDecl) &&
+      !(getTriple().isiOS() && getTriple().isOSVersionLT(6))) {
+    assert(func.getNumArguments() != 0 && "unexpected this return");
+    func.setArgAttr(0, mlir::LLVM::LLVMDialect::getReturnedAttrName(),
+                    mlir::UnitAttr::get(&getMLIRContext()));
+  }
+
   if (!isIncompleteFunction && func.isDeclaration())
     getTargetCIRGenInfo().setTargetAttributes(funcDecl, func, *this);
 
diff --git a/clang/lib/CIR/Dialect/Transforms/CXXABILowering.cpp 
b/clang/lib/CIR/Dialect/Transforms/CXXABILowering.cpp
index 0bcfe124723e6..d288da4619ea8 100644
--- a/clang/lib/CIR/Dialect/Transforms/CXXABILowering.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/CXXABILowering.cpp
@@ -671,13 +671,22 @@ mlir::LogicalResult 
CIRDeleteArrayOpABILowering::matchAndRewrite(
       [&](mlir::OpBuilder &b, mlir::Location l) {
         if (dtorFn) {
           auto eltPtrTy = cir::PointerType::get(ptrTy.getPointee());
+          // The element destructor returns void on most targets, but returns
+          // 'this' under ABIs with ctor/dtor this-return (e.g. 32-bit ARM).
+          // Match the callee's result type so the call verifies.
+          mlir::Type dtorResultTy = cir::VoidType::get(rewriter.getContext());
+          if (auto dtorFunc =
+                  mlir::SymbolTable::lookupNearestSymbolFrom<cir::FuncOp>(
+                      op, dtorFn))
+            if (!dtorFunc.getFunctionType().hasVoidReturn())
+              dtorResultTy = dtorFunc.getFunctionType().getReturnType();
           auto arrayDtor = cir::ArrayDtor::create(
               b, l, loweredAddress, numElements,
               [&](mlir::OpBuilder &bb, mlir::Location ll) {
                 mlir::Value arg =
                     bb.getInsertionBlock()->addArgument(eltPtrTy, ll);
                 auto dtorCall = cir::CallOp::create(
-                    bb, ll, dtorFn, cir::VoidType(), mlir::ValueRange{arg});
+                    bb, ll, dtorFn, dtorResultTy, mlir::ValueRange{arg});
                 if (!op.getDtorMayThrow())
                   dtorCall.setNothrowAttr(bb.getUnitAttr());
                 cir::YieldOp::create(bb, ll);
diff --git 
a/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerItaniumCXXABI.cpp 
b/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerItaniumCXXABI.cpp
index 0e246c8612f25..1ed83e0c6148a 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerItaniumCXXABI.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerItaniumCXXABI.cpp
@@ -34,12 +34,15 @@ class LowerItaniumCXXABI : public CIRCXXABI {
 protected:
   bool useARMMethodPtrABI;
   bool use32BitVTableOffsetABI;
+  bool useARMArrayCookieABI;
 
 public:
   LowerItaniumCXXABI(LowerModule &lm, bool useARMMethodPtrABI = false,
-                     bool use32BitVTableOffsetABI = false)
+                     bool use32BitVTableOffsetABI = false,
+                     bool useARMArrayCookieABI = false)
       : CIRCXXABI(lm), useARMMethodPtrABI(useARMMethodPtrABI),
-        use32BitVTableOffsetABI(use32BitVTableOffsetABI) {}
+        use32BitVTableOffsetABI(use32BitVTableOffsetABI),
+        useARMArrayCookieABI(useARMArrayCookieABI) {}
 
   /// Lower the given data member pointer type to its ABI type. The returned
   /// type is also a CIR type.
@@ -144,6 +147,16 @@ std::unique_ptr<CIRCXXABI> createItaniumCXXABI(LowerModule 
&lm) {
         /*useARMMethodPtrABI=*/true,
         /*use32BitVTableOffsetABI=*/true);
 
+  case clang::TargetCXXABI::GenericARM:
+    // 32-bit ARM uses the ARM method-pointer encoding and the two-word array
+    // cookie but, unlike AppleARM64, does not use 32-bit vtable offsets. The
+    // non-trivial constructor/destructor return values are not yet modeled.
+    return std::make_unique<LowerItaniumCXXABI>(
+        lm,
+        /*useARMMethodPtrABI=*/true,
+        /*use32BitVTableOffsetABI=*/false,
+        /*useARMArrayCookieABI=*/true);
+
   case clang::TargetCXXABI::GenericItanium:
     return std::make_unique<LowerItaniumCXXABI>(lm);
 
@@ -861,10 +874,17 @@ 
LowerItaniumCXXABI::lowerVTableGetTypeInfo(cir::VTableGetTypeInfoOp op,
 
 clang::CharUnits LowerItaniumCXXABI::getArrayCookieSizeImpl(
     mlir::Type elementType, const mlir::DataLayout &dataLayout) const {
-  // The array cookie is a size_t; pad that up to the element alignment.
-  // The cookie is actually right-justified in that space.
   clang::CharUnits sizeOfSizeT =
       clang::CharUnits::fromQuantity(getPtrSizeInBits() / 8);
+
+  // On 32-bit ARM the cookie is always two size_t words: {element_size,
+  // element_count}. The element count stays right-justified, so the read
+  // logic below is unchanged.
+  if (useARMArrayCookieABI)
+    return sizeOfSizeT * 2;
+
+  // The array cookie is a size_t; pad that up to the element alignment.
+  // The cookie is actually right-justified in that space.
   clang::CharUnits eltAlign = clang::CharUnits::fromQuantity(
       dataLayout.getTypePreferredAlignment(elementType));
   return std::max(sizeOfSizeT, eltAlign);
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp 
b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index 1cc7f986de1c9..113f91fe8c404 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -212,9 +212,13 @@ mlir::LogicalResult 
CIRToLLVMCopyOpLowering::matchAndRewrite(
     cir::CopyOp op, OpAdaptor adaptor,
     mlir::ConversionPatternRewriter &rewriter) const {
   mlir::DataLayout layout(op->getParentOfType<mlir::ModuleOp>());
+  // The llvm.memcpy length is size_t-wide; its width is target-dependent (e.g.
+  // 32 bits on 32-bit ARM), so drive it from the data layout rather than
+  // hardcoding i64.
+  auto llvmPtrTy = mlir::LLVM::LLVMPointerType::get(rewriter.getContext());
+  mlir::Type lenTy = 
rewriter.getIntegerType(layout.getTypeSizeInBits(llvmPtrTy));
   const mlir::Value length = mlir::LLVM::ConstantOp::create(
-      rewriter, op.getLoc(), rewriter.getI64Type(),
-      op.getCopySizeInBytes(layout));
+      rewriter, op.getLoc(), lenTy, op.getCopySizeInBytes(layout));
   assert(!cir::MissingFeatures::aggValueSlotVolatile());
   rewriter.replaceOpWithNewOp<mlir::LLVM::MemcpyOp>(
       op, adaptor.getDst(), adaptor.getSrc(), length, op.getIsVolatile());
@@ -3880,15 +3884,21 @@ mlir::LogicalResult 
CIRToLLVMThrowOpLowering::matchAndRewrite(
 mlir::LogicalResult CIRToLLVMAllocExceptionOpLowering::matchAndRewrite(
     cir::AllocExceptionOp op, OpAdaptor adaptor,
     mlir::ConversionPatternRewriter &rewriter) const {
-  // Get or create `declare ptr @__cxa_allocate_exception(i64)`
+  // Get or create `declare ptr @__cxa_allocate_exception(size_t)`. The
+  // thrown_size parameter is size_t, whose width is target-dependent (e.g. 32
+  // bits on 32-bit ARM), so drive it from the data layout rather than 
hardcoding
+  // i64; otherwise the call mismatches the runtime on 32-bit targets.
   StringRef fnName = "__cxa_allocate_exception";
   auto llvmPtrTy = mlir::LLVM::LLVMPointerType::get(rewriter.getContext());
-  auto int64Ty = mlir::IntegerType::get(rewriter.getContext(), 64);
-  auto fnTy = mlir::LLVM::LLVMFunctionType::get(llvmPtrTy, {int64Ty});
+  mlir::DataLayout layout(op->getParentOfType<mlir::ModuleOp>());
+  auto sizeTTy = mlir::IntegerType::get(rewriter.getContext(),
+                                        layout.getTypeSizeInBits(llvmPtrTy));
+  auto fnTy = mlir::LLVM::LLVMFunctionType::get(llvmPtrTy, {sizeTTy});
 
   createLLVMFuncOpIfNotExist(rewriter, op, fnName, fnTy);
-  auto exceptionSize = mlir::LLVM::ConstantOp::create(rewriter, op.getLoc(),
-                                                      adaptor.getSizeAttr());
+  auto exceptionSize = mlir::LLVM::ConstantOp::create(
+      rewriter, op.getLoc(), sizeTTy,
+      rewriter.getIntegerAttr(sizeTTy, op.getSize()));
 
   auto allocaExceptionCall = mlir::LLVM::CallOp::create(
       rewriter, op.getLoc(), mlir::TypeRange{llvmPtrTy}, fnName,
diff --git a/clang/test/CIR/CodeGen/arm-aggregate-copy-size.cpp 
b/clang/test/CIR/CodeGen/arm-aggregate-copy-size.cpp
new file mode 100644
index 0000000000000..0b070c66cddfa
--- /dev/null
+++ b/clang/test/CIR/CodeGen/arm-aggregate-copy-size.cpp
@@ -0,0 +1,16 @@
+// The length operand of the llvm.memcpy emitted for a cir.copy (e.g. passing 
an
+// aggregate by value) is size_t-wide, whose width is target-dependent.  On
+// 32-bit ARM it must be i32, not the hardcoded i64 used by 64-bit targets.
+//
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -fclangir -emit-llvm 
%s -o %t.ll
+// RUN: FileCheck --check-prefix=ARM --input-file=%t.ll %s
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -fclangir 
-emit-llvm %s -o %t-x86.ll
+// RUN: FileCheck --check-prefix=X86 --input-file=%t-x86.ll %s
+
+struct P { int x; int y; };
+int sum(P p);
+int use() { P p; p.x = 1; p.y = 2; return sum(p); }
+
+// ARM: call void @llvm.memcpy.p0.p0.i32(ptr {{.*}}, ptr {{.*}}, i32 8, i1 
false)
+
+// X86: call void @llvm.memcpy.p0.p0.i64(ptr {{.*}}, ptr {{.*}}, i64 8, i1 
false)
diff --git a/clang/test/CIR/CodeGen/arm-array-cookie.cpp 
b/clang/test/CIR/CodeGen/arm-array-cookie.cpp
new file mode 100644
index 0000000000000..dc80a95264a20
--- /dev/null
+++ b/clang/test/CIR/CodeGen/arm-array-cookie.cpp
@@ -0,0 +1,32 @@
+// 32-bit ARM uses a two-word array cookie { element_size, element_count }, so 
a
+// new[] allocation is two size_t words larger than the element data: the data
+// starts 8 bytes in, and delete[] recovers the allocation base 8 bytes before
+// the data. (The generic Itanium cookie is a single size_t.)
+//
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -fclangir -emit-llvm 
%s -o %t.ll
+// RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -emit-llvm %s -o 
%t-ogcg.ll
+// RUN: FileCheck --check-prefix=OGCG --input-file=%t-ogcg.ll %s
+
+struct D {
+  ~D();
+  int v;
+};
+
+void make() {
+  D *p = new D[10];
+  delete[] p;
+}
+
+// 10 elements * 4 bytes + 8-byte cookie = 48.
+// LLVM: call {{.*}}@_Znaj(i32 noundef 48)
+// LLVM-DAG: store i32 4, ptr %{{[0-9]+}}
+// LLVM-DAG: store i32 10, ptr %{{[0-9]+}}
+// LLVM: getelementptr i8, ptr %{{[0-9]+}}, i32 8
+// LLVM: getelementptr i8, ptr %{{[0-9]+}}, i32 -8
+
+// OGCG: call {{.*}}@_Znaj(i32 noundef 48)
+// OGCG-DAG: store i32 4, ptr
+// OGCG-DAG: store i32 10, ptr
+// OGCG: getelementptr inbounds i8, ptr %{{.*}}, i32 8
+// OGCG: getelementptr inbounds i8, ptr %{{.*}}, i32 -8
diff --git a/clang/test/CIR/CodeGen/arm-global-dtor.cpp 
b/clang/test/CIR/CodeGen/arm-global-dtor.cpp
new file mode 100644
index 0000000000000..1e29beb350a66
--- /dev/null
+++ b/clang/test/CIR/CodeGen/arm-global-dtor.cpp
@@ -0,0 +1,29 @@
+// A static-storage object whose type has a non-trivial destructor must compile
+// on 32-bit ARM.  There, structors return 'this', so the implicit destructor
+// call emitted for the global must be typed against the destructor's real
+// (pointer-returning) signature; a void-typed call used to fail verification
+// with "'cir.call' op incorrect number of results for callee".
+//
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -fclangir -emit-llvm 
%s -o %t.ll
+// RUN: FileCheck --check-prefix=ARM --input-file=%t.ll %s
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -fclangir 
-emit-llvm %s -o %t-x86.ll
+// RUN: FileCheck --check-prefix=X86 --input-file=%t-x86.ll %s
+
+struct S {
+  S();
+  ~S();
+  int x;
+};
+
+S g;
+
+// On ARM the constructor returns 'this' and the non-trivial destructor is
+// registered with __cxa_atexit for the global.
+// ARM-LABEL: define internal void @__cxx_global_var_init()
+// ARM:         call noundef ptr @_ZN1SC1Ev(ptr {{.*}} @g)
+// ARM:         call void @__cxa_atexit(ptr @_ZN1SD1Ev, ptr @g, ptr 
@__dso_handle)
+
+// On x86_64 the structors return void; registration is otherwise identical.
+// X86-LABEL: define internal void @__cxx_global_var_init()
+// X86:         call void @_ZN1SC1Ev(ptr {{.*}} @g)
+// X86:         call void @__cxa_atexit(ptr @_ZN1SD1Ev, ptr @g, ptr 
@__dso_handle)
diff --git a/clang/test/CIR/CodeGen/arm-neon-vget-lane.c 
b/clang/test/CIR/CodeGen/arm-neon-vget-lane.c
new file mode 100644
index 0000000000000..9f35e0c16fa98
--- /dev/null
+++ b/clang/test/CIR/CodeGen/arm-neon-vget-lane.c
@@ -0,0 +1,20 @@
+// The NEON lane-read intrinsics (vget_lane/vgetq_lane) lower to 
__builtin_neon_*
+// builtins on 32-bit ARM (unlike AArch64, where arm_neon.h uses generic vector
+// subscripting). Make sure CIR codegen lowers them to a vector element
+// extraction instead of reporting them as unimplemented.
+
+// RUN: %clang_cc1 -triple armv7-unknown-linux-gnueabihf -target-feature +neon 
-ffreestanding -fclangir -emit-llvm %s -o - | FileCheck %s
+
+#include <arm_neon.h>
+
+// CHECK-LABEL: define dso_local i32 @get_s32(
+// CHECK: extractelement <4 x i32> %{{.*}}, i32 2
+int get_s32(int32x4_t v) { return vgetq_lane_s32(v, 2); }
+
+// CHECK-LABEL: define dso_local float @get_f32(
+// CHECK: extractelement <4 x float> %{{.*}}, i32 1
+float get_f32(float32x4_t v) { return vgetq_lane_f32(v, 1); }
+
+// CHECK-LABEL: define dso_local i16 @get_s16(
+// CHECK: extractelement <4 x i16> %{{.*}}, i32 3
+short get_s16(int16x4_t v) { return vget_lane_s16(v, 3); }
diff --git a/clang/test/CIR/CodeGen/arm-record-layout.cpp 
b/clang/test/CIR/CodeGen/arm-record-layout.cpp
new file mode 100644
index 0000000000000..53a8d6024f0f5
--- /dev/null
+++ b/clang/test/CIR/CodeGen/arm-record-layout.cpp
@@ -0,0 +1,39 @@
+// 32-bit ARM lowers end-to-end through CIR, including the GenericARM CXXABI
+// path (virtual classes / vtables). A record containing a pointer lays out 
with
+// a 4-byte pointer followed by the next field with no padding, and both 
records
+// are 4-byte aligned.
+//
+//
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -fclangir -emit-cir 
%s -o %t.cir
+// RUN: FileCheck --check-prefix=CIR --input-file=%t.cir %s
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -fclangir -emit-llvm 
%s -o %t.ll
+// RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -emit-llvm %s -o 
%t-ogcg.ll
+// RUN: FileCheck --check-prefix=OGCG --input-file=%t-ogcg.ll %s
+
+struct S {
+  int *p;
+  int x;
+};
+
+S s;
+
+class A {
+public:
+  virtual void f();
+  int x;
+};
+
+void A::f() {}
+
+// CIR-DAG: !rec_S = !cir.struct<"S" {!cir.ptr<!s32i>, !s32i}>
+// CIR-DAG: !rec_A = !cir.struct<class "A" {!cir.vptr, !s32i}>
+// CIR-DAG: !cir.ptr<!cir.void> = array<i32: 32, 32>
+// CIR: cir.global external @s = #cir.zero : !rec_S {alignment = 4 : i64}
+// CIR: cir.global {{.*}}@_ZTV1A = #cir.vtable<{{.*}}{alignment = 4 : i64}
+
+// LLVM: @s = global %struct.S zeroinitializer, align 4
+// LLVM: @_ZTV1A = global { [3 x ptr] } {{.*}}, align 4
+
+// OGCG: @s = global %struct.S zeroinitializer, align 4
+// OGCG: @_ZTV1A = {{.*}}constant { [3 x ptr] } {{.*}}, align 4
diff --git a/clang/test/CIR/CodeGen/arm-this-return.cpp 
b/clang/test/CIR/CodeGen/arm-this-return.cpp
new file mode 100644
index 0000000000000..9d111a86dda52
--- /dev/null
+++ b/clang/test/CIR/CodeGen/arm-this-return.cpp
@@ -0,0 +1,28 @@
+// 32-bit ARM returns 'this' from constructors and non-deleting destructors;
+// other targets return void.
+//
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -fclangir -emit-llvm 
%s -o %t.ll
+// RUN: FileCheck --check-prefix=ARM --input-file=%t.ll %s
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -fclangir 
-emit-llvm %s -o %t-x86.ll
+// RUN: FileCheck --check-prefix=X86 --input-file=%t-x86.ll %s
+
+struct S {
+  S();
+  ~S();
+  int x;
+};
+
+S::S() { x = 1; }
+S::~S() {}
+
+// On ARM the constructor and destructor return the 'this' pointer, and the
+// 'this' argument carries the 'returned' attribute.
+// ARM: define{{.*}} ptr @_ZN1SC2Ev(ptr {{.*}} returned {{.*}})
+// ARM: ret ptr
+// ARM: define{{.*}} ptr @_ZN1SD2Ev(ptr {{.*}} returned {{.*}})
+// ARM: ret ptr
+
+// On x86_64 they return void and there is no 'returned' attribute.
+// X86: define{{.*}} void @_ZN1SC2Ev(ptr
+// X86-NOT: returned
+// X86: define{{.*}} void @_ZN1SD2Ev(ptr
diff --git a/clang/test/CIR/CodeGen/arm-throw-alloc-size.cpp 
b/clang/test/CIR/CodeGen/arm-throw-alloc-size.cpp
new file mode 100644
index 0000000000000..b74a52eceaa25
--- /dev/null
+++ b/clang/test/CIR/CodeGen/arm-throw-alloc-size.cpp
@@ -0,0 +1,16 @@
+// The thrown_size argument to __cxa_allocate_exception is size_t, whose width
+// is target-dependent.  On 32-bit ARM it must be i32, not the hardcoded i64
+// used by 64-bit targets.
+//
+// RUN: %clang_cc1 -std=c++20 -triple arm-linux-gnueabihf -fcxx-exceptions 
-fexceptions -fclangir -emit-llvm %s -o %t.ll
+// RUN: FileCheck --check-prefix=ARM --input-file=%t.ll %s
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu 
-fcxx-exceptions -fexceptions -fclangir -emit-llvm %s -o %t-x86.ll
+// RUN: FileCheck --check-prefix=X86 --input-file=%t-x86.ll %s
+
+void f() { throw 42; }
+
+// ARM: declare ptr @__cxa_allocate_exception(i32)
+// ARM: call ptr @__cxa_allocate_exception(i32 4)
+
+// X86: declare ptr @__cxa_allocate_exception(i64)
+// X86: call ptr @__cxa_allocate_exception(i64 4)

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to