https://github.com/banach-space created https://github.com/llvm/llvm-project/pull/175976
- **[mlir] Fix alignment for predicate (i1) vectors** - **[CIR][AArch64] Add lowering for predicated SVE svdup builtins (zeroing)** From fc98105ef968dd8e3c1b371f72cd073504b45c13 Mon Sep 17 00:00:00 2001 From: Andrzej Warzynski <[email protected]> Date: Wed, 14 Jan 2026 13:40:20 +0000 Subject: [PATCH 1/2] [mlir] Fix alignment for predicate (i1) vectors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Legal scalable predicate vectors (legal in the LLVM sense), e.g. vector<[16]xi1> (or <vscale x 16 x i1>, using LLVM syntax) ought to have alignment 2 rather than 16, see e.g. [1]. MLIR currently computes the vector “size in bits” as: ```cpp vecType.getNumElements() * dataLayout.getTypeSize(vecType.getElementType()) * 8 ``` but `getTypeSize()` returns a size in *bytes* (rounded up from bits), so for `i1` it returns 1. Multiplying by 8 converts that storage byte back to 8 bits per element, which overestimates predicate vector sizes. Instead, use: ```cpp vecType.getNumElements() * dataLayout.getTypeSizeInBits(vecType.getElementType()) ``` For `vector<[16]xi1>` this changes: * [before]: `16 * (1 byte * 8) = 128 bits` * [after]: `16 * 1 bit = 16 bits` This is a very small update that, based on the available tests, only affects types like `vector<[16]xi1>`. It aligns MLIR with LLVM, making sure that the corresponding alignment is 2 rather than 16. For context, LLVM computes the alignment in this case via `getTypeStoreSize`, which for `16 x i1` returns 2 bytes. Perhaps MLIR should follow a similar path in the future. [1] https://developer.arm.com/documentation/ddi0602/2025-12/SVE-Instructions/LDR--predicate---Load-predicate-register-?lang=en
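For illustration, here is a minimal standalone sketch (plain C arithmetic, not actual MLIR API code; the variable names are invented for this example) of the before/after computation for `vector<[16]xi1>`:

```c
/* Minimal sketch of the size computation discussed above, assuming a 1-bit i1
 * element and the byte-rounded getTypeSize() behaviour. Not MLIR API code. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
  const uint64_t num_elements = 16;   /* vector<[16]xi1> */
  const uint64_t elt_size_bits = 1;   /* i1 is 1 bit wide */
  const uint64_t elt_size_bytes = 1;  /* getTypeSize() rounds 1 bit up to 1 byte */

  uint64_t before = num_elements * elt_size_bytes * 8; /* 128 bits -> 16-byte alignment */
  uint64_t after = num_elements * elt_size_bits;       /*  16 bits ->  2-byte alignment */

  printf("before: %llu bits, after: %llu bits\n",
         (unsigned long long)before, (unsigned long long)after);
  return 0;
}
```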
--- mlir/lib/Interfaces/DataLayoutInterfaces.cpp | 2 +- mlir/test/Interfaces/DataLayoutInterfaces/query.mlir | 6 ++++++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/mlir/lib/Interfaces/DataLayoutInterfaces.cpp b/mlir/lib/Interfaces/DataLayoutInterfaces.cpp index 782384999c70c..a6922ee5f4b5b 100644 --- a/mlir/lib/Interfaces/DataLayoutInterfaces.cpp +++ b/mlir/lib/Interfaces/DataLayoutInterfaces.cpp @@ -78,7 +78,7 @@ mlir::detail::getDefaultTypeSizeInBits(Type type, const DataLayout &dataLayout, if (auto vecType = dyn_cast<VectorType>(type)) { uint64_t baseSize = vecType.getNumElements() / vecType.getShape().back() * llvm::PowerOf2Ceil(vecType.getShape().back()) * - dataLayout.getTypeSize(vecType.getElementType()) * 8; + dataLayout.getTypeSizeInBits(vecType.getElementType()); return llvm::TypeSize::get(baseSize, vecType.isScalable()); } diff --git a/mlir/test/Interfaces/DataLayoutInterfaces/query.mlir b/mlir/test/Interfaces/DataLayoutInterfaces/query.mlir index 5df32555000ad..97ef8b2a8ae1c 100644 --- a/mlir/test/Interfaces/DataLayoutInterfaces/query.mlir +++ b/mlir/test/Interfaces/DataLayoutInterfaces/query.mlir @@ -44,6 +44,12 @@ func.func @no_layout_builtin() { // CHECK: preferred = 16 // CHECK: size = {minimal_size = 16 : index, scalable} "test.data_layout_query"() : () -> vector<[4]xi32> + // CHECK: alignment = 2 + // CHECK: bitsize = {minimal_size = 16 : index, scalable} + // CHECK: index = 0 + // CHECK: preferred = 2 + // CHECK: size = {minimal_size = 2 : index, scalable} + "test.data_layout_query"() : () -> vector<[16]xi1> return } From 8bd05e287993b5bcef0600656fe68c2a66f8ac07 Mon Sep 17 00:00:00 2001 From: Andrzej Warzynski <[email protected]> Date: Wed, 14 Jan 2026 14:56:55 +0000 Subject: [PATCH 2/2] [CIR][AArch64] Add lowering for predicated SVE svdup builtins (zeroing) This PR adds CIR lowering support for predicated SVE `svdup` builtins on AArch64. The corresponding ACLE intrinsics are documented at: https://developer.arm.com/architectures/instruction-sets/intrinsics This change focuses on the zeroing-predicated variants (suffix `_z`, e.g. `svdup_n_f32_z`), which lower to the LLVM SVE `dup` intrinsic with a `zeroinitializer` passthrough operand. IMPLEMENTATION NOTES -------------------- * The CIR type converter is extended to support `BuiltinType::SveBool`, which is lowered to `cir.vector<[16] x i1>`, matching current Clang behaviour and ensuring compatibility with existing LLVM SVE lowering. * Added logic that converts the `cir.vector<[16] x i1>` predicate into a predicate vector matching the underlying element type. This is done by calling `@llvm.aarch64.sve.convert.from.svbool` (see the sketch below).
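A small usage sketch, distilled from the tests added below (the function names and compile flags here are illustrative, not part of the patch), showing when this predicate conversion is and is not needed:

```c
// Illustration only, based on the tests below; not part of the patch.
// Compile with SVE enabled, e.g.: clang --target=aarch64-linux-gnu -march=armv8-a+sve -O0 -S
#include <arm_sve.h>

// 8-bit elements: the svbool_t predicate already has the right shape
// (<vscale x 16 x i1>), so no conversion intrinsic is emitted.
svint8_t dup8_z(svbool_t pg, int8_t op) { return svdup_n_s8_z(pg, op); }

// 16-bit elements: the predicate is first narrowed via
// @llvm.aarch64.sve.convert.from.svbool.nxv8i1 before calling
// @llvm.aarch64.sve.dup.nxv8i16 with a zeroinitializer passthrough.
svint16_t dup16_z(svbool_t pg, int16_t op) { return svdup_n_s16_z(pg, op); }
```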
TEST NOTES ---------- Compared to the unpredicated `svdup` tests (#174433), the new tests perform more explicit checks to verify: * Correct argument usage * Correct return value + type This helped validate differences between the default Clang lowering and the CIR-based lowering. Once all `svdup` variants are implemented, the tests will be unified. EXAMPLE LOWERING ---------------- The following example illustrates that CIR lowering produces LLVM IR equivalent to that of the default Clang path. Input: ```c svint8_t test_svdup_n_s8(svbool_t pg, int8_t op) { return svdup_n_s8_z(pg, op); } ``` OUTPUT 1 (default): ```llvm define dso_local <vscale x 16 x i8> @test(<vscale x 16 x i1> %pg, i8 noundef %op) #0 { entry: %pg.addr = alloca <vscale x 16 x i1>, align 2 %op.addr = alloca i8, align 1 store <vscale x 16 x i1> %pg, ptr %pg.addr, align 2 store i8 %op, ptr %op.addr, align 1 %0 = load <vscale x 16 x i1>, ptr %pg.addr, align 2 %1 = load i8, ptr %op.addr, align 1 %2 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> %0, i8 %1) ret <vscale x 16 x i8> %2 } ``` OUTPUT 2 (via `-fclangir`): ```llvm ; Function Attrs: noinline define dso_local <vscale x 16 x i8> @test(<vscale x 16 x i1> %0, i8 %1) #0 { %3 = alloca <vscale x 16 x i1>, i64 1, align 2 %4 = alloca i8, i64 1, align 1 %5 = alloca <vscale x 16 x i8>, i64 1, align 16 store <vscale x 16 x i1> %0, ptr %3, align 2 store i8 %1, ptr %4, align 1 %6 = load <vscale x 16 x i1>, ptr %3, align 2 %7 = load i8, ptr %4, align 1 %8 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> %6, i8 %7) store <vscale x 16 x i8> %8, ptr %5, align 16 %9 = load <vscale x 16 x i8>, ptr %5, align 16 ret <vscale x 16 x i8> %9 } ``` **DEPENDS ON:** https://github.com/llvm/llvm-project/pull/175961 --- .../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 95 +++- clang/lib/CIR/CodeGen/CIRGenFunction.h | 2 + clang/lib/CIR/CodeGen/CIRGenTypes.cpp | 4 + .../CodeGenBuiltins/AArch64/acle_sve_dup.c | 477 +++++++++++++++++- 4 files changed, 564 insertions(+), 14 deletions(-) diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp index 93089eb585aa7..d59d3bebe0bb0 100644 --- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp +++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp @@ -126,6 +126,81 @@ bool CIRGenFunction::getAArch64SVEProcessedOperands( return true; } +// Reinterpret the input predicate so that it can be used to correctly isolate +// the elements of the specified datatype. +mlir::Value CIRGenFunction::emitSVEpredicateCast(mlir::Value *pred, + unsigned minNumElts, + mlir::Location loc) { + + // TODO: Handle "aarch64.svcount" once we get round to supporting SME. + + auto retTy = cir::VectorType::get(builder.getUIntNTy(1), minNumElts, + /*is_scalable=*/true); + if (pred->getType() == retTy) + return *pred; + + unsigned intID; + mlir::Type intrinsicTy; + switch (minNumElts) { + default: + llvm_unreachable("unsupported element count!"); + case 1: + case 2: + case 4: + case 8: + intID = Intrinsic::aarch64_sve_convert_from_svbool; + intrinsicTy = retTy; + break; + case 16: + intID = Intrinsic::aarch64_sve_convert_to_svbool; + intrinsicTy = pred->getType(); + break; + } + + std::string llvmIntrName(Intrinsic::getBaseName(intID)); + llvmIntrName.erase(0, /*std::strlen("llvm.")=*/5); + auto call = emitIntrinsicCallOp(builder, loc, llvmIntrName, retTy, + mlir::ValueRange{*pred}); + assert(call.getType() == retTy && "Unexpected return type!"); + return call; +} + +// Return the minimum element count for the given SVE element type. +static unsigned getSVEMinEltCount(const clang::SVETypeFlags::EltType &sveType) { + switch (sveType) { + default: + llvm_unreachable("Invalid SVETypeFlag!"); + + case SVETypeFlags::EltTyInt8: + return 16; + case SVETypeFlags::EltTyInt16: + return 8; + case SVETypeFlags::EltTyInt32: + return 4; + case SVETypeFlags::EltTyInt64: + return 2; + + case SVETypeFlags::EltTyMFloat8: + return 16; + case SVETypeFlags::EltTyFloat16: + case SVETypeFlags::EltTyBFloat16: + return 8; + case SVETypeFlags::EltTyFloat32: + return 4; + case SVETypeFlags::EltTyFloat64: + return 2; + + case SVETypeFlags::EltTyBool8: + return 16; + case SVETypeFlags::EltTyBool16: + return 8; + case SVETypeFlags::EltTyBool32: + return 4; + case SVETypeFlags::EltTyBool64: + return 2; + } +} + std::optional<mlir::Value> CIRGenFunction::emitAArch64SVEBuiltinExpr(unsigned builtinID, const CallExpr *expr) { @@ -171,10 +246,12 @@ CIRGenFunction::emitAArch64SVEBuiltinExpr(unsigned builtinID, std::string("unimplemented AArch64 builtin call: ") + getContext().BuiltinInfo.getName(builtinID)); - if (typeFlags.getMergeType() == SVETypeFlags::MergeZeroExp) - cgm.errorNYI(expr->getSourceRange(), - std::string("unimplemented AArch64 builtin call: ") + - getContext().BuiltinInfo.getName(builtinID)); + // Zeroing predication + if (typeFlags.getMergeType() == SVETypeFlags::MergeZeroExp) { + auto null = builder.getNullValue(convertType(expr->getType()), + getLoc(expr->getExprLoc())); + ops.insert(ops.begin(), null); + } if (typeFlags.getMergeType() == SVETypeFlags::MergeAnyExp) cgm.errorNYI(expr->getSourceRange(), @@ -194,11 +271,11 @@ CIRGenFunction::emitAArch64SVEBuiltinExpr(unsigned builtinID, // Predicates must match the main datatype.
for (mlir::Value &op : ops) - if (auto predTy = dyn_cast<mlir::VectorType>(op.getType())) - if (predTy.getElementType().isInteger(1)) - cgm.errorNYI(expr->getSourceRange(), - std::string("unimplemented AArch64 builtin call: ") + - getContext().BuiltinInfo.getName(builtinID)); + if (auto predTy = dyn_cast<cir::VectorType>(op.getType())) + if (auto cirInt = dyn_cast<cir::IntType>(predTy.getElementType())) + if (cirInt.getWidth() == 1) + op = emitSVEpredicateCast( + &op, getSVEMinEltCount(typeFlags.getEltType()), loc); // Splat scalar operand to vector (intrinsics with _n infix) if (typeFlags.hasSplatOperand()) { diff --git a/clang/lib/CIR/CodeGen/CIRGenFunction.h b/clang/lib/CIR/CodeGen/CIRGenFunction.h index 5fe1d9a4f2b76..86d2a8c4ac089 100644 --- a/clang/lib/CIR/CodeGen/CIRGenFunction.h +++ b/clang/lib/CIR/CodeGen/CIRGenFunction.h @@ -1269,6 +1269,8 @@ class CIRGenFunction : public CIRGenTypeCache { bool getAArch64SVEProcessedOperands(unsigned builtinID, const CallExpr *expr, SmallVectorImpl<mlir::Value> &ops, clang::SVETypeFlags typeFlags); + mlir::Value emitSVEpredicateCast(mlir::Value *pred, unsigned minNumElts, + mlir::Location loc); std::optional<mlir::Value> emitAArch64BuiltinExpr(unsigned builtinID, const CallExpr *expr, ReturnValueSlot returnValue, diff --git a/clang/lib/CIR/CodeGen/CIRGenTypes.cpp b/clang/lib/CIR/CodeGen/CIRGenTypes.cpp index 985c2901a7b04..f6220c616ed60 100644 --- a/clang/lib/CIR/CodeGen/CIRGenTypes.cpp +++ b/clang/lib/CIR/CodeGen/CIRGenTypes.cpp @@ -373,6 +373,10 @@ mlir::Type CIRGenTypes::convertType(QualType type) { resultType = cir::VectorType::get(builder.getDoubleTy(), 2, /*is_scalable=*/true); break; + case BuiltinType::SveBool: + resultType = cir::VectorType::get(builder.getUIntNTy(1), 16, + /*is_scalable=*/true); + break; // Unsigned integral types. 
case BuiltinType::Char8: diff --git a/clang/test/CIR/CodeGenBuiltins/AArch64/acle_sve_dup.c b/clang/test/CIR/CodeGenBuiltins/AArch64/acle_sve_dup.c index 3e0a892d6b368..60a2992ab14ad 100644 --- a/clang/test/CIR/CodeGenBuiltins/AArch64/acle_sve_dup.c +++ b/clang/test/CIR/CodeGenBuiltins/AArch64/acle_sve_dup.c @@ -1,13 +1,13 @@ // REQUIRES: aarch64-registered-target - +// // RUN: %clang_cc1 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -fclangir -emit-cir -o - %s | FileCheck %s --check-prefixes=ALL,CIR // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -fclangir -emit-cir -o - %s | FileCheck %s --check-prefixes=ALL,CIR -// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -fclangir -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR -// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -fclangir -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR +// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -fclangir -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR,LLVM_VIA_CIR +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -fclangir -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR,LLVM_VIA_CIR -// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR -// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR +// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR,LLVM_DIRECT +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | FileCheck %s --check-prefixes=ALL,LLVM_OGCG_CIR,LLVM_DIRECT #include <arm_sve.h> #if defined __ARM_FEATURE_SME @@ -209,3 +209,470 @@ svfloat64_t test_svdup_n_f64(float64_t op) MODE_ATTR // LLVM_OGCG_CIR: [[RES:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.dup.x.nxv2f64(double [[OP_LOAD]]) return SVE_ACLE_FUNC(svdup,_n,_f64,)(op); } + +// ALL-LABEL: @test_svdup_n_s8_z +svint8_t test_svdup_n_s8_z(svbool_t pg, int8_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !s8i +// CIR-SAME: -> !cir.vector<[16] x !s8i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !s8i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[16] x !s8i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(1) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[16] x !s8i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[LOAD_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[16] x !s8i> +// CIR: cir.store %[[CONVERT_PG]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i8 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// 
LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca i8,{{([[:space:]]?i64 1,)?}} align 1 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 16 x i8>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i8 [[OP]], ptr [[OP_ADDR]], align 1 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i8, ptr [[OP_ADDR]], align 1 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> [[TMP0]], i8 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP2]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP2]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_s8_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_s16_z( +svint16_t test_svdup_n_s16_z(svbool_t pg, int16_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !s16i +// CIR-SAME: -> !cir.vector<[8] x !s16i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !s16i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[8] x !s16i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(2) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[8] x !s16i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[8] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[8] x !s16i> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i16 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca i16,{{([[:space:]]?i64 1,)?}} align 2 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 8 x i16>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i16 [[OP]], ptr [[OP_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i16, ptr [[OP_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.dup.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> [[TMP2]], i16 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_s16_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_s32_z( +svint32_t test_svdup_n_s32_z(svbool_t pg, int32_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !s32i +// 
CIR-SAME: -> !cir.vector<[4] x !s32i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !s32i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[4] x !s32i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(4) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[4] x !s32i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[4] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[4] x !s32i> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i32 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca i32,{{([[:space:]]?i64 1,)?}} align 4 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 4 x i32>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i32 [[OP]], ptr [[OP_ADDR]], align 4 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i32, ptr [[OP_ADDR]], align 4 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 4 x i32> @llvm.aarch64.sve.dup.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP2]], i32 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_s32_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_s64_z( +svint64_t test_svdup_n_s64_z(svbool_t pg, int64_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !s64i +// CIR-SAME: -> !cir.vector<[2] x !s64i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !s64i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[2] x !s64i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(8) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[2] x !s64i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[2] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[2] x !s64i> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i64 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca 
i64,{{([[:space:]]?i64 1,)?}} align 8 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 2 x i64>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i64 [[OP]], ptr [[OP_ADDR]], align 8 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i64, ptr [[OP_ADDR]], align 8 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.dup.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP2]], i64 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_s64_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_u8_z( +svuint8_t test_svdup_n_u8_z(svbool_t pg, uint8_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !u8i +// CIR-SAME: -> !cir.vector<[16] x !u8i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !u8i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[16] x !u8i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(1) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[16] x !u8i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[LOAD_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[16] x !u8i> +// CIR: cir.store %[[CONVERT_PG]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i8 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca i8,{{([[:space:]]?i64 1,)?}} align 1 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 16 x i8>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i8 [[OP]], ptr [[OP_ADDR]], align 1 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i8, ptr [[OP_ADDR]], align 1 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> [[TMP0]], i8 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP2]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP2]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_u8_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_u16_z( +svuint16_t test_svdup_n_u16_z(svbool_t pg, uint16_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !u16i +// CIR-SAME: -> !cir.vector<[8] x !u16i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !u16i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[8] x !u16i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store 
%[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(2) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[8] x !u16i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[8] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[8] x !u16i> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i16 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca i16,{{([[:space:]]?i64 1,)?}} align 2 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 8 x i16>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i16 [[OP]], ptr [[OP_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i16, ptr [[OP_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.dup.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> [[TMP2]], i16 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_u16_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_u32_z( +svuint32_t test_svdup_n_u32_z(svbool_t pg, uint32_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !u32i +// CIR-SAME: -> !cir.vector<[4] x !u32i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !u32i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[4] x !u32i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(4) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[4] x !u32i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[4] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[4] x !u32i> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i32 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca i32,{{([[:space:]]?i64 1,)?}} align 4 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 4 x i32>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i32 [[OP]], ptr [[OP_ADDR]], align 4 +// 
LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i32, ptr [[OP_ADDR]], align 4 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 4 x i32> @llvm.aarch64.sve.dup.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP2]], i32 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_u32_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_u64_z( +svuint64_t test_svdup_n_u64_z(svbool_t pg, uint64_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !u64i +// CIR-SAME: -> !cir.vector<[2] x !u64i> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !u64i +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[2] x !u64i> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(8) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[2] x !u64i> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[2] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[2] x !u64i> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], i64 {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca i64,{{([[:space:]]?i64 1,)?}} align 8 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 2 x i64>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store i64 [[OP]], ptr [[OP_ADDR]], align 8 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load i64, ptr [[OP_ADDR]], align 8 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.dup.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP2]], i64 [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_u64_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_f16_z( +svfloat16_t test_svdup_n_f16_z(svbool_t pg, float16_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !cir.f16 +// CIR-SAME: -> !cir.vector<[8] x !cir.f16> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !cir.f16 +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[8] x !cir.f16> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// 
CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(2) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[8] x !cir.f16> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[8] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[8] x !cir.f16> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], half {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca half,{{([[:space:]]?i64 1,)?}} align 2 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 8 x half>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store half [[OP]], ptr [[OP_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load half, ptr [[OP_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 8 x half> @llvm.aarch64.sve.dup.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x i1> [[TMP2]], half [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_f16_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_f32_z( +svfloat32_t test_svdup_n_f32_z(svbool_t pg, float32_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !cir.float +// CIR-SAME: -> !cir.vector<[4] x !cir.float> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !cir.float +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[4] x !cir.float> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(4) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[4] x !cir.float> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[4] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[4] x !cir.float> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], float {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca float,{{([[:space:]]?i64 1,)?}} align 4 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 4 x float>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// 
LLVM_OGCG_CIR: store float [[OP]], ptr [[OP_ADDR]], align 4 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load float, ptr [[OP_ADDR]], align 4 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.dup.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x i1> [[TMP2]], float [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_f32_z,)(pg, op); +} + +// ALL-LABEL: @test_svdup_n_f64_z( +svfloat64_t test_svdup_n_f64_z(svbool_t pg, float64_t op) MODE_ATTR +{ +// CIR-SAME: %[[PG:.*]]: !cir.vector<[16] x !cir.int<u, 1>> +// CIR-SAME: %[[OP:.*]]: !cir.double +// CIR-SAME: -> !cir.vector<[2] x !cir.double> +// CIR: %[[ALLOCA_PG:.*]] = cir.alloca !cir.vector<[16] x !cir.int<u, 1>> +// CIR: %[[ALLOCA_OP:.*]] = cir.alloca !cir.double +// CIR: %[[ALLOCA_RES:.*]] = cir.alloca !cir.vector<[2] x !cir.double> +// CIR: cir.store %[[PG]], %[[ALLOCA_PG]] +// CIR: cir.store %[[OP]], %[[ALLOCA_OP]] +// CIR: %[[LOAD_PG:.*]] = cir.load align(2) %[[ALLOCA_PG]] +// CIR: %[[LOAD_OP:.*]] = cir.load align(8) %[[ALLOCA_OP]] +// CIR: %[[CONST_0:.*]] = cir.const #cir.zero : !cir.vector<[2] x !cir.double> +// CIR: %[[CONVERT_PG:.*]] = cir.call_llvm_intrinsic "aarch64.sve.convert.from.svbool" %[[LOAD_PG]] +// CIR-SAME: -> !cir.vector<[2] x !cir.int<u, 1>> +// CIR: %[[CALL_DUP:.*]] = cir.call_llvm_intrinsic "aarch64.sve.dup" %[[CONST_0]], %[[CONVERT_PG]], %[[LOAD_OP]] +// CIR-SAME: -> !cir.vector<[2] x !cir.double> +// CIR: cir.store %[[CALL_DUP]], %[[ALLOCA_RES]] +// CIR: %[[RES:.*]] = cir.load %[[ALLOCA_RES]] +// CIR: cir.return %[[RES]] + +// LLVM_OGCG_CIR-SAME: <vscale x 16 x i1> [[PG:%.*]], double {{(noundef)?[[:space:]]?}}[[OP:%.*]]) +// LLVM_OGCG_CIR: [[PG_ADDR:%.*]] = alloca <vscale x 16 x i1>,{{([[:space:]]?i64 1,)?}} align 2 +// LLVM_OGCG_CIR: [[OP_ADDR:%.*]] = alloca double,{{([[:space:]]?i64 1,)?}} align 8 +// +// LLVM_VIA_CIR: [[RES_ADDR:%.*]] = alloca <vscale x 2 x double>,{{([[:space:]]?i64 1,)?}} align 16 +// +// LLVM_OGCG_CIR: store <vscale x 16 x i1> [[PG]], ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: store double [[OP]], ptr [[OP_ADDR]], align 8 +// LLVM_OGCG_CIR: [[TMP0:%.*]] = load <vscale x 16 x i1>, ptr [[PG_ADDR]], align 2 +// LLVM_OGCG_CIR: [[TMP1:%.*]] = load double, ptr [[OP_ADDR]], align 8 +// LLVM_OGCG_CIR: [[TMP2:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]]) +// LLVM_OGCG_CIR: [[TMP3:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.dup.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x i1> [[TMP2]], double [[TMP1]]) +// +// LLVM_DIRECT: ret {{.*}} [[TMP3]] +// +// LLVM_VIA_CIR: store {{.*}} [[TMP3]], ptr [[RES_ADDR]] +// LLVM_VIA_CIR: [[RES:%.*]] = load {{.*}} [[RES_ADDR]] +// LLVM_VIA_CIR: ret {{.*}} [[RES]] + return SVE_ACLE_FUNC(svdup,_n,_f64_z,)(pg, op); +} _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
