Author: Benjamin Maxwell
Date: 2026-02-25T10:10:59Z
New Revision: ce952a224cbb51e7b081958e57899101324e4212

URL: https://github.com/llvm/llvm-project/commit/ce952a224cbb51e7b081958e57899101324e4212
DIFF: https://github.com/llvm/llvm-project/commit/ce952a224cbb51e7b081958e57899101324e4212.diff

LOG: [Clang] Add `__builtin_reduce_[in_order|assoc]_fadd` for floating-point reductions (#176160)

This adds `__builtin_reduce_[in_order|assoc]_fadd` to expose the
`llvm.vector.reduce.fadd.*` intrinsic directly in Clang, for the full
range of supported FP types.

Given a floating-point vector `vec` and a scalar floating-point value
`acc`:

- `__builtin_reduce_assoc_fadd(vec)` corresponds to a fast/associative
  reduction
  * i.e., the fadds can occur in any order
- `__builtin_reduce_in_order_fadd(vec, acc)` corresponds to an ordered
  reduction
  * i.e., the result is as if an accumulator was initialized with `acc`
    and each lane was added to it in order, starting from lane 0

Added:


Modified:
    clang/docs/LanguageExtensions.rst
    clang/include/clang/Basic/Builtins.td
    clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
    clang/lib/CodeGen/CGBuiltin.cpp
    clang/lib/Sema/SemaChecking.cpp
    clang/test/CodeGen/builtins-reduction-math.c
    clang/test/Sema/builtins-reduction-math.c

Removed:



################################################################################
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 5ac15dd80760b..72cbf0610a2b8 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -926,27 +926,31 @@ Example:
 
 Let ``VT`` be a vector type and ``ET`` the element type of ``VT``.
 
-======================================= ====================================================================== ==================================
-         Name                            Operation                                                              Supported element types
-======================================= ====================================================================== ==================================
- ET __builtin_reduce_max(VT a)           return the largest element of the vector. The floating point result    integer and floating point types
-                                         will always be a number unless all elements of the vector are NaN.
- ET __builtin_reduce_min(VT a)           return the smallest element of the vector. The floating point result   integer and floating point types
-                                         will always be a number unless all elements of the vector are NaN.
- ET __builtin_reduce_add(VT a)           \+                                                                     integer types
- ET __builtin_reduce_mul(VT a)           \*                                                                     integer types
- ET __builtin_reduce_and(VT a)           &                                                                      integer types
- ET __builtin_reduce_or(VT a)            \|                                                                     integer types
- ET __builtin_reduce_xor(VT a)           ^                                                                      integer types
- ET __builtin_reduce_maximum(VT a)       return the largest element of the vector. Follows IEEE 754-2019        floating point types
-                                         semantics, see `LangRef
-                                         <http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
-                                         for the comparison.
- ET __builtin_reduce_minimum(VT a)       return the smallest element of the vector. Follows IEEE 754-2019       floating point types
-                                         semantics, see `LangRef
-                                         <http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
-                                         for the comparison.
-======================================= ====================================================================== ==================================
+============================================== ====================================================================== ==================================
+         Name                                   Operation                                                              Supported element types
+============================================== ====================================================================== ==================================
+ ET __builtin_reduce_max(VT a)                  return the largest element of the vector. The floating point result    integer and floating point types
+                                                will always be a number unless all elements of the vector are NaN.
+ ET __builtin_reduce_min(VT a)                  return the smallest element of the vector. The floating point result   integer and floating point types
+                                                will always be a number unless all elements of the vector are NaN.
+ ET __builtin_reduce_add(VT a)                  \+                                                                     integer types
+ ET __builtin_reduce_mul(VT a)                  \*                                                                     integer types
+ ET __builtin_reduce_and(VT a)                  &                                                                      integer types
+ ET __builtin_reduce_or(VT a)                   \|                                                                     integer types
+ ET __builtin_reduce_xor(VT a)                  ^                                                                      integer types
+ ET __builtin_reduce_maximum(VT a)              return the largest element of the vector. Follows IEEE 754-2019        floating point types
+                                                semantics, see `LangRef
+                                                <http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
+                                                for the comparison.
+ ET __builtin_reduce_minimum(VT a)              return the smallest element of the vector. Follows IEEE 754-2019       floating point types
+                                                semantics, see `LangRef
+                                                <http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
+                                                for the comparison.
+ ET __builtin_reduce_assoc_fadd(VT a[, ET s])   associative floating-point add reduction.                              floating point types
+ ET __builtin_reduce_in_order_fadd(VT a, ET s)  in order floating-point add reduction, initializing the accumulator    floating point types
+                                                with `(ET)s`, then adding each lane of the `a` in-order, starting
+                                                from lane 0. The additions cannot be reassociated.
+============================================== ====================================================================== ==================================
 
 *Masked Builtins*
 
@@ -975,15 +979,15 @@ Example:
   using v8i = int [[clang::ext_vector_type(8)]];
 
   v8i load(v8b mask, int *ptr) { return __builtin_masked_load(mask, ptr); }
-  
+
   v8i load_expand(v8b mask, int *ptr) {
     return __builtin_masked_expand_load(mask, ptr);
   }
-  
+
   void store(v8b mask, v8i val, int *ptr) {
     __builtin_masked_store(mask, val, ptr);
   }
-  
+
   void store_compress(v8b mask, v8i val, int *ptr) {
     __builtin_masked_compress_store(mask, val, ptr);
   }
@@ -1075,7 +1079,7 @@ The matrix type extension supports explicit casts. Implicit type conversion betw
 
 The matrix type extension supports column and row major memory layouts, but not
 all builtins are supported with row-major layout. The layout defaults to column
-major and can be specified using `-fmatrix-memory-layout`. To enable column 
+major and can be specified using `-fmatrix-memory-layout`. To enable column
 major layout, use `-fmatrix-memory-layout=column-major`, and for row major
 layout use `-fmatrix-memory-layout=row-major`
 
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 78dd26aa2c455..531c3702161f2 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -1664,6 +1664,18 @@ def ReduceAdd : Builtin {
   let Prototype = "void(...)";
 }
 
+def ReduceInOrderFAdd : Builtin {
+  let Spellings = ["__builtin_reduce_in_order_fadd"];
+  let Attributes = [NoThrow, Const, CustomTypeChecking];
+  let Prototype = "void(...)";
+}
+
+def ReduceAssocFAdd : Builtin {
+  let Spellings = ["__builtin_reduce_assoc_fadd"];
+  let Attributes = [NoThrow, Const, CustomTypeChecking];
+  let Prototype = "void(...)";
+}
+
 def ReduceMul : Builtin {
   let Spellings = ["__builtin_reduce_mul"];
   let Attributes = [NoThrow, Const, CustomTypeChecking, Constexpr];
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
index 50e6892f4bbc5..a27e66e0989fa 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
@@ -1528,6 +1528,8 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl &gd, unsigned builtinID,
   case Builtin::BI__builtin_reduce_xor:
   case Builtin::BI__builtin_reduce_or:
   case Builtin::BI__builtin_reduce_and:
+  case Builtin::BI__builtin_reduce_assoc_fadd:
+  case Builtin::BI__builtin_reduce_in_order_fadd:
   case Builtin::BI__builtin_reduce_maximum:
   case Builtin::BI__builtin_reduce_minimum:
   case Builtin::BI__builtin_matrix_transpose:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 850cc8d2c4c45..38010cad75244 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -4215,6 +4215,29 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
   case Builtin::BI__builtin_reduce_minimum:
     return RValue::get(emitBuiltinWithOneOverloadedType<1>(
         *this, E, Intrinsic::vector_reduce_fminimum, "rdx.minimum"));
+  case Builtin::BI__builtin_reduce_assoc_fadd:
+  case Builtin::BI__builtin_reduce_in_order_fadd: {
+    llvm::Value *Vector = EmitScalarExpr(E->getArg(0));
+    llvm::Type *ScalarTy = Vector->getType()->getScalarType();
+    llvm::Value *StartValue = nullptr;
+    if (E->getNumArgs() == 2)
+      StartValue = Builder.CreateFPCast(EmitScalarExpr(E->getArg(1)), ScalarTy);
+    llvm::Value *Args[] = {/*start_value=*/StartValue
+                               ? StartValue
+                               : llvm::ConstantFP::get(ScalarTy, -0.0F),
+                           /*vector=*/Vector};
+    llvm::Function *F =
+        CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Vector->getType());
+    llvm::CallBase *Reduce = Builder.CreateCall(F, Args, "rdx.addf");
+    if (BuiltinIDIfNoAsmLabel == Builtin::BI__builtin_reduce_assoc_fadd) {
+      // `__builtin_reduce_assoc_fadd` is an associative reduction which
+      // requires the reassoc FMF flag.
+      llvm::FastMathFlags FMF;
+      FMF.setAllowReassoc();
+      cast<llvm::CallBase>(Reduce)->setFastMathFlags(FMF);
+    }
+    return RValue::get(Reduce);
+  }
 
   case Builtin::BI__builtin_matrix_transpose: {
     auto *MatrixTy = E->getArg(0)->getType()->castAs<ConstantMatrixType>();
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 0ea41ff1f613e..45dce52179f82 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -2823,6 +2823,14 @@ static ExprResult BuiltinVectorMathConversions(Sema &S, Expr *E) {
   return S.UsualUnaryFPConversions(Res.get());
 }
 
+static QualType getVectorElementType(ASTContext &Context, QualType VecTy) {
+  if (const auto *TyA = VecTy->getAs<VectorType>())
+    return TyA->getElementType();
+  if (VecTy->isSizelessVectorType())
+    return VecTy->getSizelessVectorEltType(Context);
+  return QualType();
+}
+
 ExprResult
 Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
                                CallExpr *TheCall) {
@@ -3673,14 +3681,8 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
       return ExprError();
 
     const Expr *Arg = TheCall->getArg(0);
-    const auto *TyA = Arg->getType()->getAs<VectorType>();
-
-    QualType ElTy;
-    if (TyA)
-      ElTy = TyA->getElementType();
-    else if (Arg->getType()->isSizelessVectorType())
-      ElTy = Arg->getType()->getSizelessVectorEltType(Context);
 
+    QualType ElTy = getVectorElementType(Context, Arg->getType());
     if (ElTy.isNull() || !ElTy->isIntegerType()) {
       Diag(Arg->getBeginLoc(), diag::err_builtin_invalid_arg_type)
           << 1 << /* vector of */ 4 << /* int */ 1 << /* no fp */ 0
@@ -3692,6 +3694,46 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
     break;
   }
 
+  case Builtin::BI__builtin_reduce_assoc_fadd:
+  case Builtin::BI__builtin_reduce_in_order_fadd: {
+    // For in-order reductions require the user to specify the start value.
+    bool InOrder = BuiltinID == Builtin::BI__builtin_reduce_in_order_fadd;
+    if (InOrder ? checkArgCount(TheCall, 2) : checkArgCountRange(TheCall, 1, 2))
+      return ExprError();
+
+    ExprResult Vec = UsualUnaryConversions(TheCall->getArg(0));
+    if (Vec.isInvalid())
+      return ExprError();
+
+    TheCall->setArg(0, Vec.get());
+
+    QualType ElTy = getVectorElementType(Context, Vec.get()->getType());
+    if (ElTy.isNull() || !ElTy->isRealFloatingType()) {
+      Diag(Vec.get()->getBeginLoc(), diag::err_builtin_invalid_arg_type)
+          << 1 << /* vector of */ 4 << /* no int */ 0 << /* fp */ 1
+          << Vec.get()->getType();
+      return ExprError();
+    }
+
+    if (TheCall->getNumArgs() == 2) {
+      ExprResult StartValue = UsualUnaryConversions(TheCall->getArg(1));
+      if (StartValue.isInvalid())
+        return ExprError();
+
+      if (!StartValue.get()->getType()->isRealFloatingType()) {
+        Diag(StartValue.get()->getBeginLoc(),
+             diag::err_builtin_invalid_arg_type)
+            << 2 << /* scalar */ 1 << /* no int */ 0 << /* fp */ 1
+            << StartValue.get()->getType();
+        return ExprError();
+      }
+      TheCall->setArg(1, StartValue.get());
+    }
+
+    TheCall->setType(ElTy);
+    break;
+  }
+
   case Builtin::BI__builtin_matrix_transpose:
     return BuiltinMatrixTranspose(TheCall, TheCallResult);
 
diff --git a/clang/test/CodeGen/builtins-reduction-math.c b/clang/test/CodeGen/builtins-reduction-math.c
index e12fd729c84c0..187f42068905a 100644
--- a/clang/test/CodeGen/builtins-reduction-math.c
+++ b/clang/test/CodeGen/builtins-reduction-math.c
@@ -4,6 +4,8 @@
 // RUN: %clang_cc1 -O1 -triple aarch64 -target-feature +sve %s -emit-llvm -disable-llvm-passes -o - | FileCheck --check-prefixes=SVE %s
 
 typedef float float4 __attribute__((ext_vector_type(4)));
+typedef _Float16 half8 __attribute__((ext_vector_type(8)));
+
 typedef short int si8 __attribute__((ext_vector_type(8)));
 typedef unsigned int u4 __attribute__((ext_vector_type(4)));
 
@@ -162,6 +164,37 @@ void test_builtin_reduce_minimum(float4 vf1) {
   const double r4 = __builtin_reduce_minimum(vf1_as_one);
 }
 
+void test_builtin_reduce_addf(float4 vf1, half8 vf2, float start) {
+  // CHECK-LABEL: define void @test_builtin_reduce_addf(
+
+  // CHECK:      [[V0:%.+]] = load <4 x float>, ptr %vf1.addr, align 16
+  // CHECK-NEXT: call reassoc float @llvm.vector.reduce.fadd.v4f32(float 1.000000e+00, <4 x float> [[V0]])
+  float r1 = __builtin_reduce_assoc_fadd(vf1, 1.0f);
+
+  // CHECK:      [[V1:%.+]] = load <4 x float>, ptr %vf1.addr, align 16
+  // CHECK-NEXT: call float @llvm.vector.reduce.fadd.v4f32(float 2.000000e+00, <4 x float> [[V1]])
+  float r2 = __builtin_reduce_in_order_fadd(vf1, 2.0f);
+
+  // CHECK:      [[V2:%.+]] = load <8 x half>, ptr %vf2.addr, align 16
+  // CHECK-NEXT: call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH8000, <8 x half> [[V2:%.+]])
+  _Float16 r3 = __builtin_reduce_assoc_fadd(vf2);
+
+  // CHECK:      [[V3:%.+]] = load <8 x half>, ptr %vf2.addr, align 16
+  // CHECK-NEXT: [[RDX:%.+]] = call half @llvm.vector.reduce.fadd.v8f16(half 0xH8000, <8 x half> [[V3]])
+  // CHECK-NEXT: fpext half [[RDX]] to float
+  float r4 = __builtin_reduce_in_order_fadd(vf2, -0.0f);
+
+  // CHECK:      [[V4:%.+]] = load <4 x float>, ptr %vf1.addr, align 16
+  // CHECK:      [[START0:%.+]] = load float, ptr %start.addr, align 4
+  // CHECK-NEXT: call float @llvm.vector.reduce.fadd.v4f32(float [[START0]], <4 x float> [[V4]])
+  float r5 = __builtin_reduce_in_order_fadd(vf1, start);
+
+  // CHECK:      [[V5:%.+]] = load <8 x half>, ptr %vf2.addr, align 16
+  // CHECK:      [[START1:%.+]] = fptrunc float %{{.*}} to half
+  // CHECK-NEXT: call reassoc half @llvm.vector.reduce.fadd.v8f16(half [[START1]], <8 x half> [[V5:%.+]])
+  _Float16 r7 = __builtin_reduce_assoc_fadd(vf2, start);
+}
+
 #if defined(__ARM_FEATURE_SVE)
 #include <arm_sve.h>
 
diff --git a/clang/test/Sema/builtins-reduction-math.c b/clang/test/Sema/builtins-reduction-math.c
index 74f09d501198b..5270de644356e 100644
--- a/clang/test/Sema/builtins-reduction-math.c
+++ b/clang/test/Sema/builtins-reduction-math.c
@@ -148,3 +148,23 @@ void test_builtin_reduce_minimum(int i, float4 v, int3 iv) {
   i = __builtin_reduce_minimum(i);
   // expected-error@-1 {{1st argument must be a vector of floating-point types (was 'int')}}
 }
+
+void test_builtin_reduce_addf(float f, float4 v, int3 iv) {
+  struct Foo s = __builtin_reduce_assoc_fadd(v);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'float'}}
+
+  f = __builtin_reduce_in_order_fadd(v);
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 1}}
+
+  f = __builtin_reduce_in_order_fadd(v, f, f);
+  // expected-error@-1 {{too many arguments to function call, expected 2, have 3}}
+
+  f = __builtin_reduce_assoc_fadd();
+  // expected-error@-1 {{too few arguments to function call, expected 1, have 0}}
+
+  f = __builtin_reduce_assoc_fadd(iv);
+  // expected-error@-1 {{1st argument must be a vector of floating-point types (was 'int3' (vector of 3 'int' values))}}
+
+  f = __builtin_reduce_in_order_fadd(v, (int)121);
+  // expected-error@-1 {{2nd argument must be a scalar floating-point type (was 'int')}}
+}



_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
