[clang] ce952a2 - [Clang] Add `__builtin_reduce_[in_order|assoc]_fadd` for floating-point reductions (#176160)

via cfe-commits Wed, 25 Feb 2026 02:11:09 -0800

Author: Benjamin Maxwell
Date: 2026-02-25T10:10:59Z
New Revision: ce952a224cbb51e7b081958e57899101324e4212


URL: https://github.com/llvm/llvm-project/commit/ce952a224cbb51e7b081958e57899101324e4212
DIFF: https://github.com/llvm/llvm-project/commit/ce952a224cbb51e7b081958e57899101324e4212.diff

LOG: [Clang] Add `__builtin_reduce_[in_order|assoc]_fadd` for floating-point reductions (#176160)

This adds `__builtin_reduce_[in_order|assoc]_fadd` to expose the
`llvm.vector.reduce.fadd.*` intrinsic directly in Clang, for the full
range of supported FP types.

Given a floating-point vector `vec` and a scalar floating-point value
`acc`:

- `__builtin_reduce_assoc_fadd(vec)` corresponds to a fast/associative
  reduction
  * i.e., the fadds can occur in any order
- `__builtin_reduce_in_order_fadd(vec, acc)` corresponds to an ordered
  reduction
  * i.e., the result is as if an accumulator was initialized with `acc`
    and each lane was added to it in order, starting from lane 0
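
As a rough illustration of why the two forms are distinct, the sketch below
models both orders in plain C, without the new builtins. The helper names
(`model_in_order_fadd`, `model_pairwise_fadd`) and the lane values are
hypothetical and not part of this commit. The pairwise tree is only one of
the many orders a reassociated reduction may pick, and the sketch assumes
`float` arithmetic is evaluated in single precision (FLT_EVAL_METHOD == 0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the ordered form: start from acc, then add
 * lane 0, lane 1, ... strictly in that order. */
static float model_in_order_fadd(const float *lanes, size_t n, float acc) {
  for (size_t i = 0; i < n; ++i)
    acc += lanes[i];
  return acc;
}

/* Hypothetical model of ONE possible reassociated order: a pairwise
 * tree that sums each half and then adds the two partial sums. */
static float model_pairwise_fadd(const float *lanes, size_t n) {
  assert(n > 0);
  if (n == 1)
    return lanes[0];
  size_t half = n / 2;
  return model_pairwise_fadd(lanes, half) +
         model_pairwise_fadd(lanes + half, n - half);
}

/* Lanes chosen so that single-precision rounding makes the two orders
 * disagree: 1e8f + 1.0f rounds back to 1e8f. */
static const float example_lanes[4] = {1e8f, 1.0f, -1e8f, 1.0f};
```

Under those assumptions, `model_in_order_fadd(example_lanes, 4, 0.0f)` gives
1.0f while `model_pairwise_fadd(example_lanes, 4)` gives 0.0f, so code that
needs a reproducible sum would presumably want the in-order builtin.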

Added: 
    

Modified: 
    clang/docs/LanguageExtensions.rst
    clang/include/clang/Basic/Builtins.td
    clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
    clang/lib/CodeGen/CGBuiltin.cpp
    clang/lib/Sema/SemaChecking.cpp
    clang/test/CodeGen/builtins-reduction-math.c
    clang/test/Sema/builtins-reduction-math.c

Removed: 
    


################################################################################
diff  --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 5ac15dd80760b..72cbf0610a2b8 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -926,27 +926,31 @@ Example:
 
 Let ``VT`` be a vector type and ``ET`` the element type of ``VT``.
 
-======================================= 
====================================================================== 
==================================
-         Name                            Operation                             
                                 Supported element types
-======================================= 
====================================================================== 
==================================
- ET __builtin_reduce_max(VT a)           return the largest element of the 
vector. The floating point result    integer and floating point types
-                                         will always be a number unless all 
elements of the vector are NaN.
- ET __builtin_reduce_min(VT a)           return the smallest element of the 
vector. The floating point result   integer and floating point types
-                                         will always be a number unless all 
elements of the vector are NaN.
- ET __builtin_reduce_add(VT a)           \+                                    
                                 integer types
- ET __builtin_reduce_mul(VT a)           \*                                    
                                 integer types
- ET __builtin_reduce_and(VT a)           &                                     
                                 integer types
- ET __builtin_reduce_or(VT a)            \|                                    
                                 integer types
- ET __builtin_reduce_xor(VT a)           ^                                     
                                 integer types
- ET __builtin_reduce_maximum(VT a)       return the largest element of the 
vector. Follows IEEE 754-2019        floating point types
-                                         semantics, see `LangRef
-                                         
<http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
-                                         for the comparison.
- ET __builtin_reduce_minimum(VT a)       return the smallest element of the 
vector. Follows IEEE 754-2019       floating point types
-                                         semantics, see `LangRef
-                                         
<http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
-                                         for the comparison.
-======================================= 
====================================================================== 
==================================
+============================================== 
====================================================================== 
==================================
+         Name                                   Operation                      
                                        Supported element types
+============================================== 
====================================================================== 
==================================
+ ET __builtin_reduce_max(VT a)                  return the largest element of 
the vector. The floating point result    integer and floating point types
+                                                will always be a number unless 
all elements of the vector are NaN.
+ ET __builtin_reduce_min(VT a)                  return the smallest element of 
the vector. The floating point result   integer and floating point types
+                                                will always be a number unless 
all elements of the vector are NaN.
+ ET __builtin_reduce_add(VT a)                  \+                             
                                        integer types
+ ET __builtin_reduce_mul(VT a)                  \*                             
                                        integer types
+ ET __builtin_reduce_and(VT a)                  &                              
                                        integer types
+ ET __builtin_reduce_or(VT a)                   \|                             
                                        integer types
+ ET __builtin_reduce_xor(VT a)                  ^                              
                                        integer types
+ ET __builtin_reduce_maximum(VT a)              return the largest element of 
the vector. Follows IEEE 754-2019        floating point types
+                                                semantics, see `LangRef
+                                                
<http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
+                                                for the comparison.
+ ET __builtin_reduce_minimum(VT a)              return the smallest element of 
the vector. Follows IEEE 754-2019       floating point types
+                                                semantics, see `LangRef
+                                                
<http://llvm.org/docs/LangRef.html#i-fminmax-family>`_
+                                                for the comparison.
+ ET __builtin_reduce_assoc_fadd(VT a[, ET s])   associative floating-point add 
reduction.                              floating point types
+ ET __builtin_reduce_in_order_fadd(VT a, ET s)  in order floating-point add 
reduction, initializing the accumulator    floating point types
+                                                with `(ET)s`, then adding each 
lane of the `a` in-order, starting
+                                                from lane 0. The additions 
cannot be reassociated.
+============================================== 
====================================================================== 
==================================
 
 *Masked Builtins*
 
@@ -975,15 +979,15 @@ Example:
     using v8i = int [[clang::ext_vector_type(8)]];
 
     v8i load(v8b mask, int *ptr) { return __builtin_masked_load(mask, ptr); }
-    
+
     v8i load_expand(v8b mask, int *ptr) {
       return __builtin_masked_expand_load(mask, ptr);
     }
-    
+
     void store(v8b mask, v8i val, int *ptr) {
       __builtin_masked_store(mask, val, ptr);
     }
-    
+
     void store_compress(v8b mask, v8i val, int *ptr) {
       __builtin_masked_compress_store(mask, val, ptr);
     }
@@ -1075,7 +1079,7 @@ The matrix type extension supports explicit casts. Implicit type conversion betw
 
 The matrix type extension supports column and row major memory layouts, but not
 all builtins are supported with row-major layout. The layout defaults to column
-major and can be specified using `-fmatrix-memory-layout`. To enable column 
+major and can be specified using `-fmatrix-memory-layout`. To enable column
 major layout, use `-fmatrix-memory-layout=column-major`, and for row major
 layout use `-fmatrix-memory-layout=row-major`
 

diff  --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 78dd26aa2c455..531c3702161f2 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -1664,6 +1664,18 @@ def ReduceAdd : Builtin {
   let Prototype = "void(...)";
 }
 
+def ReduceInOrderFAdd : Builtin {
+  let Spellings = ["__builtin_reduce_in_order_fadd"];
+  let Attributes = [NoThrow, Const, CustomTypeChecking];
+  let Prototype = "void(...)";
+}
+
+def ReduceAssocFAdd : Builtin {
+  let Spellings = ["__builtin_reduce_assoc_fadd"];
+  let Attributes = [NoThrow, Const, CustomTypeChecking];
+  let Prototype = "void(...)";
+}
+
 def ReduceMul : Builtin {
   let Spellings = ["__builtin_reduce_mul"];
   let Attributes = [NoThrow, Const, CustomTypeChecking, Constexpr];

diff  --git a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
index 50e6892f4bbc5..a27e66e0989fa 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
@@ -1528,6 +1528,8 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl &gd, unsigned builtinID,
   case Builtin::BI__builtin_reduce_xor:
   case Builtin::BI__builtin_reduce_or:
   case Builtin::BI__builtin_reduce_and:
+  case Builtin::BI__builtin_reduce_assoc_fadd:
+  case Builtin::BI__builtin_reduce_in_order_fadd:
   case Builtin::BI__builtin_reduce_maximum:
   case Builtin::BI__builtin_reduce_minimum:
   case Builtin::BI__builtin_matrix_transpose:

diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 850cc8d2c4c45..38010cad75244 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -4215,6 +4215,29 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
   case Builtin::BI__builtin_reduce_minimum:
     return RValue::get(emitBuiltinWithOneOverloadedType<1>(
         *this, E, Intrinsic::vector_reduce_fminimum, "rdx.minimum"));
+  case Builtin::BI__builtin_reduce_assoc_fadd:
+  case Builtin::BI__builtin_reduce_in_order_fadd: {
+    llvm::Value *Vector = EmitScalarExpr(E->getArg(0));
+    llvm::Type *ScalarTy = Vector->getType()->getScalarType();
+    llvm::Value *StartValue = nullptr;
+    if (E->getNumArgs() == 2)
+      StartValue = Builder.CreateFPCast(EmitScalarExpr(E->getArg(1)), ScalarTy);
+    llvm::Value *Args[] = {/*start_value=*/StartValue
+                               ? StartValue
+                               : llvm::ConstantFP::get(ScalarTy, -0.0F),
+                           /*vector=*/Vector};
+    llvm::Function *F =
+        CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Vector->getType());
+    llvm::CallBase *Reduce = Builder.CreateCall(F, Args, "rdx.addf");
+    if (BuiltinIDIfNoAsmLabel == Builtin::BI__builtin_reduce_assoc_fadd) {
+      // `__builtin_reduce_assoc_fadd` is an associative reduction which
+      // requires the reassoc FMF flag.
+      llvm::FastMathFlags FMF;
+      FMF.setAllowReassoc();
+      cast<llvm::CallBase>(Reduce)->setFastMathFlags(FMF);
+    }
+    return RValue::get(Reduce);
+  }
 
   case Builtin::BI__builtin_matrix_transpose: {
     auto *MatrixTy = E->getArg(0)->getType()->castAs<ConstantMatrixType>();

diff  --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 0ea41ff1f613e..45dce52179f82 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -2823,6 +2823,14 @@ static ExprResult BuiltinVectorMathConversions(Sema &S, Expr *E) {
   return S.UsualUnaryFPConversions(Res.get());
 }
 
+static QualType getVectorElementType(ASTContext &Context, QualType VecTy) {
+  if (const auto *TyA = VecTy->getAs<VectorType>())
+    return TyA->getElementType();
+  if (VecTy->isSizelessVectorType())
+    return VecTy->getSizelessVectorEltType(Context);
+  return QualType();
+}
+
 ExprResult
 Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
                                CallExpr *TheCall) {
@@ -3673,14 +3681,8 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
       return ExprError();
 
     const Expr *Arg = TheCall->getArg(0);
-    const auto *TyA = Arg->getType()->getAs<VectorType>();
-
-    QualType ElTy;
-    if (TyA)
-      ElTy = TyA->getElementType();
-    else if (Arg->getType()->isSizelessVectorType())
-      ElTy = Arg->getType()->getSizelessVectorEltType(Context);
 
+    QualType ElTy = getVectorElementType(Context, Arg->getType());
     if (ElTy.isNull() || !ElTy->isIntegerType()) {
       Diag(Arg->getBeginLoc(), diag::err_builtin_invalid_arg_type)
           << 1 << /* vector of */ 4 << /* int */ 1 << /* no fp */ 0
@@ -3692,6 +3694,46 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
     break;
   }
 
+  case Builtin::BI__builtin_reduce_assoc_fadd:
+  case Builtin::BI__builtin_reduce_in_order_fadd: {
+    // For in-order reductions require the user to specify the start value.
+    bool InOrder = BuiltinID == Builtin::BI__builtin_reduce_in_order_fadd;
+    if (InOrder ? checkArgCount(TheCall, 2) : checkArgCountRange(TheCall, 1, 2))
+      return ExprError();
+
+    ExprResult Vec = UsualUnaryConversions(TheCall->getArg(0));
+    if (Vec.isInvalid())
+      return ExprError();
+
+    TheCall->setArg(0, Vec.get());
+
+    QualType ElTy = getVectorElementType(Context, Vec.get()->getType());
+    if (ElTy.isNull() || !ElTy->isRealFloatingType()) {
+      Diag(Vec.get()->getBeginLoc(), diag::err_builtin_invalid_arg_type)
+          << 1 << /* vector of */ 4 << /* no int */ 0 << /* fp */ 1
+          << Vec.get()->getType();
+      return ExprError();
+    }
+
+    if (TheCall->getNumArgs() == 2) {
+      ExprResult StartValue = UsualUnaryConversions(TheCall->getArg(1));
+      if (StartValue.isInvalid())
+        return ExprError();
+
+      if (!StartValue.get()->getType()->isRealFloatingType()) {
+        Diag(StartValue.get()->getBeginLoc(),
+             diag::err_builtin_invalid_arg_type)
+            << 2 << /* scalar */ 1 << /* no int */ 0 << /* fp */ 1
+            << StartValue.get()->getType();
+        return ExprError();
+      }
+      TheCall->setArg(1, StartValue.get());
+    }
+
+    TheCall->setType(ElTy);
+    break;
+  }
+
   case Builtin::BI__builtin_matrix_transpose:
     return BuiltinMatrixTranspose(TheCall, TheCallResult);
 

diff  --git a/clang/test/CodeGen/builtins-reduction-math.c b/clang/test/CodeGen/builtins-reduction-math.c
index e12fd729c84c0..187f42068905a 100644
--- a/clang/test/CodeGen/builtins-reduction-math.c
+++ b/clang/test/CodeGen/builtins-reduction-math.c
@@ -4,6 +4,8 @@
 // RUN: %clang_cc1 -O1 -triple aarch64 -target-feature +sve  %s -emit-llvm -disable-llvm-passes -o - | FileCheck --check-prefixes=SVE   %s
 
 typedef float float4 __attribute__((ext_vector_type(4)));
+typedef _Float16 half8 __attribute__((ext_vector_type(8)));
+
 typedef short int si8 __attribute__((ext_vector_type(8)));
 typedef unsigned int u4 __attribute__((ext_vector_type(4)));
 
@@ -162,6 +164,37 @@ void test_builtin_reduce_minimum(float4 vf1) {
   const double r4 = __builtin_reduce_minimum(vf1_as_one);
 }
 
+void test_builtin_reduce_addf(float4 vf1, half8 vf2, float start) {
+  // CHECK-LABEL: define void @test_builtin_reduce_addf(
+
+  // CHECK:      [[V0:%.+]] = load <4 x float>, ptr %vf1.addr, align 16
+  // CHECK-NEXT: call reassoc float @llvm.vector.reduce.fadd.v4f32(float 1.000000e+00, <4 x float> [[V0]])
+  float r1 = __builtin_reduce_assoc_fadd(vf1, 1.0f);
+
+  // CHECK:      [[V1:%.+]] = load <4 x float>, ptr %vf1.addr, align 16
+  // CHECK-NEXT: call float @llvm.vector.reduce.fadd.v4f32(float 2.000000e+00, <4 x float> [[V1]])
+  float r2 = __builtin_reduce_in_order_fadd(vf1, 2.0f);
+
+  // CHECK:      [[V2:%.+]] = load <8 x half>, ptr %vf2.addr, align 16
+  // CHECK-NEXT: call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH8000, <8 x half> [[V2:%.+]])
+  _Float16 r3 = __builtin_reduce_assoc_fadd(vf2);
+
+  // CHECK:      [[V3:%.+]] = load <8 x half>, ptr %vf2.addr, align 16
+  // CHECK-NEXT: [[RDX:%.+]] = call half @llvm.vector.reduce.fadd.v8f16(half 0xH8000, <8 x half> [[V3]])
+  // CHECK-NEXT: fpext half [[RDX]] to float
+  float r4 = __builtin_reduce_in_order_fadd(vf2, -0.0f);
+
+  // CHECK:      [[V4:%.+]] = load <4 x float>, ptr %vf1.addr, align 16
+  // CHECK:      [[START0:%.+]] = load float, ptr %start.addr, align 4
+  // CHECK-NEXT: call float @llvm.vector.reduce.fadd.v4f32(float [[START0]], <4 x float> [[V4]])
+  float r5 = __builtin_reduce_in_order_fadd(vf1, start);
+
+  // CHECK:      [[V5:%.+]] = load <8 x half>, ptr %vf2.addr, align 16
+  // CHECK:      [[START1:%.+]] = fptrunc float %{{.*}} to half
+  // CHECK-NEXT: call reassoc half @llvm.vector.reduce.fadd.v8f16(half [[START1]], <8 x half> [[V5:%.+]])
+  _Float16 r7 = __builtin_reduce_assoc_fadd(vf2, start);
+}
+
 #if defined(__ARM_FEATURE_SVE)
 #include <arm_sve.h>
 

diff  --git a/clang/test/Sema/builtins-reduction-math.c b/clang/test/Sema/builtins-reduction-math.c
index 74f09d501198b..5270de644356e 100644
--- a/clang/test/Sema/builtins-reduction-math.c
+++ b/clang/test/Sema/builtins-reduction-math.c
@@ -148,3 +148,23 @@ void test_builtin_reduce_minimum(int i, float4 v, int3 iv) {
   i = __builtin_reduce_minimum(i);
   // expected-error@-1 {{1st argument must be a vector of floating-point types (was 'int')}}
 }
+
+void test_builtin_reduce_addf(float f, float4 v, int3 iv) {
+  struct Foo s = __builtin_reduce_assoc_fadd(v);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'float'}}
+
+  f = __builtin_reduce_in_order_fadd(v);
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 1}}
+
+  f = __builtin_reduce_in_order_fadd(v, f, f);
+  // expected-error@-1 {{too many arguments to function call, expected 2, have 3}}
+
+  f = __builtin_reduce_assoc_fadd();
+  // expected-error@-1 {{too few arguments to function call, expected 1, have 0}}
+
+  f = __builtin_reduce_assoc_fadd(iv);
+  // expected-error@-1 {{1st argument must be a vector of floating-point types (was 'int3' (vector of 3 'int' values))}}
+
+  f = __builtin_reduce_in_order_fadd(v, (int)121);
+  // expected-error@-1 {{2nd argument must be a scalar floating-point type (was 'int')}}
+}

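The CodeGen change above passes -0.0 as the start value when
`__builtin_reduce_assoc_fadd` is called with a single argument. A plausible
reading is that -0.0 is the neutral element of IEEE 754 addition under the
default rounding mode, whereas +0.0 would turn a sum of negative zeros into
+0.0. The plain-C sketch below illustrates that property only. The helper
`sum_from` and the lane values are hypothetical, and it does not call the
new builtins.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: add every lane to `start`, in order. */
static float sum_from(const float *lanes, size_t n, float start) {
  float acc = start;
  for (size_t i = 0; i < n; ++i)
    acc += lanes[i];
  return acc;
}

/* All lanes are negative zero, so the sign of the result depends only
 * on the start value. */
static const float neg_zero_lanes[4] = {-0.0f, -0.0f, -0.0f, -0.0f};

/* Returns nonzero if summing from `start` keeps the negative sign. */
static int keeps_negative_zero(float start) {
  return signbit(sum_from(neg_zero_lanes, 4, start)) != 0;
}
```

Here `keeps_negative_zero(-0.0f)` is true and `keeps_negative_zero(0.0f)` is
false, assuming round-to-nearest and no fast-math flags.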
