https://github.com/easyonaadit updated https://github.com/llvm/llvm-project/pull/175132
From 14d40610272bf26f3cf3c3b15b314e99e201c059 Mon Sep 17 00:00:00 2001
From: Aaditya <[email protected]>
Date: Fri, 9 Jan 2026 12:05:04 +0530
Subject: [PATCH 1/2] [AMDGPU] Update documentation for wave reduction intrinsics

---
 llvm/docs/AMDGPUUsage.rst | 74 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 4 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 39280a37e8d30..c46018bdaa491 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1378,9 +1378,19 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    0: Target default preference,
                                                    1: `Iterative strategy`, and
                                                    2: `DPP`.
-                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
                                                    reduction will be performed using default iterative strategy.
-                                                   Intrinsic is currently only implemented for i32.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.min                      Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a signed min
+                                                   reduction on signed integers.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fmin                     Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a floating point min
+                                                   reduction on floating point values.
+                                                   Intrinsic is implemented for float and double types.
+                                                   NaN values are not canonicalized.
+                                                   The ordering behaviour of sNaNs is non-deterministic.
 
   llvm.amdgcn.wave.reduce.umax                     Performs an arithmetic unsigned max reduction on the unsigned values
                                                    provided by each lane in the wavefront.
@@ -1388,9 +1398,65 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    0: Target default preference,
                                                    1: `Iterative strategy`, and
                                                    2: `DPP`.
-                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
                                                    reduction will be performed using default iterative strategy.
-                                                   Intrinsic is currently only implemented for i32.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.max                      Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a signed max
+                                                   reduction on signed integers.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fmax                     Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a floating point max
+                                                   reduction on floating point values.
+                                                   Intrinsic is implemented for float and double types.
+                                                   NaN values are not canonicalized.
+                                                   The ordering behaviour of sNaNs is non-deterministic.
+
+  llvm.amdgcn.wave.reduce.add                      Performs an arithmetic add reduction on the signed/unsigned values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fadd                     Similar to `llvm.amdgcn.wave.reduce.add`, but performs a floating point add
+                                                   reduction on floating point values.
+                                                   Intrinsic is implemented for float and double types.
+
+  llvm.amdgcn.wave.reduce.sub                      Performs an arithmetic sub reduction on the signed/unsigned values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fsub                     Similar to `llvm.amdgcn.wave.reduce.sub`, but performs a floating point sub
+                                                   reduction on floating point values.
+                                                   Intrinsic is implemented for float and double types.
+
+  llvm.amdgcn.wave.reduce.and                      Performs a bitwise-and reduction on the values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.or                       Similar to `llvm.amdgcn.wave.reduce.and`, but performs a bitwise-or
+                                                   reduction on the values provided by each lane in the wavefront.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.xor                      Similar to `llvm.amdgcn.wave.reduce.and`, but performs a bitwise-xor
+                                                   reduction on the values provided by each lane in the wavefront.
+                                                   Intrinsic is implemented for i32 and i64 types.
 
   llvm.amdgcn.permlane16                           Provides direct access to v_permlane16_b32. Performs arbitrary gather-style
                                                    operation within a row (16 contiguous lanes) of the second input operand.

From 094c65eb44f952ae45e5290b3b7ae2e9891a6af9 Mon Sep 17 00:00:00 2001
From: Aaditya <[email protected]>
Date: Wed, 28 Jan 2026 12:05:12 +0530
Subject: [PATCH 2/2] Modelled fmin/fmax similar to llvm.minimumnum/maximumnum

---
 llvm/docs/AMDGPUUsage.rst | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index c46018bdaa491..4564a433a97e2 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1389,7 +1389,10 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
   llvm.amdgcn.wave.reduce.fmin                     Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a floating point min
                                                    reduction on floating point values.
                                                    Intrinsic is implemented for float and double types.
-                                                   NaN values are not canonicalized.
+                                                   Intrinsic is modelled on the `llvm.minimumnum` intrinsic.
+                                                   For reduction between two NaN values, a NaN is returned.
+                                                   For reduction between a NaN and a number, the number is returned.
+                                                   -0.0 is treated as less than +0.0.
                                                    The ordering behaviour of sNaNs is non-deterministic.
 
   llvm.amdgcn.wave.reduce.umax                     Performs an arithmetic unsigned max reduction on the unsigned values
@@ -1409,7 +1412,10 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
   llvm.amdgcn.wave.reduce.fmax                     Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a floating point max
                                                    reduction on floating point values.
                                                    Intrinsic is implemented for float and double types.
-                                                   NaN values are not canonicalized.
+                                                   Intrinsic is modelled on the `llvm.maximumnum` intrinsic.
+                                                   For reduction between two NaN values, a NaN is returned.
+                                                   For reduction between a NaN and a number, the number is returned.
+                                                   -0.0 is treated as less than +0.0.
                                                    The ordering behaviour of sNaNs is non-deterministic.
 
   llvm.amdgcn.wave.reduce.add                      Performs an arithmetic add reduction on the signed/unsigned values

_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
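The following is a reading aid for the semantics documented in the patch above: a minimal Python reference model. It is a hypothetical sketch, not the AMDGPU backend's lowering, and every helper name in it is an illustrative assumption (`wave_reduce`, `umin32`, `smin32`, `minimumnum`, `maximumnum`).

What the model does:

* It folds a pairwise operator over the active lanes, lowest lane first, loosely mirroring the lane-by-lane loop of the iterative strategy.
* It shows how `umin` and `min` differ on the same i32 bit pattern.
* It models `fmin`/`fmax` with the `llvm.minimumnum`/`llvm.maximumnum` rules the patch lists: two NaNs give a NaN, a NaN and a number give the number, and -0.0 orders below +0.0.

Python floats do not distinguish signalling NaNs, so the non-deterministic sNaN ordering is not modelled.

```python
import math
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

MASK32 = 0xFFFF_FFFF  # i32 lane values are modelled as unsigned 32-bit bit patterns


def as_signed32(x: int) -> int:
    """Reinterpret a 32-bit bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def umin32(a: int, b: int) -> int:
    """Unsigned min on i32 bit patterns (hypothetical model of wave.reduce.umin)."""
    return min(a & MASK32, b & MASK32)


def smin32(a: int, b: int) -> int:
    """Signed min on i32 bit patterns (hypothetical model of wave.reduce.min).

    Compares the values as signed integers but returns the result as a bit pattern.
    """
    return min(as_signed32(a), as_signed32(b)) & MASK32


def minimumnum(a: float, b: float) -> float:
    """Pairwise min following the llvm.minimumnum rules listed in the patch."""
    if math.isnan(a):
        return b  # NaN vs number -> number; NaN vs NaN -> NaN
    if math.isnan(b):
        return a
    if a == b == 0.0:
        # +0.0 and -0.0 compare equal in Python, so order them by sign explicitly.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def maximumnum(a: float, b: float) -> float:
    """Pairwise max following the llvm.maximumnum rules listed in the patch."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b == 0.0:
        # Prefer +0.0 over -0.0.
        return b if math.copysign(1.0, a) < 0 else a
    return a if a > b else b


def wave_reduce(lane_values: Sequence[T], op: Callable[[T, T], T]) -> T:
    """Fold `op` over the values of the active lanes, lowest lane first.

    Hypothetical model only: assumes at least one active lane.
    """
    if not lane_values:
        raise ValueError("a wave reduction needs at least one active lane")
    return reduce(op, lane_values)


if __name__ == "__main__":
    lanes = [0xFFFF_FFFF, 1, 7]
    print(hex(wave_reduce(lanes, umin32)))  # 0x1
    print(hex(wave_reduce(lanes, smin32)))  # 0xffffffff, i.e. -1 as a signed i32
    print(wave_reduce([math.nan, 3.0, 2.0], minimumnum))  # 2.0
    print(wave_reduce([0.0, -0.0], minimumnum))  # -0.0
    print(wave_reduce([-0.0, 0.0], maximumnum))  # 0.0
```

For the integer min/max and bitwise reductions, the fold order does not affect the result. Floating-point `fadd`/`fsub` are not associative, so any faithful model of them would have to pin down an evaluation order. They are deliberately left out of this sketch.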
