[clang] [llvm] [AMDGPU] Add dot product patterns with saturating add (clamp) (PR #187945)

via cfe-commits Wed, 01 Apr 2026 11:01:47 -0700

================
@@ -731,13 +731,49 @@ defm V_DOT4_F32_BF8_BF8 : VOP3PDOTF8Inst<"v_dot4_f32_bf8_bf8", int_amdgcn_dot4_f
 def : UDot2Pat<V_DOT2_U32_U16>;
 def : SDot2Pat<V_DOT2_I32_I16>;
 
+// Saturating unsigned dot2 pattern: uaddsat(a[0]*b[0] + a[1]*b[1], c)
+class UDot2SatPat<VOP_Pseudo Inst> : GCNPat <
----------------
addmisol wrote:


Yes, these patterns are still reachable. They match a different input form than 
performSatAddCombine:

  - performSatAddCombine: Matches uaddsat(INTRINSIC_WO_CHAIN(amdgcn_udot2, ..., 
0), accum), i.e. patterns where the dot intrinsic was already formed at IR level 
from <2 x i16> vectors.
  - UDot2SatPat/SDot2SatPat: Match scalar decomposed patterns like 
uaddsat(add(mul_u24(srl $src0, 16), ..), ..) — these come from scalar i32 code 
with packed i16 values extracted via shifts/masks, which bypasses the IR-level 
intrinsic formation.
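
  As a rough illustration of the second input form, the C++ sketch below models
the arithmetic that scalar i32 code with packed i16 lanes would compute before
instruction selection. It is not taken from the PR: the function names
uaddsat32 and udot2_sat_scalar_model are hypothetical, and treating the inner
add of the two products as an ordinary wrapping 32-bit add is an assumption
made for the sketch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper modelling an unsigned saturating add (uaddsat):
// on overflow the result clamps to UINT32_MAX instead of wrapping.
static uint32_t uaddsat32(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;  // unsigned arithmetic wraps modulo 2^32
    return sum < a ? std::numeric_limits<uint32_t>::max() : sum;
}

// Hypothetical model of the scalar decomposed form:
//   uaddsat(add(mul(hi(src0), hi(src1)), mul(lo(src0), lo(src1))), acc)
// where each i32 operand carries two unsigned 16-bit lanes.
static uint32_t udot2_sat_scalar_model(uint32_t src0, uint32_t src1,
                                       uint32_t acc) {
    uint32_t a_hi = src0 >> 16;       // upper lane via shift (srl ..., 16)
    uint32_t a_lo = src0 & 0xffffu;   // lower lane via mask
    uint32_t b_hi = src1 >> 16;
    uint32_t b_lo = src1 & 0xffffu;

    // Assumption: the inner add is a plain 32-bit add and may wrap.
    uint32_t dot = a_hi * b_hi + a_lo * b_lo;

    // Only the accumulate step saturates.
    return uaddsat32(dot, acc);
}
```

  With lanes (2, 3) and (4, 5) the dot product is 2*4 + 3*5 = 23, so an
accumulator of 10 gives 33, while an accumulator close to UINT32_MAX clamps
instead of wrapping.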

  Added scalar_udot2_sat and scalar_sdot2_sat tests to verify these patterns 
are exercised.

https://github.com/llvm/llvm-project/pull/187945
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
