Issue 84182
Summary AVX-512-VNNI instruction not generated when `target-cpu=znver4`
Labels backend:X86, llvm:codegen
Assignees
Reporter bjacob
Filing this issue with two Compiler Explorer testcases: one in LLVM IR and the other in C.

# LLVM IR testcase

Compiler Explorer link: https://godbolt.org/z/3Wf1cfEo1

Problem: when the parent function has `"target-cpu"="znver4"`, the `@llvm.x86.avx512.vpdpwssd.512` intrinsic fails to compile to the expected AVX-512-VNNI `vpdpwssd` instruction and instead generates a (`vpmaddwd`, `vpaddd`) fallback implementation.

The problem only reproduces with `"target-cpu"="znver4"` and not with `"target-cpu"="cascadelake"` or when `"target-cpu"` is simply omitted, as shown in the Compiler Explorer link.

Inlining the LLVM IR testcase here for completeness:

```llvm
; Testing with "target-cpu"="znver4" in attributes #0 below. Nothing else changes between testcases.

define dso_local <8 x i64> @foo(<8 x i64> noundef %0, <8 x i64> noundef %1, <8 x i64> noundef %2) local_unnamed_addr #0 {
  %4 = bitcast <8 x i64> %0 to <16 x i32>
  %5 = bitcast <8 x i64> %1 to <16 x i32>
  %6 = bitcast <8 x i64> %2 to <16 x i32>
  %7 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %4, <16 x i32> %5, <16 x i32> %6)
  %8 = bitcast <16 x i32> %7 to <8 x i64>
  ret <8 x i64> %8
}

declare <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32>, <16 x i32>, <16 x i32>) #1

declare void @llvm.dbg.value(metadata, metadata, metadata) #2

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="512" "no-trapping-math"="true" "stack-protector-buffer-size"="8"
  "target-cpu"="znver4" "target-features"="+adx,+aes,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+crc32,+cx16,+cx8,+evex512,+f16c,+fma,+fsgsbase,+fxsr,+gfni,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```

# C testcase

Compiler Explorer link: https://godbolt.org/z/z6xxf35hG

Problem: when compiling with `-march=znver4`, the `_mm512_dpwssd_epi32` intrinsic fails to compile to the expected AVX-512-VNNI `vpdpwssd` instruction and instead generates a (`vpmaddwd`, `vpaddd`) fallback implementation.

The problem only reproduces with `-march=znver4` and not with `-march=cascadelake` or when `-march` is simply omitted and individual AVX-512 features are passed instead, as shown in the Compiler Explorer link.

Inlining the C testcase here for completeness:

```c
#include <immintrin.h>

__m512i foo(__m512i x, __m512i y, __m512i z) {
  return _mm512_dpwssd_epi32(x, y, z);
}
```
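
For readers unfamiliar with the two code sequences, here is a minimal scalar sketch of one 32-bit lane, based on my reading of the documented instruction semantics (non-saturating `vpdpwssd`, and `vpmaddwd` followed by `vpaddd`). The helper names `sext16`, `dpwssd_lane` and `fallback_lane` are hypothetical and exist only for illustration; this models the arithmetic, not what the backend actually does. Under these assumptions both forms compute the same value modulo 2^32, so the concern here is the instruction selected rather than the result.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 16 bits of x to 32 bits. */
static int32_t sext16(uint32_t x) {
  x &= 0xFFFFu;
  return (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
}

/* Assumed model of one lane of vpdpwssd (non-saturating):
   acc + a.lo16 * b.lo16 + a.hi16 * b.hi16, wrapping modulo 2^32. */
static uint32_t dpwssd_lane(uint32_t acc, uint32_t a, uint32_t b) {
  int32_t p_lo = sext16(a) * sext16(b);
  int32_t p_hi = sext16(a >> 16) * sext16(b >> 16);
  return acc + (uint32_t)p_lo + (uint32_t)p_hi;
}

/* Assumed model of the fallback: vpmaddwd produces the wrapped sum of the
   two signed 16-bit products, then vpaddd adds the accumulator. */
static uint32_t fallback_lane(uint32_t acc, uint32_t a, uint32_t b) {
  uint32_t madd = (uint32_t)(sext16(a) * sext16(b)) +
                  (uint32_t)(sext16(a >> 16) * sext16(b >> 16));
  return acc + madd;
}
```

For example, with `acc = 10`, `a = 0x00020003` and `b = 0x00040005`, both helpers compute `10 + 3*5 + 2*4`.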
