| Issue | 84182 |
| --- | --- |
| Summary | AVX-512-VNNI instruction not generated when `target-cpu=znver4` |
| Labels | backend:X86, llvm:codegen |
| Assignees | |
| Reporter | bjacob |

Filing this issue with two Compiler Explorer testcases: one in LLVM IR and the other in C.
# LLVM IR testcase
Compiler Explorer link: https://godbolt.org/z/3Wf1cfEo1
Problem: when the parent function has `"target-cpu"="znver4"`, the `@llvm.x86.avx512.vpdpwssd.512` intrinsic fails to compile to the expected AVX-512-VNNI `vpdpwssd` instruction and instead generates a (`vpmaddwd`, `vpaddd`) fallback implementation.
The problem only reproduces with `"target-cpu"="znver4"` and not with `"target-cpu"="cascadelake"` or when `"target-cpu"` is simply omitted, as shown in the Compiler Explorer link.
Inlining the LLVM IR testcase here for completeness:
```llvm
; Testing with "target-cpu"="znver4" in attributes #0 below. Nothing else changes between testcases.
define dso_local <8 x i64> @foo(<8 x i64> noundef %0, <8 x i64> noundef %1, <8 x i64> noundef %2) local_unnamed_addr #0 {
%4 = bitcast <8 x i64> %0 to <16 x i32>
%5 = bitcast <8 x i64> %1 to <16 x i32>
%6 = bitcast <8 x i64> %2 to <16 x i32>
%7 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %4, <16 x i32> %5, <16 x i32> %6)
%8 = bitcast <16 x i32> %7 to <8 x i64>
ret <8 x i64> %8
}
declare <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32>, <16 x i32>, <16 x i32>) #1
declare void @llvm.dbg.value(metadata, metadata, metadata) #2
attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="512" "no-trapping-math"="true" "stack-protector-buffer-size"="8"
"target-cpu"="znver4" "target-features"="+adx,+aes,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+crc32,+cx16,+cx8,+evex512,+f16c,+fma,+fsgsbase,+fxsr,+gfni,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```
# C testcase
Compiler Explorer link: https://godbolt.org/z/z6xxf35hG
Problem: when compiling with `-march=znver4`, the `_mm512_dpwssd_epi32` intrinsic fails to compile to the expected AVX-512-VNNI `vpdpwssd` instruction and instead generates a (`vpmaddwd`, `vpaddd`) fallback implementation.
The problem only reproduces with `-march=znver4`. It does not reproduce with `-march=cascadelake`, or when `-march` is omitted and the individual AVX-512 features are passed instead, as shown in the Compiler Explorer link.
Inlining the C testcase here for completeness:
```c
#include <immintrin.h>
__m512i foo(__m512i x, __m512i y, __m512i z) {
return _mm512_dpwssd_epi32(x, y, z);
}
```
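For readers comparing the two code sequences, here is a minimal scalar sketch of one 32-bit lane. It models my understanding of the non-saturating `vpdpwssd` semantics and of the (`vpmaddwd`, `vpaddd`) pair, and suggests that the two forms should produce the same result, so the issue concerns instruction selection rather than correctness. The function names (`sext16`, `pmaddwd_lane`, `dpwssd_lane`, `dpwssd_fallback_lane`, `check_lanes_agree`) are hypothetical and do not come from LLVM or from the testcases above.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v to a signed 32-bit value. */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Assumed model of one 32-bit lane of vpmaddwd: the sum of the two
   signed 16x16 products. Unsigned addition makes the sum wrap
   modulo 2^32 without signed-overflow undefined behavior. */
static uint32_t pmaddwd_lane(uint32_t a, uint32_t b) {
    int32_t lo = sext16(a) * sext16(b);
    int32_t hi = sext16(a >> 16) * sext16(b >> 16);
    return (uint32_t)lo + (uint32_t)hi;
}

/* Assumed model of one lane of the fused form:
   acc + low product + high product, wrapping modulo 2^32. */
static uint32_t dpwssd_lane(uint32_t acc, uint32_t a, uint32_t b) {
    int32_t lo = sext16(a) * sext16(b);
    int32_t hi = sext16(a >> 16) * sext16(b >> 16);
    return acc + (uint32_t)lo + (uint32_t)hi;
}

/* One lane of the two-instruction form: vpmaddwd followed by vpaddd. */
static uint32_t dpwssd_fallback_lane(uint32_t acc, uint32_t a, uint32_t b) {
    return acc + pmaddwd_lane(a, b);
}

/* Hypothetical helper: aborts if the two forms disagree for one lane. */
void check_lanes_agree(uint32_t acc, uint32_t a, uint32_t b) {
    assert(dpwssd_lane(acc, a, b) == dpwssd_fallback_lane(acc, a, b));
}
```

Calling `check_lanes_agree` over a range of inputs, including the corner case where both 16-bit halves of `a` and `b` are `0x8000`, is one way to convince yourself of the equivalence under these assumptions.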