| Issue | 137422 |
| --- | --- |
| Summary | [X86] Suboptimal code for AVX-512 narrowing |
| Labels | new issue |
| Assignees | |
| Reporter | dzaima |

This code, compiled by clang with `-O3 -march=znver4`:
```c
#include <immintrin.h>
#include <stdint.h>
void narrow_u32x16x4_to_u8x64(uint8_t* dst, __m512i x0, __m512i x1, __m512i x2, __m512i x3) {
    __m512i inds = _mm512_set_epi8(
        124, 120, 116, 112, 108, 104, 100, 96,
        92, 88, 84, 80, 76, 72, 68, 64,
        60, 56, 52, 48, 44, 40, 36, 32,
        28, 24, 20, 16, 12, 8, 4, 0,
        124, 120, 116, 112, 108, 104, 100, 96,
        92, 88, 84, 80, 76, 72, 68, 64,
        60, 56, 52, 48, 44, 40, 36, 32,
        28, 24, 20, 16, 12, 8, 4, 0
    );
    __m512i x01 = _mm512_permutex2var_epi8(x0, inds, x1);
    __m512i x23 = _mm512_permutex2var_epi8(x2, inds, x3);
    __m512i x0123 = _mm512_mask_blend_epi64(0xF0, x01, x23);
    _mm512_storeu_si512(dst, x0123);
}
```
produces:
```asm
narrow_u32x16x4_to_u8x64:
vmovdqa64 zmm4, zmmword ptr [rip + .LCPI0_0]
vmovdqa64 zmm5, zmmword ptr [rip + .LCPI0_1]
vpshufb zmm1, zmm1, zmm4
vpshufb zmm0, zmm0, zmm5
vpshufb zmm3, zmm3, zmm4
vpshufb zmm2, zmm2, zmm5
vporq zmm0, zmm0, zmm1
vporq zmm1, zmm2, zmm3
vpmovsxbd zmm3, xmmword ptr [rip + .LCPI0_3]
vpermi2d zmm3, zmm0, zmm1
vmovdqu64 zmmword ptr [rdi], zmm3
vzeroupper
ret
```
instead of the more direct version that gcc produces:
```asm
narrow_u32x16x4_to_u8x64:
vmovdqa64 zmm4, ZMMWORD PTR .LC0[rip]
kmovb k1, BYTE PTR .LC1[rip]
vpermt2b zmm0, zmm4, zmm1
vpermi2b zmm4, zmm2, zmm3
vmovdqa64 zmm0{k1}, zmm4
vmovdqu64 ZMMWORD PTR [rdi], zmm0
ret
```
The code implements a general 64-element `u32` to `u8` narrow. It should have 2x higher throughput than the `vpmovdb`-based code that clang currently generates via autovectorization, on both Intel and AMD. It also allows merging multiple results via a blend instead of an insert, and a blend can run on more ports. So that's perhaps a separate thing that could be improved: I believe similar approaches should give a ~2x throughput boost for all narrowing conversions, on both Intel and AMD.
https://godbolt.org/z/ax7Yda7Ps
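
For anyone checking the index pattern by hand, here is a minimal scalar sketch of what the two byte permutes plus the blend compute. It is not part of the original report: `permute2_bytes`, `blend_qwords`, `narrow_model` and `narrow_model_mismatches` are hypothetical helper names, and the model assumes the usual two-source byte-permute semantics (index bits 0..5 select the byte, bit 6 selects the second source) and little-endian lane layout.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a two-source byte permute (vpermi2b/vpermt2b style):
   bits 0..5 of each index pick a byte, bit 6 picks the source (0 = a, 1 = b). */
static void permute2_bytes(uint8_t out[64], const uint8_t a[64],
                           const uint8_t idx[64], const uint8_t b[64]) {
    for (int i = 0; i < 64; i++) {
        const uint8_t *src = (idx[i] & 64) ? b : a;
        out[i] = src[idx[i] & 63];
    }
}

/* Scalar model of a 64-bit-lane blend: a set mask bit j takes lane j from b. */
static void blend_qwords(uint8_t out[64], uint8_t mask,
                         const uint8_t a[64], const uint8_t b[64]) {
    for (int j = 0; j < 8; j++) {
        const uint8_t *src = ((mask >> j) & 1) ? b : a;
        memcpy(out + 8 * j, src + 8 * j, 8);
    }
}

/* Hypothetical scalar stand-in for narrow_u32x16x4_to_u8x64. */
static void narrow_model(uint8_t dst[64], uint32_t x[4][16]) {
    uint8_t inds[64], v[4][64], x01[64], x23[64];

    /* Same index vector as in the report: 0, 4, ..., 124, repeated twice. */
    for (int i = 0; i < 64; i++)
        inds[i] = (uint8_t)(4 * (i % 32));

    /* Spell out each 32-bit element as four little-endian bytes. */
    for (int k = 0; k < 4; k++)
        for (int i = 0; i < 16; i++)
            for (int byte = 0; byte < 4; byte++)
                v[k][4 * i + byte] = (uint8_t)(x[k][i] >> (8 * byte));

    permute2_bytes(x01, v[0], inds, v[1]);  /* low 32 bytes: narrowed x0, x1 */
    permute2_bytes(x23, v[2], inds, v[3]);  /* low 32 bytes: narrowed x2, x3 */
    blend_qwords(dst, 0xF0, x01, x23);      /* upper four qwords come from x23 */
}

/* Compares the model against a plain truncating loop; returns the number
   of bytes that differ. */
int narrow_model_mismatches(void) {
    uint32_t x[4][16];
    uint8_t got[64];
    int mismatches = 0;

    for (int k = 0; k < 4; k++)
        for (int i = 0; i < 16; i++)
            x[k][i] = 0x9E3779B9u * (uint32_t)(16 * k + i + 1);

    narrow_model(got, x);
    for (int i = 0; i < 64; i++)
        mismatches += got[i] != (uint8_t)x[i / 16][i % 16];
    return mismatches;
}
```

Calling `narrow_model_mismatches()` from a small `main` is enough to run the check. The model only describes the intended result; it says nothing about which instructions a compiler should pick.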
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs