| Issue | 114959 |
| Summary | Missing optimization in zip_float |
| Labels | new issue |
| Assignees | |
| Reporter | junaire |
Source:
```c
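#include <immintrin.h>
void zip_float(const double *src, double *dst) {
    /* s0 = {src[0], src[1], src[0], src[1]} */
    __m256d s0 = _mm256_broadcast_pd((__m128d*)src);
    /* +2 counts 128-bit elements, so this reads src[4..5] (byte offset 32) */
    __m256d s1 = _mm256_broadcast_pd((__m128d*)src + 2);
    /* mask 0xc selects {s0[0], s1[0], s0[3], s1[3]} = {src[0], src[4], src[1], src[5]} */
    __m256d s = _mm256_shuffle_pd(s0, s1, 0xc);
    s = _mm256_mul_pd(s, s);
    /* aligned store: dst must be 32-byte aligned */
    _mm256_store_pd(dst, s);
}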
```
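For readers less familiar with the shuffle immediate, the lane layout the function above produces can be sketched in scalar C. This is only an illustration: the name `zip_float_ref` is hypothetical and does not appear in the original report, and the sketch assumes `src` has at least six readable doubles and `dst` room for four.

```c
/* Hypothetical scalar reference, not part of the original report.
 * It mirrors the result of the two 128-bit broadcasts followed by
 * _mm256_shuffle_pd(s0, s1, 0xc) and the element-wise square. */
void zip_float_ref(const double *src, double *dst) {
    /* Interleave the pair at src[0..1] with the pair at src[4..5]. */
    const double zipped[4] = { src[0], src[4], src[1], src[5] };

    /* Square each element, matching _mm256_mul_pd(s, s). */
    for (int i = 0; i < 4; ++i) {
        dst[i] = zipped[i] * zipped[i];
    }
}
```

Unlike the intrinsic version, this sketch places no alignment requirement on `dst`, so it is useful only for checking values, not for comparing code generation.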
LLVM:
```
zip_float:
vmovupd xmm0, xmmword ptr [rdi]
vmovupd xmm1, xmmword ptr [rdi + 32]
vunpcklpd xmm2, xmm0, xmm1
vunpckhpd xmm0, xmm0, xmm1
vinsertf128 ymm0, ymm2, xmm0, 1
vmulpd ymm0, ymm0, ymm0
vmovapd ymmword ptr [rsi], ymm0
vzeroupper
ret
```
GCC:
```
zip_float:
vbroadcastf128 ymm0, XMMWORD PTR [rdi]
vbroadcastf128 ymm1, XMMWORD PTR [rdi+32]
vshufpd ymm0, ymm0, ymm1, 12
vmulpd ymm0, ymm0, ymm0
vmovapd YMMWORD PTR [rsi], ymm0
vzeroupper
ret
```
Godbolt: https://godbolt.org/z/ffz1YEhPE
Tweeted by FFmpeg: https://x.com/FFmpeg/status/1853326818008514900