| Issue | 178632 |
| --- | --- |
| Summary | [X86] Generation of slower vperm2f128 (?) |
| Labels | backend:X86 |
| Assignees | |
| Reporter | nikic |
https://llvm.godbolt.org/z/49x1cP1Mf
The codegen for this example has changed from
```
test: # @test
vmovsd xmm0, qword ptr [esp + 28] # xmm0 = mem[0],zero
vmovsd xmm1, qword ptr [esp + 12] # xmm1 = mem[0],zero
vmovhps xmm0, xmm0, qword ptr [esp + 20] # xmm0 = xmm0[0,1],mem[0,1]
vmovhps xmm1, xmm1, qword ptr [esp + 4] # xmm1 = xmm1[0,1],mem[0,1]
vinsertf128 ymm0, ymm0, xmm1, 1
ret
```
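The Python model below shows what the old sequence computes. It is a sketch, not LLVM code, and the helper names are hypothetical. Each register is a list of doubles with element 0 first. The model assumes the `<4 x double>` argument is on the stack with element `i` at `[esp + 4 + 8*i]`, which fits the offsets used above.

```python
# Hypothetical model of the old sequence; not an LLVM API.
# Registers are lists of doubles, element 0 first.
# Assumption: the <4 x double> argument has element i at [esp + 4 + 8*i].

def vmovsd_load(mem, off):
    # vmovsd xmm, qword ptr [esp + off]: xmm = mem[0], zero
    return [mem[off], 0.0]


def vmovhps_load(xmm, mem, off):
    # vmovhps xmm, xmm, qword ptr [esp + off]: keep the low double,
    # load the high 64 bits from memory
    return [xmm[0], mem[off]]


def vinsertf128_hi(ymm_low, xmm):
    # vinsertf128 ymm, ymm, xmm, 1: low lane from the first source,
    # high lane replaced by xmm
    return ymm_low[:2] + xmm


def old_sequence(mem):
    xmm0 = vmovsd_load(mem, 28)         # [e3, 0]
    xmm1 = vmovsd_load(mem, 12)         # [e1, 0]
    xmm0 = vmovhps_load(xmm0, mem, 20)  # [e3, e2]
    xmm1 = vmovhps_load(xmm1, mem, 4)   # [e1, e0]
    return vinsertf128_hi(xmm0, xmm1)   # [e3, e2, e1, e0]


if __name__ == "__main__":
    stack = {4 + 8 * i: float(i) for i in range(4)}
    print(old_sequence(stack))  # [3.0, 2.0, 1.0, 0.0]
```

Under this layout, the function reverses the four doubles of its argument.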
to
```
test: # @test
vperm2f128 ymm0, ymm0, ymmword ptr [esp + 4], 35 # ymm0 = mem[2,3,0,1]
vshufpd ymm0, ymm0, ymm0, 5 # ymm0 = ymm0[1,0,3,2]
ret
```
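The new sequence can be modeled the same way to check that it computes the same reversal. The helpers below are hypothetical and follow the documented immediate encodings:

- For `vperm2f128`, bits 1:0 and 5:4 select the source lane for the low and high result lanes (0/1 = first source low/high, 2/3 = second source low/high). Bits 3 and 7 zero those lanes.
- For `vshufpd`, bit `i` picks the low or high double within the relevant 128-bit lane.

The memory operand is assumed to hold the argument's elements 0..3 in order.

```python
# Hypothetical model of the new sequence; not an LLVM API.
# Registers are lists of four doubles, element 0 first.

def vperm2f128(src1, src2, imm):
    # Each 128-bit result lane selects one of four source lanes:
    # 0 = src1 low, 1 = src1 high, 2 = src2 low, 3 = src2 high.
    # Bit 3 zeroes the low result lane, bit 7 the high one.
    lanes = [src1[0:2], src1[2:4], src2[0:2], src2[2:4]]

    def pick(sel, zero):
        return [0.0, 0.0] if zero else list(lanes[sel])

    low = pick(imm & 0x3, imm & 0x08)
    high = pick((imm >> 4) & 0x3, imm & 0x80)
    return low + high


def vshufpd(src1, src2, imm):
    # Even result elements come from src1, odd ones from src2;
    # immediate bit i chooses the low or high double within the 128-bit lane.
    return [
        src1[0 + ((imm >> 0) & 1)],
        src2[0 + ((imm >> 1) & 1)],
        src1[2 + ((imm >> 2) & 1)],
        src2[2 + ((imm >> 3) & 1)],
    ]


def new_sequence(ymm0, mem):
    # imm 35 = 0x23: low lane <- mem high lane, high lane <- mem low lane,
    # so the incoming ymm0 value is never read.
    ymm0 = vperm2f128(ymm0, mem, 35)  # [e2, e3, e0, e1]
    ymm0 = vshufpd(ymm0, ymm0, 5)     # [e3, e2, e1, e0]
    return ymm0


if __name__ == "__main__":
    arg = [0.0, 1.0, 2.0, 3.0]
    print(new_sequence([9.0] * 4, arg))  # [3.0, 2.0, 1.0, 0.0]
```

Both forms reverse the four doubles. Any difference between them is a matter of cost, not semantics.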
llvm-mca estimates that the new sequence is slower for every `-mcpu` I tried.
I'm not sure whether the MCA estimate is accurate here, so I wanted to check with someone more familiar with this.
(With `target-cpu=znver3` or similar, `vpermpd` is used instead.)
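On AVX2 targets, a full reversal of four doubles fits in one lane-crossing `vpermpd` with an immediate. The report doesn't show the exact instruction. The sketch below therefore assumes something like `vpermpd ymm0, ymmword ptr [esp + 4], 27`. The helper is hypothetical and models the documented encoding, where result element `i` takes source element `(imm >> 2*i) & 3`.

```python
# Hypothetical model of vpermpd with an immediate; not an LLVM API.
# Assumption: the znver3 codegen uses immediate 27 (0b00_01_10_11) for the reverse.

def vpermpd(src, imm):
    # Result element i takes source element (imm >> 2*i) & 3,
    # crossing 128-bit lanes freely.
    return [src[(imm >> (2 * i)) & 0x3] for i in range(4)]


REVERSE_IMM = 0b00_01_10_11  # selects elements 3, 2, 1, 0 (== 27)

if __name__ == "__main__":
    print(vpermpd([0.0, 1.0, 2.0, 3.0], REVERSE_IMM))  # [3.0, 2.0, 1.0, 0.0]
```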
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs