[llvm-bugs] [Bug 178632] [X86] Generation of slower vperm2f128 (?)

LLVM Bugs via llvm-bugs Thu, 29 Jan 2026 03:06:37 -0800

Issue	178632
Summary	[X86] Generation of slower vperm2f128 (?)
Labels	backend:X86
Assignees
Reporter	nikic

    https://llvm.godbolt.org/z/49x1cP1Mf

Codegen here has changed from
```
test:                                   # @test
        vmovsd  xmm0, qword ptr [esp + 28]      # xmm0 = mem[0],zero
        vmovsd  xmm1, qword ptr [esp + 12]      # xmm1 = mem[0],zero
        vmovhps xmm0, xmm0, qword ptr [esp + 20] # xmm0 = xmm0[0,1],mem[0,1]
        vmovhps xmm1, xmm1, qword ptr [esp + 4] # xmm1 = xmm1[0,1],mem[0,1]
        vinsertf128     ymm0, ymm0, xmm1, 1
        ret
```
to
```
test: # @test
        vperm2f128      ymm0, ymm0, ymmword ptr [esp + 4], 35 # ymm0 = mem[2,3,0,1]
        vshufpd ymm0, ymm0, ymm0, 5             # ymm0 = ymm0[1,0,3,2]
        ret
```
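
For reference, here is a rough Python model of the two sequences, to sanity-check that they compute the same result (a reversal of the four 64-bit elements at `[esp + 4]`). This is only a sketch: the helper names are made up for illustration, registers are modelled as plain lists with index 0 as the lowest element, and it says nothing about performance.

```python
# Model a 256-bit register as a list of four 64-bit elements (index 0 = lowest).
# Helper names are illustrative only; they are not part of any LLVM API.

def vperm2f128(src1, src2, imm):
    """Build each 128-bit half of the result from a half of src1/src2."""
    def pick(ctrl):
        if ctrl & 0x8:  # bit 3 of the control nibble zeroes that half
            return [0, 0]
        halves = [src1[0:2], src1[2:4], src2[0:2], src2[2:4]]
        return halves[ctrl & 0x3]
    return pick(imm & 0xF) + pick((imm >> 4) & 0xF)

def vshufpd(src1, src2, imm):
    """Per 128-bit lane: even result element from src1, odd one from src2."""
    return [
        src1[1] if imm & 1 else src1[0],
        src2[1] if imm & 2 else src2[0],
        src1[3] if imm & 4 else src1[2],
        src2[3] if imm & 8 else src2[2],
    ]

def old_sequence(mem):
    # mem[i] stands for the qword at [esp + 4 + 8*i].
    xmm0 = [mem[3], mem[2]]  # vmovsd [esp + 28], then vmovhps [esp + 20]
    xmm1 = [mem[1], mem[0]]  # vmovsd [esp + 12], then vmovhps [esp + 4]
    return xmm0 + xmm1       # vinsertf128 puts xmm1 in the upper half

def new_sequence(mem):
    unused = [None] * 4      # first source is not selected by imm 35
    tmp = vperm2f128(unused, mem, 35)  # mem[2,3,0,1]
    return vshufpd(tmp, tmp, 5)        # tmp[1,0,3,2]

mem = ["m0", "m1", "m2", "m3"]
print(old_sequence(mem))
print(new_sequence(mem))
```

Both functions return the elements in reversed order, so the question is purely about cost, not correctness.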


llvm-mca seems to think that the new form is slower (for any `-mcpu` I tried).

I'm not sure whether the MCA estimate here is correct, so I wanted to check with someone who is more familiar with this...

(With `target-cpu=znver3` or similar, `vpermpd` is used instead.)

