| Field | Value |
| --- | --- |
| Issue | 83402 |
| Summary | [X86] v4f16 -> v4i32 conversion unnecessarily using YMM registers |
| Labels | backend:X86, missed-optimization |
| Assignees | |
| Reporter | RKSimon |

https://rust.godbolt.org/z/PEfxzYY4z
```ll
define <4 x i32> @fptosi_4f16_to_4i32(<4 x half> %a) nounwind {
%cvt = fptosi <4 x half> %a to <4 x i32>
ret <4 x i32> %cvt
}
```
Compiled with `llc -mcpu=x86-64-v3`:
```asm
fptosi_4f16_to_4i32: # @fptosi_4f16_to_4i32
vcvtph2ps %xmm0, %ymm0
vcvttps2dq %ymm0, %ymm0
vzeroupper
retq
```
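The four half inputs occupy only the low 64 bits of %xmm0 and the <4 x i32> result fits in 128 bits, so we should only require the xmm variants of both instructions (and avoid the vzeroupper entirely).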
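
For illustration, the codegen being asked for would presumably look something like the sketch below. This is hand-written, not actual llc output, and it assumes the same `-mcpu=x86-64-v3` target (which includes F16C and AVX); the exact instruction selection and any comments the backend would emit are not confirmed by this report.

```asm
fptosi_4f16_to_4i32: # hypothetical expected output, not produced by llc
vcvtph2ps %xmm0, %xmm0 # convert the 4 halves in the low 64 bits to 4 floats
vcvttps2dq %xmm0, %xmm0 # truncating convert of 4 floats to 4 i32
retq # no ymm register was written, so no vzeroupper is needed
```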