| Field | Value |
| --- | --- |
| Issue | 142321 |
| Summary | Missed optimization: passing vectors with different (compatible) target features prevents inlining |
| Labels | new issue |
| Assignees | |
| Reporter | tgross35 |

This source:
```llvm
target triple = "x86_64-unknown-linux-gnu"

define <2 x i64> @vec_entrypoint(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys) #4 {
  %r = call <2 x i64> @vec_callee(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys)
  ret <2 x i64> %r
}

define internal <2 x i64> @vec_callee(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys) #0 {
  %t1 = call <2 x i64> @simd_wrap_pclmulqdq(<2 x i64> %a, <2 x i64> %keys)
  ret <2 x i64> %t1
}

define internal <2 x i64> @simd_wrap_pclmulqdq(<2 x i64> %a, <2 x i64> %b) #3 {
  %r = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a, <2 x i64> %b, i8 0)
  ret <2 x i64> %r
}

attributes #0 = { "target-cpu"="x86-64" }
attributes #3 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2" }
attributes #4 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2,+sse3" }
```
Emits the following suboptimal code:
```asm
vec_entrypoint:                         # @vec_entrypoint
        movaps  xmm1, xmm2
        jmp     vec_callee              # TAILCALL
vec_callee:                             # @vec_callee
        jmp     simd_wrap_pclmulqdq     # TAILCALL
simd_wrap_pclmulqdq:                    # @simd_wrap_pclmulqdq
        pclmulqdq       xmm0, xmm1, 0
        ret
```
The middle function `vec_callee` has no target features enabled, and that seems to block inlining of the whole chain. Inlining should still be possible, however, since the target features involved are compatible and there can be no ABI change between these functions.
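For reference, here is a rough sketch of the IR one might hope to see once both calls are inlined into `vec_entrypoint`. It is hand-written for illustration, not compiler output, and the explicit `declare` line is an addition for self-containment:

```llvm
target triple = "x86_64-unknown-linux-gnu"

; Hand-written illustration (not compiler output): @vec_callee and
; @simd_wrap_pclmulqdq folded into the entry point, leaving only the intrinsic.
define <2 x i64> @vec_entrypoint(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys) #4 {
  %r = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a, <2 x i64> %keys, i8 0)
  ret <2 x i64> %r
}

; Intrinsic declaration, added here so the snippet stands alone.
declare <2 x i64> @llvm.x86.pclmulqdq(<2 x i64>, <2 x i64>, i8)

attributes #4 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2,+sse3" }
```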
Using pointers rather than vector argument and return types generates the expected, fully inlined code:
```asm
ptr_entrypoint:                         # @ptr_entrypoint
        movdqa  xmm0, xmmword ptr [rsi]
        pclmulqdq       xmm0, xmmword ptr [rcx], 0
        movdqa  xmmword ptr [rdi], xmm0
        ret
```
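The source for the pointer variant is only in the linked repro. The sketch below is a guess at its shape, reconstructed from the assembly above (output pointer first, then `a`, `b`, `keys`, with 16-byte-aligned accesses). The names `ptr_callee` and `ptr_wrap_pclmulqdq`, the parameter order, and the alignments are assumptions, not copied from the repro:

```llvm
target triple = "x86_64-unknown-linux-gnu"

; Hypothetical pointer-based counterpart of @vec_entrypoint.
define void @ptr_entrypoint(ptr %out, ptr %a, ptr %b, ptr %keys) #4 {
  call void @ptr_callee(ptr %out, ptr %a, ptr %b, ptr %keys)
  ret void
}

; Middle function with no target features, mirroring @vec_callee.
define internal void @ptr_callee(ptr %out, ptr %a, ptr %b, ptr %keys) #0 {
  call void @ptr_wrap_pclmulqdq(ptr %out, ptr %a, ptr %keys)
  ret void
}

; Vectors only exist inside this function; nothing vector-typed crosses a call
; boundary between the user-defined functions.
define internal void @ptr_wrap_pclmulqdq(ptr %out, ptr %a, ptr %b) #3 {
  %va = load <2 x i64>, ptr %a, align 16
  %vb = load <2 x i64>, ptr %b, align 16
  %r = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %va, <2 x i64> %vb, i8 0)
  store <2 x i64> %r, ptr %out, align 16
  ret void
}

; Intrinsic declaration, added here so the snippet stands alone.
declare <2 x i64> @llvm.x86.pclmulqdq(<2 x i64>, <2 x i64>, i8)

attributes #0 = { "target-cpu"="x86-64" }
attributes #3 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2" }
attributes #4 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2,+sse3" }
```

The only intended difference from the vector version is that no `<2 x i64>` value is passed to or returned from a user-defined function; the target-feature attributes are the same.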
Repro: https://llvm.godbolt.org/z/54bEjqPjj
This issue was discovered at https://github.com/rust-lang/rust/issues/139029