| Field | Value |
| --- | --- |
| Issue | 56425 |
| Summary | Missed optimization: Redundant copy when passing a pointer to a by-value struct arg |
| Labels | new issue |
| Assignees | |
| Reporter | hhusjr |

See [https://godbolt.org/z/vo1Kvd6MT](https://godbolt.org/z/vo1Kvd6MT) for a self-contained example.
For the following C code,
```c
struct sa {
    char buffer[24];
};

void process(const struct sa *data);

static inline void func_inline(struct sa sa) {
    process(&sa);
}

void call_inline(struct sa sa) {
    func_inline(sa);
}
```
It seems that optimal assembly for `call_inline` could pass the address of its own `sa` argument directly to `process`. However, the compiler emits assembly like:
```asm
call_inline:                            # @call_inline
        sub     rsp, 24
        mov     rax, qword ptr [rsp + 48]
        mov     qword ptr [rsp + 16], rax
        movaps  xmm0, xmmword ptr [rsp + 32]
        movaps  xmmword ptr [rsp], xmm0
        mov     rdi, rsp
        call    process
        add     rsp, 24
        ret
```
In this output, the whole `sa` struct is copied from `[rsp + 32]` to `[rsp]`. A slight change<sup>[1]</sup> demonstrates that this is a missed optimization. If we mark the helper `__attribute__((noinline))` (`func_noinline` below), `call_noinline` tail-calls it with just `jmp func_noinline`, without copying any args. **If it's legal to eliminate the arg copy when doing a tail call, it would also be legal to do that when inlining `func_inline`.**<sup>[1]</sup>
```c
__attribute__((noinline)) static void func_noinline(struct sa sa) {
    process(&sa);
}

void call_noinline(struct sa sa) {
    func_noinline(sa);
}
```
```asm
call_noinline:                          # @call_noinline
        jmp     func_noinline           # TAILCALL

func_noinline:                          # @func_noinline
        push    rax
        lea     rdi, [rsp + 16]
        call    process
        pop     rax
        ret
```
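For anyone who wants to check that both variants behave identically before comparing their codegen, here is a minimal sketch that puts the two snippets in one file. The `process` body, the `last_sum` global, and the `checksum_via` helper are assumptions added for illustration and are not part of the report. In the report `process` is an external declaration, and defining it in the same translation unit may let the compiler inline it and change the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sa {
    char buffer[24];
};

/* Hypothetical stand-in for the external `process`:
   records the byte sum of the buffer it was handed. */
static unsigned last_sum;

void process(const struct sa *data) {
    assert(data != NULL);
    last_sum = 0;
    for (size_t i = 0; i < sizeof data->buffer; i++)
        last_sum += (unsigned char)data->buffer[i];
}

/* Inlined variant from the report. */
static inline void func_inline(struct sa sa) {
    process(&sa);
}

void call_inline(struct sa sa) {
    func_inline(sa);
}

/* Non-inlined variant from the report. */
__attribute__((noinline)) static void func_noinline(struct sa sa) {
    process(&sa);
}

void call_noinline(struct sa sa) {
    func_noinline(sa);
}

/* Hypothetical helper: fill a struct with `fill`, pass it by value
   through `call`, and return the sum that `process` observed. */
unsigned checksum_via(void (*call)(struct sa), char fill) {
    struct sa s;
    memset(s.buffer, fill, sizeof s.buffer);
    call(s);
    return last_sum;
}
```

Calling `checksum_via(call_inline, 1)` and `checksum_via(call_noinline, 1)` should give the same result, since `process` sees the same bytes either way. The two paths differ only in whether an extra stack copy is made. To inspect that copy in the assembly, keep `process` as a declaration only, as in the report.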
[1] https://stackoverflow.com/questions/72859532/why-do-clang-and-gcc-produce-this-sub-optimal-output-copying-a-struct-for-pass