On Tuesday, 27 March 2018 at 20:38:35 UTC, H. S. Teoh wrote:
On Tue, Mar 27, 2018 at 08:25:36PM +0000, Rubn via Digitalmars-d wrote:
[...]
_D7example__T3fooTSQr3FooZQnFNbNiNfQrZv:
push rbp
mov rbp, rsp
sub rsp, 3104
lea rax, [rbp + 16]
lea rdi, [rbp - 2048]
lea rcx, [rbp - 1024]
mov edx, 1024
mov rsi, rcx
mov qword ptr [rbp - 2056], rdi
mov rdi, rsi
mov rsi, rax
mov qword ptr [rbp - 2064], rcx
call memcpy@PLT <--------------------- hidden copy
[...]
Is this generated by dmd, or gdc/ldc?
Generally, when it comes to performance issues, I don't even
bother looking at dmd-generated code anymore. If the extra
copying is still happening with gdc -O2 / ldc -O, then you have
a point. Otherwise, it doesn't really say very much.
T
It happens with LDC too. I'm not sure how the compiler could know
to do that kind of optimization unless it was able to inline every
function in the call chain into a single function and optimize
from there. I don't imagine that's likely, though.
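
To make the inlining point concrete, here is a minimal sketch of the kind of code I mean. It is not the original example: the mangled name above looks like a template `foo` instantiated with a struct `Foo`, but the 1024-byte payload (chosen to match the memcpy length in the listing), the callee `sink`, and the `pragma(inline, false)` are all my assumptions for illustration.

```d
import std.stdio : writeln;

struct Foo
{
    ubyte[1024] data; // assumed size, matches the 1024-byte memcpy above
}

// Hypothetical callee, kept out of line so the caller cannot see its body.
pragma(inline, false)
void sink(Foo f) nothrow @nogc @safe
{
    f.data[0] = 42; // mutates only the callee's private copy
}

// Hypothetical forwarding template, similar in shape to foo!(Foo).
void foo(T)(T value) nothrow @nogc @safe
{
    sink(value);                // by-value call: sink gets its own copy
    assert(value.data[0] == 0); // our copy must still be untouched
}

void main()
{
    Foo f;
    foo(f);
    assert(f.data[0] == 0); // the caller's original is untouched as well
    writeln("ok");
}
```

Since `sink` is allowed to modify its parameter, `foo` has to hand it a separate copy unless the optimizer can see inside `sink` and prove that the difference is not observable. As far as I can tell, that is why the copy would only go away once the whole call chain is inlined.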