https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107167
Bug ID: 107167
Summary: It looks like GCC wastes registers on trivial
computations when result can be cached
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: unlvsur at live dot com
Target Milestone: ---
I do not know whether this matters much on targets that provide plenty of
registers (such as aarch64 or loongarch64). However, it looks like a significant
issue on x86_64, which provides only 16 general-purpose registers (and %rsp is
reserved, leaving 15 available).
Take this example:
https://godbolt.org/z/77rEsr1PG
#include <bit>
unsigned Sigma1(unsigned x) noexcept
{
    return std::rotr(x, 6) ^ std::rotr(x, 11) ^ std::rotr(x, 25);
}
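GCC generates code like this, keeping the three rotates independent of one
another to avoid dependencies (roll $7 is the same as a right rotate by 25):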
Sigma1m(unsigned int):
movl %edi, %eax
movl %edi, %edx
roll $7, %edi
rorl $6, %eax
rorl $11, %edx
xorl %edx, %eax
xorl %edi, %eax
ret
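However, the following alternative computes the same result by rotating the
already-rotated %edi by a further 19 bits (6 + 19 = 25):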
mySigma1m(unsigned int):
movl %edi, %eax
rorl $6, %edi
rorl $11, %eax
xorl %edi, %eax
rorl $19, %edi
xorl %edi, %eax
ret
This version saves one register. That becomes a serious problem in larger
computations where registers are in short supply. The first version also needs
one more instruction, which can increase pressure on the instruction cache.
Aggressively using all available registers may not give the best results: a
local maximum is not necessarily a global maximum.
I don't know.