[Bug target/70557] uint64_t zeroing on 32-bit hardware

acahalan at gmail dot com Wed, 06 Apr 2016 05:20:12 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70557


--- Comment #4 from Albert Cahalan <acahalan at gmail dot com> ---
Mostly it's more like PR58741 because of the long long issue.

PR22141 (and PR23684 which is a better match) is about merging small things.
Two of the six examples here show that problem, those being the ones with a
loop over char.

The problem that prompted this bug report and determined the bug title is
different. It's in some way the opposite. When I ask gcc to store a 64-bit zero
value, gcc makes a 64-bit zero value in registers (two identical 32-bit halves in
a pair of 32-bit registers) and then stores that to memory.
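
For readers without the attachments, here is a reduced sketch of the kind of source involved. It is not one of the six examples from this report, and the names counter and clear_counter are made up for illustration. On a 32-bit target the single assignment below has to become two 32-bit stores, and the complaint is about how the zero for those stores gets materialized.

```c
#include <stdint.h>

/* Hypothetical global; the name is not taken from the bug report. */
uint64_t counter;

/* One 64-bit store in the source. A 32-bit target has to split it into
   two 32-bit stores of the same value (zero). */
void clear_counter(void)
{
    counter = 0;
}
```

Compiling something like this with -S for a 32-bit target and reading the assembly is the easiest way to see whether the zero goes through a register pair first.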

There are many ways that this is wrong, and I worry that fixing one problem may
hide the other problems. Depending on compiler internals that I don't
understand, this could perhaps be four bugs:

1. When the two halves of a 64-bit value are identical, there is no need to
load values into two different registers. This is true for many constant
values, though obviously -1 and 0 would be most popular. Other popular values
would be the constants for computing a Hamming weight. AFAIK, this optimization
should apply whenever dealing with values that are larger than registers, such
as 128-bit values on 64-bit platforms.

2. When the address is to be encoded in the instruction that writes to memory,
it is best to directly clear the memory without first generating the constant
in registers. AFAIK, this optimization should apply to most CISC machines. The
fact that there is a special instruction for storing a 0 makes the optimization
more important.

3. When the address is to be encoded in an instruction, sometimes it is best to
place the address in a register and then use that register to supply the
address for storing to memory. This tends to apply when doing lots of writes,
when an address register happens to be available, and when optimizing for size.
AFAIK this optimization applies to most machines.

4. When using an address register to supply the location for storing, often it
is best to use autoincrement addressing instead of distinct offsets. This
usually generates smaller code. AFAIK this applies to many machines, including
at least: arm, m68k, and ppc.

(and also the store-merge issue, which makes 5)
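
As a rough source-level model of points 1, 3 and 4 (a sketch only, not gcc internals): the check for point 1 is just a comparison of the two halves, and the store pattern wanted by points 3 and 4 is one value written twice through a single advancing pointer. Both helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for point 1: true when the high and low 32-bit
   halves of v match, so one 32-bit register could supply both halves. */
static bool halves_identical(uint64_t v)
{
    return (uint32_t)v == (uint32_t)(v >> 32);
}

/* Hypothetical helper modelling points 3 and 4: the address lives in one
   pointer, both halves come from a single 32-bit value, and the pointer
   advances with post-increment rather than using distinct offsets. */
static void store_repeated_half(uint32_t *p, uint32_t half)
{
    *p++ = half;
    *p++ = half;
}
```

The check holds for 0 and -1 and also for the usual Hamming weight masks such as 0x5555555555555555 and 0x3333333333333333, which is why point 1 covers more than just zeroing. Whether a given compiler turns the second helper into autoincrement stores depends on the target and options.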
