On 8/3/2022 1:52 AM, Richard Sandiford via Gcc-patches wrote:
Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
data flow consistent, but it also increases register allocation pressure
and thus often creates many unwanted register-to-register moves that
cannot be optimized away.
There are two things here:

- If emit_move_complex_parts emits a clobber of a hard register,
   then that's probably a bug/misfeature.  The point of the clobber is
   to indicate that the register has no useful contents.  That's useful
   for wide pseudos that are written to in parts, since it avoids the
   need to track the liveness of each part of the pseudo individually.
   But it shouldn't be necessary for hard registers, since subregs of
   hard registers are simplified to hard registers wherever possible
   (which on most targets is "always").

   So I think the emit_move_complex_parts clobber should be restricted
   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
   (if only partly) then it would be worth doing as its own patch.
Agreed.


- I think it'd be worth looking into more detail why a clobber makes
   a difference to register pressure.  A clobber of a pseudo register R
   shouldn't make R conflict with things that are live at the point of
   the clobber.
Also agreed.
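The first point above proposes emitting the clobber only for pseudos (!HARD_REGISTER_P), because subregs of hard registers simplify to plain hard registers. The toy Python sketch below illustrates that idea. It is not GCC code. FIRST_PSEUDO_REGISTER = 32, the function names, and the assumption that each SImode word of a hard register pair is its own consecutive hard register are all hypothetical.

```python
# Toy illustration only, not GCC code.  FIRST_PSEUDO_REGISTER and the register
# numbers are hypothetical; real values are target-specific.
FIRST_PSEUDO_REGISTER = 32  # assume registers 0..31 are hard registers


def hard_register_p(regno: int) -> bool:
    """Mirror the idea behind HARD_REGISTER_P: low numbers are hard registers."""
    return regno < FIRST_PSEUDO_REGISTER


def emit_move_in_parts(dest: int, srcs: list, word: int = 4) -> list:
    """Return RTL-like strings for a DImode move done one SImode word at a time."""
    insns = []
    if not hard_register_p(dest):
        # Pseudo: clobber first so the liveness of each word need not be tracked.
        insns.append(f"(clobber (reg:DI {dest}))")
    for i, src in enumerate(srcs):
        if hard_register_p(dest):
            # Hypothetical: each word of the hard register pair is its own hard
            # register, so the subreg simplifies to (reg:SI dest+i), no clobber.
            lhs = f"(reg:SI {dest + i})"
        else:
            lhs = f"(subreg:SI (reg:DI {dest}) {i * word})"
        insns.append(f"(set {lhs} (reg:SI {src}))")
    return insns


for dest in (100, 2):  # 100 is a pseudo, 2 is a hard register in this toy setup
    print(f"dest {dest}:")
    for insn in emit_move_in_parts(dest, [0, 1]):
        print("  " + insn)
```

Under these assumptions, the pseudo destination gets a clobber followed by two subreg sets. The hard-register destination gets just two plain SImode moves.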


It seems analogous to the partial register stall, a well-known problem
on processors that do register renaming.

In my opinion, when the register to be clobbered is a composite of hard
ones, we should clobber the individual elements separately.  Otherwise, we
should clear the entire register to zero before use, as the "init-regs" pass
does (like the partial register stall workarounds on x86 CPUs).  Such redundant
zero constant assignments will be removed later in the "cprop_hardreg" pass.
I don't think we should rely on the zero being optimised away later.

Emitting the zero also makes it harder for the register allocator
to elide the move.  For example, if we have:

   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))

then there is at least a chance that the RA could assign hard registers
R0:R1 to P, which would turn the moves into nops.  If we emit:

   (set (reg:DI P) (const_int 0))

beforehand then that becomes impossible, since R0 and R1 would then
conflict with P.
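The conflict argument above can be checked with a toy interference model. This is a sketch under stated assumptions, not how IRA/LRA work. Each SImode word of P is tracked separately under the hypothetical names "P.lo" and "P.hi". Conflicts use a Chaitin-style rule: a definition conflicts with everything live after the instruction, except itself and the source of a plain copy. The trailing "(use (reg:DI P))" is a hypothetical stand-in for a later use of P.

```python
# Toy model, not IRA/LRA.  Assumes per-word tracking of P ("P.lo"/"P.hi" are
# hypothetical names) and a Chaitin-style rule: a def conflicts with every value
# live after the instruction, except itself and the source of a plain copy.
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Insn:
    text: str
    defs: Set[str]
    uses: Set[str] = field(default_factory=set)
    copy_src: Optional[str] = None  # set for register-to-register copies


def conflicts(insns):
    """Backward liveness over a straight-line block; return interference edges."""
    live = set()  # nothing is live after the last instruction
    edges = set()
    for insn in reversed(insns):
        for d in insn.defs:
            for other in live:
                if other != d and other != insn.copy_src:
                    edges.add(frozenset((d, other)))
        live = (live - insn.defs) | insn.uses
    return edges


def can_coalesce(edges):
    """True if P.lo can use R0 and P.hi can use R1, turning both copies into nops."""
    return (frozenset(("P.lo", "R0")) not in edges
            and frozenset(("P.hi", "R1")) not in edges)


copies = [
    Insn("(set (subreg:SI (reg:DI P) 0) (reg:SI R0))", {"P.lo"}, {"R0"}, "R0"),
    Insn("(set (subreg:SI (reg:DI P) 4) (reg:SI R1))", {"P.hi"}, {"R1"}, "R1"),
    Insn("(use (reg:DI P))", set(), {"P.lo", "P.hi"}),  # hypothetical later use
]
zeroed = [Insn("(set (reg:DI P) (const_int 0))", {"P.lo", "P.hi"})] + copies

print(can_coalesce(conflicts(copies)))  # True
print(can_coalesce(conflicts(zeroed)))  # False
```

In this model, the zeroing instruction executes while R0 and R1 are both still live, so both words of P pick up conflicts with both of them. That rules out the R0:R1 assignment. A real compiler treats subreg writes and dead definitions in more detail than this sketch does.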

TBH I'm surprised we still run init_regs for LRA.  I thought there was
a plan to stop doing that, but perhaps I misremember.
I have vague memories of dealing with some of this nonsense a few release cycles ago.  I don't recall all the details, but init-regs + lower-subreg + regcprop + splitting all conspired to generate poor code on the MIPS targets.  See pr87761, though it doesn't include all my findings.  I can't recall whether I walked through the entire tortured sequence in the gcc-patches discussion.

I ended up working around it in the MIPS backend, in conjunction with some changes to regcprop, IIRC.

Jeff
