Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

Manolis Tsamis Tue, 10 Oct 2023 05:00:39 -0700

On Fri, Oct 6, 2023 at 8:57 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 10/6/23 08:17, Manolis Tsamis wrote:
> SNIP
> >> So I was ready to ACK, but realized there weren't any testresults for a
> >> primary platform mentioned.  So I ran this on x86.
> >>
> >> It's triggering one regression (code quality).
> >>
> >> Specifically gcc.target/i386/pr52146.c
> >>
> >> The f-m-o code is slightly worse than without f-m-o.
> >>
> >> Without f-m-o we get this:
> >>
> >>      9 0000 B88000E0              movl    $-18874240, %eax
> >>      9      FE
> >>     10 0005 67C70000              movl    $0, (%eax)
> >>     10      000000
> >>     11 000c C3                    ret
> >>
> >> With f-m-o we get this:
> >>
> >>      9 0000 B8000000              movl    $0, %eax
> >>      9      00
> >>     10 0005 67C78080              movl    $0, -18874240(%eax)
> >>     10      00E0FE00
> >>     10      000000
> >>     11 0010 C3                    ret
> >>
> >>
> >> The key being that we don't get rid of the original move instruction,
> >> nor does the original move instruction get smaller due to simplification
> >> of its constant.  Additionally, the memory store gets larger.  The net
> >> is a 4 byte increase in code size.
> >>
> > Yes, this case is not good for f-m-o. In theory there could be a cost
> > calculation step that tries to estimate the benefit of a
> > transformation, but given that f-m-o cannot transform code in a way
> > that we have big regressions it's unclear to me whether complicating
> > the code is worth it. At least if we can solve the issues in other
> > ways (also see discussion below).
> >
> >>
> >> This is probably a fairly rare scenario and the original bug report was
> >> for a correctness issue in using addresses in the range
> >> 0x80000000..0xffffffff in x32.  So I wouldn't lose any sleep if we
> >> adjusted the test to pass -fno-fold-mem-offsets.  But before doing that
> >> I wanted to give you the chance to ponder if this is something you'd
> >> prefer to improve in f-m-o itself.   At some level if the base register
> >> collapses down to 0, then we could take the offset as a constant address
> >> and try to recognize that form.  If that fails, then just consider the
> >> change unprofitable rather than trying to recognize it as reg+d.
> >>
> >> Anyway, waiting to hear your thoughts...
> >>
> > Yes, this testcase has been bugging me too, I have brought that up in
> > previous iterations as well.
> I must have missed that in the earlier discussion.
>
> > I'm also not sure whether this is a code quality or a correctness
> > issue? From what I understand from the relevant ticket, if we emit
> > movl $0, -18874240 then it's wrong code?
> It's a code quality issue as long as we don't transform the code into
> movl $0, -18874240, at which point it would become a correctness issue.
>
Ok, thanks for pointing that out as I thought that movl $0, -18874240
and movl $0, -18874240(eax) with eax == 0 were the same.


>
> >
> > With regards to the "recognize that the base register is 0", that
> > would be nice but how would we recognise that? f-m-o can only
> > calculate the folded offset but that is not enough to prove that the
> > base register is zero or not.
> It's a chain of insns that produce an address and use it in the memory
> reference.  We essentially changed the first insn in the chain from movl
> -18874240, %eax into movl 0, %eax.  So we'd have to someone note that
> the base register in the memory reference has the value zero in the
> chain of instructions.  That may or may not be reasonable to do.
>
Ok, do you believe it would be worthwhile to add a REG_EQ note or
something similar? I assume some REG_EQ notes will be added from the
following cprop-hardreg pass too?

> >
> > One thought that I've had is that this started being an issue on x86
> > when I enabled folding of mv REG, INT in addition to the existing ADD
> > REG, REG, INT. The idea was that a move will be folded to mv REG, 0
> > and on targets that we have a zero register that can be beneficial for
> > a number of reasons... but on x86 we don't have a zero register so the
> > benefit is much more limited anyway. So maybe we could disable folding
> > of moves on targets that don't have a zero register? That would solve
> > the issue and I believe it also makes sense in general. If so, is
> > there a way to query wether the target has such register?
> We don't have a generalized query to do that.  You might be able to ask
> what the cost to load 0 into a register is, but many targets
> artificially decrease that value.
>
Ok, I see. Then let's not use that approach.

> You could use the costing model to cost the entire sequence
> before/after.  There's an interface to walk a sequence and return a
> cost.  In the case of f-m-o the insns are part of the larger chain, so
> we'd need a different API.
>
> The other option would be to declare this is known, but not important
> issue.  I would think that it's going to be rare to have absolute
> addresses and x32 isn't really used much.  The combination of the two
> should be exceedingly rare.  Hence my willingness to use
> -fno-fold-mem-offsets in the test.
>
Yes, I now better understand why you suggested adding
-fno-fold-mem-offsets to the testcase. Or we could also make the test
not fail in this case where the memory access has the base register,
as there's no correctness issue.

Then, going back to the code quality regression, wouldn't things be
much better if we would emit xor eax, eax instead of mov eax, 0? In
that case xor eax, eax should be 2 bytes instead of 5 for mov eax, 0
and hence the code size difference should be trivial. Then we can keep
f-m-o as is, including this testcase.

Is there a way to emit a taret-specific optimized register zero insn?
If so I'll use that and adjust the testcase as needed, and I think
we're done with this one.

Thanks,
Manolis

> Jeff

Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

Reply via email to