RE: [PATCH] Improved RTL expansion of field assignments into promoted registers.

Roger Sayle Thu, 28 Dec 2023 11:35:24 -0800


Hi Jeff,
Thanks for the speedy review.

> On 12/28/23 07:59, Roger Sayle wrote:
> > This patch fixes PR rtl-optmization/104914 by tweaking/improving the
> > way that fields are written into a pseudo register that needs to be
> > kept sign extended.
> Well, I think "fixes" is a bit of a stretch.  We're avoiding the issue by 
> changing the
> early RTL generation, but if I understand what's going on in the RTL 
> optimizers
> and MIPS backend correctly, the core bug still remains.  Admittedly I haven't 
> put it
> under a debugger, but that MIPS definition of NOOP_TRUNCATION just seems
> badly wrong and is just waiting to pop it's ugly head up again.

I think this really is the/a correct fix. The MIPS backend defines 
NOOP_TRUNCATION
to false, so it's not correct to use a SUBREG to convert from DImode to SImode.
The problem then is where in the compiler (middle-end or backend) is this 
invalid
SUBREG being created and how can it be fixed.  In this particular case, the 
fault
is in RTL expansion.  There may be other places where a SUBREG is 
inappropriately
used instead of a TRUNCATE, but this is the place where things go wrong for
PR rtl-optimization/104914.

Once an inappropriate SImode SUBREG is in the RTL stream, it can remain
harmlessly latent (most of the time), unless it gets split, simplified or 
spilled.
Copying this SImode expression into it's own pseudo, results in incorrect code.
One approach might be to use an UNSPEC for places where backend
invariants are temporarily invalid, but in this case it's machine independent
middle-end code that's using SUBREGs as though the target was an x86/pdp11.

So I agree that on the surface, both of these appear to be identical:
> (set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
> (set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))

But should they get split or spilled by reload:

(set (reg_tmp:SI) (subreg:SI (reg:DI))
(set (reg:DI) (sign_extend:DI (reg_tmp:SI))

is invalid as the reg_tmp isn't correctly sign-extended for SImode.
But,

(set (reg_tmp:SI) (truncate:SI (reg:DI))
(set (reg:DI) (sign_extend:DI (reg_tmp:SI))

is fine.  The difference is the instant in time, when the SUBREG's invariants
aren't yet valid (and its contents shouldn't be thought of as SImode).

On nvptx, where truly_noop_truncation is always "false", it'd show the same
bug/failure, if it were not for that fact that nvptx doesn't attempt to store
values in "mode extended" (SUBREG_PROMOTED_VAR_P) registers.
The bug is really in MODE_REP_EXTENDED support.

> > The motivating example from the bugzilla PR is:
> >
> > extern void ext(int);
> > void foo(const unsigned char *buf) {
> >    int val;
> >    ((unsigned char*)&val)[0] = *buf++;
> >    ((unsigned char*)&val)[1] = *buf++;
> >    ((unsigned char*)&val)[2] = *buf++;
> >    ((unsigned char*)&val)[3] = *buf++;
> >    if(val > 0)
> >      ext(1);
> >    else
> >      ext(0);
> > }
> >
> > which at the end of the tree optimization passes looks like:
> >
> > void foo (const unsigned char * buf)
> > {
> >    int val;
> >    unsigned char _1;
> >    unsigned char _2;
> >    unsigned char _3;
> >    unsigned char _4;
> >    int val.5_5;
> >
> >    <bb 2> [local count: 1073741824]:
> >    _1 = *buf_7(D);
> >    MEM[(unsigned char *)&val] = _1;
> >    _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
> >    MEM[(unsigned char *)&val + 1B] = _2;
> >    _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
> >    MEM[(unsigned char *)&val + 2B] = _3;
> >    _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
> >    MEM[(unsigned char *)&val + 3B] = _4;
> >    val.5_5 = val;
> >    if (val.5_5 > 0)
> >      goto <bb 3>; [59.00%]
> >    else
> >      goto <bb 4>; [41.00%]
> >
> >    <bb 3> [local count: 633507681]:
> >    ext (1);
> >    goto <bb 5>; [100.00%]
> >
> >    <bb 4> [local count: 440234144]:
> >    ext (0);
> >
> >    <bb 5> [local count: 1073741824]:
> >    val ={v} {CLOBBER(eol)};
> >    return;
> >
> > }
> >
> > Here four bytes are being sequentially written into the SImode value
> > val.  On some platforms, such as MIPS64, this SImode value is kept in
> > a 64-bit register, suitably sign-extended.  The function
> > expand_assignment contains logic to handle this via
> > SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an
> > explicit extension operation after each store_field (typically insv) to such
> promoted/extended pseudos.
> >
> > The first observation is that there's no need to perform sign
> > extension after each byte in the example above; the extension is only
> > required after changes to the most significant byte (i.e. to a field
> > that overlaps the most significant bit).
> True.
> 
> > The bug fix is actually a bit more subtle, but at this point during
> > code expansion it's not safe to use a SUBREG when sign-extending this
> > field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI)
> > 0)) but combine (and other RTL optimizers) later realize that because
> > SImode values are always sign-extended in their 64-bit hard registers
> > that this is a no-op and eliminates it.  The trouble is that it's
> > unsafe to refer to the SImode lowpart of a 64-bit register using
> > SUBREG at those critical points when temporarily the value isn't
> > correctly sign-extended, and the usual backend invariants don't hold.
> > At these critical points, the middle-end needs to use an explicit
> > TRUNCATE rtx (as this isn't a TRULY_NOOP_TRUNCATION), so that the
> > explicit sign-extension looks like (sign_extend:DI (truncate:SI (reg:DI)), 
> > which
> avoids the problem.
> 
> 
> > Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKOWN implies (or
> > should
> > imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another
> > (independent) patch that I'll post in a few minutes.
> >
> >
> > This middle-end patch has been tested on x86_64-pc-linux-gnu with make
> > bootstrap and make -k check, both with and without
> > --target_board=unix{-m32} with no new failures.  The cc1 from a
> > cross-compiler to mips64 appears to generate much better code for the
> > above test case.  Ok for mainline?
> >
> >
> > 2023-12-28  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          PR rtl-optimization/104914
> >          * expr.cc (expand_assignment): When target is
> SUBREG_PROMOTED_VAR_P
> >          a sign or zero extension is only required if the modified field
> >          overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
> >          targets, don't refer to the temporarily incorrectly extended value
> >          using a SUBREG, but instead generate an explicit TRUNCATE rtx.
> [ ... ]
> 
> 
> > +         /* Check if the field overlaps the MSB, requiring extension.  */
> > +         else if (known_eq (bitpos + bitsize,
> > +                            GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> Do you need to look at the size of the field as well?  ie, the starting 
> position might
> be before the sign bit, but the width of the field might cover the mode's 
> sign bit?
> 
> I'm not real good in the RTL expansion code, so if I'm offbase on this, just 
> let me
> know.

There are two things that help here.  The first is that the most significant
bit never appears in the middle of a field, so we don't have to worry about
overlapping, nor writes to the paradoxical bits of the SUBREG.  And secondly,
bits are numbered from zero for least significant, to MODE_BITSIZE (mode) - 1
for most significant, irrespective of the endian-ness.  So the code only needs
to check the highest value bitpos + bitsize is the maximum value for the mode.
The above logic stays the same, but which byte insert requires extension will
change between mips64be and mips64le.  i.e. we test that the most significant
bit of the field/byte being written in the most significant bit of the SUBREG
target. [That's my understanding/rationalization, I could wrong].

One thing I could be more cautious about is using maybe_eq instead of
known_eq, but the rest of the code (including truly_noop_truncation) assumes
scalar integer modes, so variable length vectors aren't (yet) a concern.
Would using maybe_eq be better coding style?

Cheers,
Roger
--

RE: [PATCH] Improved RTL expansion of field assignments into promoted registers.

Reply via email to