Jeff Law <jeffreya...@gmail.com> 于2023年12月29日周五 02:23写道:
>
>
>
> On 12/28/23 07:59, Roger Sayle wrote:
> >
> > This patch fixes PR rtl-optmization/104914 by tweaking/improving the way
> > that fields are written into a pseudo register that needs to be kept sign
> > extended.
> Well, I think "fixes" is a bit of a stretch.  We're avoiding the issue
> by changing the early RTL generation, but if I understand what's going
> on in the RTL optimizers and MIPS backend correctly, the core bug still
> remains.  Admittedly I haven't put it under a debugger, but that MIPS
> definition of NOOP_TRUNCATION just seems badly wrong and is just waiting
> to pop it's ugly head up again.
>

Yes. I am trying to get rid of it from MIPS64.
It may reduce our maintain workload.

>
>
> >
> > The motivating example from the bugzilla PR is:
> >
> > extern void ext(int);
> > void foo(const unsigned char *buf) {
> >    int val;
> >    ((unsigned char*)&val)[0] = *buf++;
> >    ((unsigned char*)&val)[1] = *buf++;
> >    ((unsigned char*)&val)[2] = *buf++;
> >    ((unsigned char*)&val)[3] = *buf++;
> >    if(val > 0)
> >      ext(1);
> >    else
> >      ext(0);
> > }
> >
> > which at the end of the tree optimization passes looks like:
> >
> > void foo (const unsigned char * buf)
> > {
> >    int val;
> >    unsigned char _1;
> >    unsigned char _2;
> >    unsigned char _3;
> >    unsigned char _4;
> >    int val.5_5;
> >
> >    <bb 2> [local count: 1073741824]:
> >    _1 = *buf_7(D);
> >    MEM[(unsigned char *)&val] = _1;
> >    _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
> >    MEM[(unsigned char *)&val + 1B] = _2;
> >    _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
> >    MEM[(unsigned char *)&val + 2B] = _3;
> >    _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
> >    MEM[(unsigned char *)&val + 3B] = _4;
> >    val.5_5 = val;
> >    if (val.5_5 > 0)
> >      goto <bb 3>; [59.00%]
> >    else
> >      goto <bb 4>; [41.00%]
> >
> >    <bb 3> [local count: 633507681]:
> >    ext (1);
> >    goto <bb 5>; [100.00%]
> >
> >    <bb 4> [local count: 440234144]:
> >    ext (0);
> >
> >    <bb 5> [local count: 1073741824]:
> >    val ={v} {CLOBBER(eol)};
> >    return;
> >
> > }
> >
> > Here four bytes are being sequentially written into the SImode value
> > val.  On some platforms, such as MIPS64, this SImode value is kept in
> > a 64-bit register, suitably sign-extended.  The function expand_assignment
> > contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
> > in expr.cc) which outputs an explicit extension operation after each
> > store_field (typically insv) to such promoted/extended pseudos.
> >
> > The first observation is that there's no need to perform sign extension
> > after each byte in the example above; the extension is only required
> > after changes to the most significant byte (i.e. to a field that overlaps
> > the most significant bit).
> True.
>
>
> >
> > The bug fix is actually a bit more subtle, but at this point during
> > code expansion it's not safe to use a SUBREG when sign-extending this
> > field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
> > but combine (and other RTL optimizers) later realize that because SImode
> > values are always sign-extended in their 64-bit hard registers that
> > this is a no-op and eliminates it.  The trouble is that it's unsafe to
> > refer to the SImode lowpart of a 64-bit register using SUBREG at those
> > critical points when temporarily the value isn't correctly sign-extended,
> > and the usual backend invariants don't hold.  At these critical points,
> > the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
> > TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
> > (sign_extend:DI (truncate:SI (reg:DI)), which avoids the problem.
>
>
> >
> > Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKOWN implies (or should
> > imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another (independent)
> > patch that I'll post in a few minutes.
> >
> >
> > This middle-end patch has been tested on x86_64-pc-linux-gnu with
> > make bootstrap and make -k check, both with and without
> > --target_board=unix{-m32} with no new failures.  The cc1 from a
> > cross-compiler to mips64 appears to generate much better code for
> > the above test case.  Ok for mainline?
> >
> >
> > 2023-12-28  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          PR rtl-optimization/104914
> >          * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
> >          a sign or zero extension is only required if the modified field
> >          overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
> >          targets, don't refer to the temporarily incorrectly extended value
> >          using a SUBREG, but instead generate an explicit TRUNCATE rtx.
> [ ... ]
>
>
> > +           /* Check if the field overlaps the MSB, requiring extension.  */
> > +           else if (known_eq (bitpos + bitsize,
> > +                              GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> Do you need to look at the size of the field as well?  ie, the starting
> position might be before the sign bit, but the width of the field might
> cover the mode's sign bit?
>
> I'm not real good in the RTL expansion code, so if I'm offbase on this,
> just let me know.
>

I guess `known_ge` may be better.
Is `bitregion_end` here better than `bitpos + bitsize` ?

Reply via email to