This patch fixes PR rtl-optmization/104914 by tweaking/improving the way
that fields are written into a pseudo register that needs to be kept sign
extended.

The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf) {
  int val;
  ((unsigned char*)&val)[0] = *buf++;
  ((unsigned char*)&val)[1] = *buf++;
  ((unsigned char*)&val)[2] = *buf++;
  ((unsigned char*)&val)[3] = *buf++;
  if(val > 0)
    ext(1);
  else
    ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
  int val;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  unsigned char _4;
  int val.5_5;

  <bb 2> [local count: 1073741824]:
  _1 = *buf_7(D);
  MEM[(unsigned char *)&val] = _1;
  _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
  MEM[(unsigned char *)&val + 1B] = _2;
  _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
  MEM[(unsigned char *)&val + 2B] = _3;
  _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
  MEM[(unsigned char *)&val + 3B] = _4;
  val.5_5 = val;
  if (val.5_5 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507681]:
  ext (1);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 440234144]:
  ext (0);

  <bb 5> [local count: 1073741824]:
  val ={v} {CLOBBER(eol)};
  return;

}

Here four bytes are being sequentially written into the SImode value
val.  On some platforms, such as MIPS64, this SImode value is kept in
a 64-bit register, suitably sign-extended.  The function expand_assignment
contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
in expr.cc) which outputs an explicit extension operation after each
store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension
after each byte in the example above; the extension is only required
after changes to the most significant byte (i.e. to a field that overlaps
the most significant bit).

The bug fix is actually a bit more subtle, but at this point during
code expansion it's not safe to use a SUBREG when sign-extending this
field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
but combine (and other RTL optimizers) later realize that because SImode
values are always sign-extended in their 64-bit hard registers that
this is a no-op and eliminates it.  The trouble is that it's unsafe to
refer to the SImode lowpart of a 64-bit register using SUBREG at those
critical points when temporarily the value isn't correctly sign-extended,
and the usual backend invariants don't hold.  At these critical points,
the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
(sign_extend:DI (truncate:SI (reg:DI)), which avoids the problem.

Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKOWN implies (or should
imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another (independent)
patch that I'll post in a few minutes.


This middle-end patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  The cc1 from a
cross-compiler to mips64 appears to generate much better code for
the above test case.  Ok for mainline?


2023-12-28  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/104914
        * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
        a sign or zero extension is only required if the modified field
        overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
        targets, don't refer to the temporarily incorrectly extended value
        using a SUBREG, but instead generate an explicit TRUNCATE rtx.


Thanks in advance,
Roger
--

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9fef2bf6585..1a34b48e38f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6272,19 +6272,32 @@ expand_assignment (tree to, tree from, bool nontemporal)
                  && known_eq (bitpos, 0)
                  && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
                result = store_expr (from, to_rtx, 0, nontemporal, false);
-             else
+             /* Check if the field overlaps the MSB, requiring extension.  */
+             else if (known_eq (bitpos + bitsize,
+                                GET_MODE_BITSIZE (GET_MODE (to_rtx))))
                {
-                 rtx to_rtx1
-                   = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
-                                     SUBREG_REG (to_rtx),
-                                     subreg_promoted_mode (to_rtx));
+                 scalar_int_mode imode = subreg_unpromoted_mode (to_rtx);
+                 scalar_int_mode omode = subreg_promoted_mode (to_rtx);
+                 rtx to_rtx1 = lowpart_subreg (imode, SUBREG_REG (to_rtx),
+                                               omode);
                  result = store_field (to_rtx1, bitsize, bitpos,
                                        bitregion_start, bitregion_end,
                                        mode1, from, get_alias_set (to),
                                        nontemporal, reversep);
+                 /* If the target usually keeps IMODE appropriately
+                    extended in OMODE it's unsafe to refer to it using
+                    a SUBREG whilst this invariant doesn't hold.  */
+                 if (targetm.mode_rep_extended (imode, omode) != UNKNOWN)
+                   to_rtx1 = simplify_gen_unary (TRUNCATE, imode,
+                                                 SUBREG_REG (to_rtx), omode);
                  convert_move (SUBREG_REG (to_rtx), to_rtx1,
                                SUBREG_PROMOTED_SIGN (to_rtx));
                }
+             else
+               result = store_field (to_rtx, bitsize, bitpos,
+                                     bitregion_start, bitregion_end,
+                                     mode1, from, get_alias_set (to),
+                                     nontemporal, reversep);
            }
          else
            result = store_field (to_rtx, bitsize, bitpos,

Reply via email to