This patch fixes PR rtl-optimization/104914 by improving the way that fields are written into a pseudo register that needs to be kept sign extended.

The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf)
{
  int val;
  ((unsigned char*)&val)[0] = *buf++;
  ((unsigned char*)&val)[1] = *buf++;
  ((unsigned char*)&val)[2] = *buf++;
  ((unsigned char*)&val)[3] = *buf++;
  if(val > 0)
    ext(1);
  else
    ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
  int val;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  unsigned char _4;
  int val.5_5;

  <bb 2> [local count: 1073741824]:
  _1 = *buf_7(D);
  MEM[(unsigned char *)&val] = _1;
  _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
  MEM[(unsigned char *)&val + 1B] = _2;
  _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
  MEM[(unsigned char *)&val + 2B] = _3;
  _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
  MEM[(unsigned char *)&val + 3B] = _4;
  val.5_5 = val;
  if (val.5_5 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507681]:
  ext (1);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 440234144]:
  ext (0);

  <bb 5> [local count: 1073741824]:
  val ={v} {CLOBBER(eol)};
  return;
}

Here four bytes are being sequentially written into the SImode value val. On some platforms, such as MIPS64, this SImode value is kept in a 64-bit register, suitably sign-extended. The function expand_assignment contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an explicit extension operation after each store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension after each byte in the example above; the extension is only required after changes to the most significant byte (i.e. to a field that overlaps the most significant bit).

The bug fix itself is a bit more subtle: at this point during code expansion it's not safe to use a SUBREG when sign-extending this field.

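As an illustration of the first observation only, here is a minimal C model (not GCC code) of a 64-bit register holding a sign-extended 32-bit value. The helper names sign_extend_32, insert_byte, build_extend_always and build_extend_msb_only are hypothetical, and byte 0 is assumed to be the least significant byte. It sketches why re-extending only after the store that reaches bit 31 should give the same final register contents as re-extending after every store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: REG is a 64-bit register whose low
   32 bits hold an SImode value that is normally kept sign-extended.  */

/* Re-establish the invariant by copying bit 31 into bits 32..63.  */
uint64_t sign_extend_32 (uint64_t reg)
{
  uint64_t low = reg & 0xffffffffu;
  if (low & 0x80000000u)
    return low | 0xffffffff00000000ull;
  return low;
}

/* Model of a byte-sized insv: overwrite byte IDX (0 = least significant,
   an assumption of this sketch) of the low 32 bits and leave every
   other bit of the register untouched.  */
uint64_t insert_byte (uint64_t reg, unsigned idx, uint8_t byte)
{
  assert (idx < 4);
  unsigned shift = idx * 8;
  uint64_t mask = (uint64_t) 0xff << shift;
  return (reg & ~mask) | ((uint64_t) byte << shift);
}

/* Extend after every store, modelling the old behaviour.  */
uint64_t build_extend_always (uint64_t reg, const uint8_t *buf)
{
  for (unsigned i = 0; i < 4; i++)
    reg = sign_extend_32 (insert_byte (reg, i, buf[i]));
  return reg;
}

/* Extend only when the stored field ends at bit 31, i.e. when
   bitpos + bitsize equals the 32-bit mode size.  */
uint64_t build_extend_msb_only (uint64_t reg, const uint8_t *buf)
{
  for (unsigned i = 0; i < 4; i++)
    {
      reg = insert_byte (reg, i, buf[i]);
      if (i * 8 + 8 == 32)
        reg = sign_extend_32 (reg);
    }
  return reg;
}
```

Stores to bytes 0..2 leave bit 31 and the upper 32 bits alone, so a register that was correctly extended beforehand stays correctly extended; only the store to byte 3 can invalidate the invariant. For the bytes 0x78, 0x56, 0x34, 0x92 starting from a zero register, both functions in this model return 0xffffffff92345678.
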
Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0)) but combine (and other RTL optimizers) later realize that, because SImode values are always sign-extended in their 64-bit hard registers, this is a no-op and eliminate it. The trouble is that it's unsafe to refer to the SImode lowpart of a 64-bit register using SUBREG at those critical points when temporarily the value isn't correctly sign-extended, and the usual backend invariants don't hold. At these critical points, the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like (sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.

Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKNOWN implies (or should imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE). I've another (independent) patch that I'll post in a few minutes.

This middle-end patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. The cc1 from a cross-compiler to mips64 appears to generate much better code for the above test case. Ok for mainline?

2023-12-28  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/104914
        * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
        a sign or zero extension is only required if the modified field
        overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
        targets, don't refer to the temporarily incorrectly extended value
        using a SUBREG, but instead generate an explicit TRUNCATE rtx.

Thanks in advance,
Roger
--
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9fef2bf6585..1a34b48e38f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6272,19 +6272,32 @@ expand_assignment (tree to, tree from, bool nontemporal)
               && known_eq (bitpos, 0)
               && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
             result = store_expr (from, to_rtx, 0, nontemporal, false);
-          else
+          /* Check if the field overlaps the MSB, requiring extension.  */
+          else if (known_eq (bitpos + bitsize,
+                             GET_MODE_BITSIZE (GET_MODE (to_rtx))))
             {
-              rtx to_rtx1
-                = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
-                                  SUBREG_REG (to_rtx),
-                                  subreg_promoted_mode (to_rtx));
+              scalar_int_mode imode = subreg_unpromoted_mode (to_rtx);
+              scalar_int_mode omode = subreg_promoted_mode (to_rtx);
+              rtx to_rtx1 = lowpart_subreg (imode, SUBREG_REG (to_rtx),
+                                            omode);
               result = store_field (to_rtx1, bitsize, bitpos,
                                     bitregion_start, bitregion_end,
                                     mode1, from, get_alias_set (to),
                                     nontemporal, reversep);
+              /* If the target usually keeps IMODE appropriately
+                 extended in OMODE it's unsafe to refer to it using
+                 a SUBREG whilst this invariant doesn't hold.  */
+              if (targetm.mode_rep_extended (imode, omode) != UNKNOWN)
+                to_rtx1 = simplify_gen_unary (TRUNCATE, imode,
+                                              SUBREG_REG (to_rtx), omode);
               convert_move (SUBREG_REG (to_rtx), to_rtx1,
                             SUBREG_PROMOTED_SIGN (to_rtx));
             }
+          else
+            result = store_field (to_rtx, bitsize, bitpos,
+                                  bitregion_start, bitregion_end,
+                                  mode1, from, get_alias_set (to),
+                                  nontemporal, reversep);
         }
       else
         result = store_field (to_rtx, bitsize, bitpos,