Jeff Law <jeffreya...@gmail.com> 于2023年12月29日周五 02:23写道: > > > > On 12/28/23 07:59, Roger Sayle wrote: > > > > This patch fixes PR rtl-optmization/104914 by tweaking/improving the way > > that fields are written into a pseudo register that needs to be kept sign > > extended. > Well, I think "fixes" is a bit of a stretch. We're avoiding the issue > by changing the early RTL generation, but if I understand what's going > on in the RTL optimizers and MIPS backend correctly, the core bug still > remains. Admittedly I haven't put it under a debugger, but that MIPS > definition of NOOP_TRUNCATION just seems badly wrong and is just waiting > to pop it's ugly head up again. >
Yes. I am trying to get rid of it from MIPS64. It may reduce our maintain workload. > > > > > > The motivating example from the bugzilla PR is: > > > > extern void ext(int); > > void foo(const unsigned char *buf) { > > int val; > > ((unsigned char*)&val)[0] = *buf++; > > ((unsigned char*)&val)[1] = *buf++; > > ((unsigned char*)&val)[2] = *buf++; > > ((unsigned char*)&val)[3] = *buf++; > > if(val > 0) > > ext(1); > > else > > ext(0); > > } > > > > which at the end of the tree optimization passes looks like: > > > > void foo (const unsigned char * buf) > > { > > int val; > > unsigned char _1; > > unsigned char _2; > > unsigned char _3; > > unsigned char _4; > > int val.5_5; > > > > <bb 2> [local count: 1073741824]: > > _1 = *buf_7(D); > > MEM[(unsigned char *)&val] = _1; > > _2 = MEM[(const unsigned char *)buf_7(D) + 1B]; > > MEM[(unsigned char *)&val + 1B] = _2; > > _3 = MEM[(const unsigned char *)buf_7(D) + 2B]; > > MEM[(unsigned char *)&val + 2B] = _3; > > _4 = MEM[(const unsigned char *)buf_7(D) + 3B]; > > MEM[(unsigned char *)&val + 3B] = _4; > > val.5_5 = val; > > if (val.5_5 > 0) > > goto <bb 3>; [59.00%] > > else > > goto <bb 4>; [41.00%] > > > > <bb 3> [local count: 633507681]: > > ext (1); > > goto <bb 5>; [100.00%] > > > > <bb 4> [local count: 440234144]: > > ext (0); > > > > <bb 5> [local count: 1073741824]: > > val ={v} {CLOBBER(eol)}; > > return; > > > > } > > > > Here four bytes are being sequentially written into the SImode value > > val. On some platforms, such as MIPS64, this SImode value is kept in > > a 64-bit register, suitably sign-extended. The function expand_assignment > > contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264 > > in expr.cc) which outputs an explicit extension operation after each > > store_field (typically insv) to such promoted/extended pseudos. > > > > The first observation is that there's no need to perform sign extension > > after each byte in the example above; the extension is only required > > after changes to the most significant byte (i.e. to a field that overlaps > > the most significant bit). > True. > > > > > > The bug fix is actually a bit more subtle, but at this point during > > code expansion it's not safe to use a SUBREG when sign-extending this > > field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0)) > > but combine (and other RTL optimizers) later realize that because SImode > > values are always sign-extended in their 64-bit hard registers that > > this is a no-op and eliminates it. The trouble is that it's unsafe to > > refer to the SImode lowpart of a 64-bit register using SUBREG at those > > critical points when temporarily the value isn't correctly sign-extended, > > and the usual backend invariants don't hold. At these critical points, > > the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a > > TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like > > (sign_extend:DI (truncate:SI (reg:DI)), which avoids the problem. > > > > > > Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKOWN implies (or should > > imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE). I've another (independent) > > patch that I'll post in a few minutes. > > > > > > This middle-end patch has been tested on x86_64-pc-linux-gnu with > > make bootstrap and make -k check, both with and without > > --target_board=unix{-m32} with no new failures. The cc1 from a > > cross-compiler to mips64 appears to generate much better code for > > the above test case. Ok for mainline? > > > > > > 2023-12-28 Roger Sayle <ro...@nextmovesoftware.com> > > > > gcc/ChangeLog > > PR rtl-optimization/104914 > > * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P > > a sign or zero extension is only required if the modified field > > overlaps the SUBREG's most significant bit. On MODE_REP_EXTENDED > > targets, don't refer to the temporarily incorrectly extended value > > using a SUBREG, but instead generate an explicit TRUNCATE rtx. > [ ... ] > > > > + /* Check if the field overlaps the MSB, requiring extension. */ > > + else if (known_eq (bitpos + bitsize, > > + GET_MODE_BITSIZE (GET_MODE (to_rtx)))) > Do you need to look at the size of the field as well? ie, the starting > position might be before the sign bit, but the width of the field might > cover the mode's sign bit? > > I'm not real good in the RTL expansion code, so if I'm offbase on this, > just let me know. > I guess `known_ge` may be better. Is `bitregion_end` here better than `bitpos + bitsize` ?