Hi Jeff,
Thanks for the speedy review(s).

> From: Jeff Law <jeffreya...@gmail.com>
> Sent: 15 October 2023 00:03
> To: Roger Sayle <ro...@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
> make_compound_operation.
> 
> On 10/14/23 16:14, Roger Sayle wrote:
> >
> > This patch is my proposed solution to PR rtl-optimization/91865.
> > Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> > ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> > possible for combine's make_compound_operation to unintentionally
> > generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> > unlikely to be matched by the backend.
> >
> > For the new test case:
> >
> > const int table[2] = {1, 2};
> > int foo (char i) { return table[i]; }
> >
> > compiling with -O2 -mlarge on msp430 we currently see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Failed to match this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >
> > which results in the following code:
> >
> > foo:    AND     #0xff, R12
> >          RLAM.A #4, R12 { RRAM.A #4, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> > With this patch, we now see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Successfully matched this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
> > combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
> > cost 8
> >
> > foo:    MOV.B   R12, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > 2023-10-14  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          PR rtl-optimization/91865
> >          * combine.cc (make_compound_operation): Avoid creating a
> >          ZERO_EXTEND of a ZERO_EXTEND.
> >
> > gcc/testsuite/ChangeLog
> >          PR rtl-optimization/91865
> >          * gcc.target/msp430/pr91865.c: New test case.
> Neither an ACK or NAK at this point.
> 
> The bug report includes a patch from Segher which purports to fix this in 
> simplify-
> rtx.  Any thoughts on Segher's approach and whether or not it should be
> considered?
> 
> The BZ also indicates that removal of 2 patterns from msp430.md would solve 
> this
> too (though it may cause regressions elsewhere?).  Any thoughts on that 
> approach
> as well?
> 

Great questions.  I believe Segher's proposed patch (in comment #4) was an
msp430-specific proof-of-concept workaround rather than intended to be fix.
Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
is not a solution that'll work on many platforms (and therefore not really 
suitable
for target-independent middle-end code in the RTL optimizers).

For example, zero_extend:TI (and:QI (reg:QI hard_r1) (const_int 0x0f)) can't
universally be reduced to (and:TI (reg:TI hard_r1) (const_int 0x0f)).  Notice 
that
Segher's code doesn't check TARGET_HARD_REGNO_MODE_OK or 
TARGET_MODES_TIEABLE_P or any of the other backend hooks necessary
to confirm such a transformation is safe/possible.

Secondly, the hard register aspect is a bit of a red herring.  This work-around
fixes the issue in the original BZ description, but not the slightly modified 
test
case in comment #2 (with a global variable).  This doesn't have a hard register,
but does have the dubious ZERO_EXTEND/SIGN_EXTEND of a ZERO_EXTEND.

The underlying issue, which is applicable to all targets, is that combine.cc's
make_compound_operation is expected to reverse the local transformations
made by expand_compound_operation.  Hence, if an RTL expression is
canonical going into expand_compound_operation, it is expected (hoped)
to be canonical (and equivalent) coming out of make_compound_operation.

Hence, rather than be a MSP430 specific issue, no target should expect (or
be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
CONST_INT operands, or a shift by zero, it's something the middle-end might
reasonably be expected to clean-up. [Yeah, I know... 😊]

However, if it isn't considered a problem that make_compound_operand and
simplify_rtx generate different forms of the same expression, requiring the
backend to match multiple patterns (some for the combine pass, and others
for other RTL passes), there is a simple fix for MSP430; Just add a pattern 
that matches (the slightly odd):

> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

As a rule of thumb, if the missed optimization bug report includes combine's
diagnostic "Failed to match this instruction:", things can be improved by adding
a pattern (often a define_insn_and_split) that matches the shown RTL.

In this case the perhaps reasonable assumption is that the above would/should
(normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn 
pattern.
Or that's my understanding of why this PR is classified as rtl-optimization and
not target.

Finally, my patch shouldn't block a (updated corrected) variant of Segher's 
patch
or other solution to PR 91865.  The more (safe) solutions the merrier.

Thanks again.


Reply via email to