On Thu, Jul 4, 2024 at 11:24 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <a...@gjlay.de> wrote:
> >
> >
> >
> > Am 02.07.24 um 15:48 schrieb Richard Biener:
> > > On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <a...@gjlay.de> wrote:
> > >>
> > >> Hi Jeff,
> > >>
> > >> This is a patch to get correct code out of 64-bit
> > >> loads from address-space __memx.
> > >>
> > >> The AVR address-spaces may require that move insns issue
> > >> calls to library support functions, a fact that -ftree-ter
> > >> doesn't account for.  tree-ssa-ter.cc then replaces an
> > >> expression across such a library call, resulting in wrong code.
> > >>
> > >> This patch disables that pass per default on avr, as there is no
> > >> more fine grained way to avoid malicious optimizations.
> > >> The pass can still be re-enabled by means of explicit -ftree-ter.
> > >>
> > >> Ok to apply?
> > >
> > > I think this requires more details on what goes wrong - I assume
> > > it's not stmt reordering that effectively happens but recursive
> > > expand_expr on SSA defs when those invoke libcalls?  In that
> > > case this would point to a deeper issue.
> >
> > The difference is that with TER, we get a hard reg in .expand
> > for a movdi from 24-bit address-space __memx.
> >
> > Such moves require library calls, which in turn require
> > specific hard registers.  As avr backend has no movdi, the
> > moddi gets expanded as 8 * movqi, and that does not work
> > when the target registers are hard regs, as some of them
> > are clobbered by the libcalls.
>
> So I see
>
> (insn 18 17 19 2 (parallel [
>             (set (reg:QI 22 r22 [+4 ])
>                 (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) "t.c":12:13 38 {xloadqi_A}
>      (nil))
> (insn 19 18 20 2 (set (reg:PSI 56)
>         (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split}
>      (nil))
> (insn 20 19 21 2 (parallel [
>             (set (reg/f:PSI 57)
>                 (plus:PSI (reg/f:PSI 47)
>                     (const_int 5 [0x5])))
>             (clobber (scratch:QI))
>         ]) "t.c":12:13 205 {addpsi3}
>      (nil))
> (insn 21 20 22 2 (parallel [
>             (set (reg:QI 23 r23 [+5 ])
>                 (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) "t.c":12:13 38 {xloadqi_A}
>      (nil))
>
> for example - insn 21 clobbers r22 which is also the destination of insn 18.
>
> With -fno-tree-ter those oddly get _no_ intermediate reg but we have
>
> (insn 9 8 10 2 (parallel [
>             (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1)
>                 (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) "t.c":12:13 38 {xloadqi_A}
>      (nil))
>
> but since on GIMPLE we have DImode loads I don't see how TER comes into
> play here - TER should favor the second code generation, not the first ...
> (or TER shouldn't play into here at all).
>
> with -fno-tree-ter we come via
>
> #0  expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8,
>     tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8,
>     inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
> #1  0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>,
>     target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false)
>     at /space/rguenther/src/gcc/gcc/expr.cc:6740
> #2  0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>,
>     from=<var_decl 0x7ffff7162000 aa>, nontemporal=false)
>     at /space/rguenther/src/gcc/gcc/expr.cc:6461
>
> while with TER we instead have
>
> #0  expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0,
>     tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
>     inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
> #1  0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0,
>     target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
>     inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100
> #2  0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>,
>     target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
>     inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278
>
> the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we
> are not passing a target or a mode.
>
> I think the issue is that the expansion at some point doesn't expect
> the result to end up in
> a hard register.  Maybe define_expand are not supposed to do that but maybe
> expansion needs to fix up.
>
> A first thought was
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index ffbac513692..1509acad02e 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx
> target, machine_mode tmode,
>        gcc_unreachable ();
>      }
>    set_curr_insn_location (saved_loc);
> +  if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER)
> +    {
> +      rtx tem = gen_reg_rtx (GET_MODE (r));
> +      emit_move_insn (tem, r);
> +      r = tem;
> +    }
>    if (REG_P (r) && !REG_EXPR (r))
>      set_reg_attrs_for_decl_rtl (lhs, r);
>    return r;
>
> but of course that's not the place to fix - this sees (mem/u/c:DI
> (reg/f:PSI 47) [1 aa+0 S8 A8 AS7])
> as result and things go wrong somewhere in the chain of expanding
> things from the return, possibly at the point of expanding the plus and
> there possibly when building subregs of the DImode mem.
>
> You'd have to trace that down but the fix in the end is to do sth like the 
> above
> or alternatively, in the expander producing the 'xload' make sure to not allow
> a hardreg as destination when you can still create pseudos?

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index dabf4c0fc5a..f897113c885 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -746,9 +746,7 @@
         else
           {
             rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22);
-            if (reg_overlap_mentioned_p (dest2, reg_22)
-                || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21]))
-              dest2 = gen_reg_rtx (<MODE>mode);
+            dest2 = gen_reg_rtx (<MODE>mode);

             emit_insn (gen_xload<mode>_A (dest2, src));
           }

seems to fix it.  I'm not sure what reg_overlap_mentioned_p should achieve in
a define-expand.

Richard.

> Richard.
>
> > Moreover, even with TER, the code is no more efficient than
> > without it, so it's not clear what's the point in propagating
> > hard regs into expander operands. Later passes like fwprop1 and
> > combine can do that, too.
> >
> > Requiring libcalls in a mov insn is quite special indeed,
> > and there is no way to tell that to TER.  TER itself does not
> > optimize code involving libcalls, so it knows they are fragile.
> >
> > > So - if the wrongness is already apparent in the RTL expansion
> > > pass dump can you quote the respective pieces and explain why?
> >
> > It expand a 64-bit move from __memx address-space to registers
> > R18...R25.  This is broken into 8 QI moves to these regs, but
> > the movqi requires a libcall in some situations, which pass their
> > arguments in R21...R25.  Hence the libcalls clobber some of the
> > destination regs.
> >
> > It would already help when TER would not propagate hard-regs into
> > expander operands.
> >
> > Johann
> >
> > >> As an alternative, the option could be disabled permanently in
> > >> avr.cc::avr_option_override().
> > >>
> > >> Johann
> > >>
> > >> --
> > >>
> > >> AVR: middle-end/87376 - Use -fno-tree-ter per default.
> > >>
> > >> Temporary expression replacement might replace expressions across
> > >> library calls, for example with move insn from address-space __memx
> > >> like in PR87376.  -ftree-ter has no way where the backend could hook
> > >> in to avoid only problematic replacements, thus kick it out altogether.
> > >>
> > >>          PR middle-end/87376
> > >> gcc/
> > >>          * common/config/avr/avr-common.cc 
> > >> (avr_option_optimization_table)
> > >>          <OPT_ftree_ter>: Set to 0.
> > >> gcc/testsuite/
> > >>          * gcc.target/avr/torture/pr87376-memx.c: New test.

Reply via email to