On Thu, Jul 4, 2024 at 11:24 AM Richard Biener <richard.guent...@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <a...@gjlay.de> wrote: > > > > > > > > Am 02.07.24 um 15:48 schrieb Richard Biener: > > > On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <a...@gjlay.de> wrote: > > >> > > >> Hi Jeff, > > >> > > >> This is a patch to get correct code out of 64-bit > > >> loads from address-space __memx. > > >> > > >> The AVR address-spaces may require that move insns issue > > >> calls to library support functions, a fact that -ftree-ter > > >> doesn't account for. tree-ssa-ter.cc then replaces an > > >> expression across such a library call, resulting in wrong code. > > >> > > >> This patch disables that pass per default on avr, as there is no > > >> more fine grained way to avoid malicious optimizations. > > >> The pass can still be re-enabled by means of explicit -ftree-ter. > > >> > > >> Ok to apply? > > > > > > I think this requires more details on what goes wrong - I assume > > > it's not stmt reordering that effectively happens but recursive > > > expand_expr on SSA defs when those invoke libcalls? In that > > > case this would point to a deeper issue. > > > > The difference is that with TER, we get a hard reg in .expand > > for a movdi from 24-bit address-space __memx. > > > > Such moves require library calls, which in turn require > > specific hard registers. As avr backend has no movdi, the > > moddi gets expanded as 8 * movqi, and that does not work > > when the target registers are hard regs, as some of them > > are clobbered by the libcalls. > > So I see > > (insn 18 17 19 2 (parallel [ > (set (reg:QI 22 r22 [+4 ]) > (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) "t.c":12:13 38 {xloadqi_A} > (nil)) > (insn 19 18 20 2 (set (reg:PSI 56) > (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split} > (nil)) > (insn 20 19 21 2 (parallel [ > (set (reg/f:PSI 57) > (plus:PSI (reg/f:PSI 47) > (const_int 5 [0x5]))) > (clobber (scratch:QI)) > ]) "t.c":12:13 205 {addpsi3} > (nil)) > (insn 21 20 22 2 (parallel [ > (set (reg:QI 23 r23 [+5 ]) > (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) "t.c":12:13 38 {xloadqi_A} > (nil)) > > for example - insn 21 clobbers r22 which is also the destination of insn 18. > > With -fno-tree-ter those oddly get _no_ intermediate reg but we have > > (insn 9 8 10 2 (parallel [ > (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1) > (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) "t.c":12:13 38 {xloadqi_A} > (nil)) > > but since on GIMPLE we have DImode loads I don't see how TER comes into > play here - TER should favor the second code generation, not the first ... > (or TER shouldn't play into here at all). > > with -fno-tree-ter we come via > > #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, > tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 > #1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>, > target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false) > at /space/rguenther/src/gcc/gcc/expr.cc:6740 > #2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>, > from=<var_decl 0x7ffff7162000 aa>, nontemporal=false) > at /space/rguenther/src/gcc/gcc/expr.cc:6461 > > while with TER we instead have > > #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0, > tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 > #1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0, > target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100 > #2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>, > target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278 > > the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we > are not passing a target or a mode. > > I think the issue is that the expansion at some point doesn't expect > the result to end up in > a hard register. Maybe define_expand are not supposed to do that but maybe > expansion needs to fix up. > > A first thought was > > diff --git a/gcc/expr.cc b/gcc/expr.cc > index ffbac513692..1509acad02e 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx > target, machine_mode tmode, > gcc_unreachable (); > } > set_curr_insn_location (saved_loc); > + if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER) > + { > + rtx tem = gen_reg_rtx (GET_MODE (r)); > + emit_move_insn (tem, r); > + r = tem; > + } > if (REG_P (r) && !REG_EXPR (r)) > set_reg_attrs_for_decl_rtl (lhs, r); > return r; > > but of course that's not the place to fix - this sees (mem/u/c:DI > (reg/f:PSI 47) [1 aa+0 S8 A8 AS7]) > as result and things go wrong somewhere in the chain of expanding > things from the return, possibly at the point of expanding the plus and > there possibly when building subregs of the DImode mem. > > You'd have to trace that down but the fix in the end is to do sth like the > above > or alternatively, in the expander producing the 'xload' make sure to not allow > a hardreg as destination when you can still create pseudos?
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index dabf4c0fc5a..f897113c885 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -746,9 +746,7 @@ else { rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22); - if (reg_overlap_mentioned_p (dest2, reg_22) - || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21])) - dest2 = gen_reg_rtx (<MODE>mode); + dest2 = gen_reg_rtx (<MODE>mode); emit_insn (gen_xload<mode>_A (dest2, src)); } seems to fix it. I'm not sure what reg_overlap_mentioned_p should achieve in a define-expand. Richard. > Richard. > > > Moreover, even with TER, the code is no more efficient than > > without it, so it's not clear what's the point in propagating > > hard regs into expander operands. Later passes like fwprop1 and > > combine can do that, too. > > > > Requiring libcalls in a mov insn is quite special indeed, > > and there is no way to tell that to TER. TER itself does not > > optimize code involving libcalls, so it knows they are fragile. > > > > > So - if the wrongness is already apparent in the RTL expansion > > > pass dump can you quote the respective pieces and explain why? > > > > It expand a 64-bit move from __memx address-space to registers > > R18...R25. This is broken into 8 QI moves to these regs, but > > the movqi requires a libcall in some situations, which pass their > > arguments in R21...R25. Hence the libcalls clobber some of the > > destination regs. > > > > It would already help when TER would not propagate hard-regs into > > expander operands. > > > > Johann > > > > >> As an alternative, the option could be disabled permanently in > > >> avr.cc::avr_option_override(). > > >> > > >> Johann > > >> > > >> -- > > >> > > >> AVR: middle-end/87376 - Use -fno-tree-ter per default. > > >> > > >> Temporary expression replacement might replace expressions across > > >> library calls, for example with move insn from address-space __memx > > >> like in PR87376. -ftree-ter has no way where the backend could hook > > >> in to avoid only problematic replacements, thus kick it out altogether. > > >> > > >> PR middle-end/87376 > > >> gcc/ > > >> * common/config/avr/avr-common.cc > > >> (avr_option_optimization_table) > > >> <OPT_ftree_ter>: Set to 0. > > >> gcc/testsuite/ > > >> * gcc.target/avr/torture/pr87376-memx.c: New test.