On Mon, May 11, 2026 at 09:40:15PM +0200, Stefan Schulze Frielinghaus wrote:
> On Fri, May 08, 2026 at 08:23:46AM -0400, Vladimir Makarov wrote:
> > 
> > On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> > > From: Stefan Schulze Frielinghaus <[email protected]>
> > > 
> > > Currently an "entire" address is reloaded even in cases where section
> > > anchors are involved.  This makes it harder to share section anchors
> > > which is the whole point of them.  For example, in cases where
> > > offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> > > .LANCHOR42 and replace the address with the resulting reload register
> > > and the offset.  As a consequence subsequent passes only have to deal
> > > with register equivalences in order to share section anchors.  For
> > > example
> > 
> > 
> > I thought how to fix this in another place as LRA is already too
> > complicated.  It could be fixed in some machined-dependent pass or in
> > split1.  Adding the pass is overkill and fix in split1 would be ok if the
> > target would work with memory via few insns (e.g. only via load/store
> > insns).  So probably LRA is the best place to fix this problem.
> > 
> > 
> > > double x;
> > > double y;
> > > 
> > > double foo ()
> > > {
> > >    return x + y;
> > > }
> > > 
> > > With this patch, after LRA we have
> > > 
> > >     20: %r1:DI=`*.LANCHOR0'
> > >     17: %f0:DF=[%r1:DI]
> > >     19: %r1:DI=`*.LANCHOR0'
> > >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > > 
> > > and after postreload
> > > 
> > >     20: %r1:DI=`*.LANCHOR0'
> > >     17: %f0:DF=[%r1:DI]
> > >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > > 
> > > Of course, this was a lucky case since both reloads referred to the very
> > > same register which allows for trivial removal of insn 19.  At least in
> > > cases like demonstrated by the new test section-anchors-4.c we are
> > > guaranteed to re-use the reload for a single insn.
> > > 
> > > Before testing this patch for multiple targets, I'm wondering whether
> > > there is even a way to re-use reloads during LRA across insns (like an
> > > equiv) such that we wouldn't depend on subsequent passes?
> 
> Meanwhile successfully bootstrapped and regtested on
> 
> - aarch64-unknown-linux-gnu
> - s390x-ibm-linux-gnu
> - x86_64-pc-linux-gnu
> 
> For powerpc64le-unknown-linux-gnu there is a new test failure:
> 
> FAIL: gcc.target/powerpc/pr94740.c (internal compiler error: in 
> extract_constrain_insn, at recog.cc:2795)
> 
> Previously the whole address was reloaded which means we ended up with:
> 
>    18: %2:DI=const(`*.LANCHOR0'+0x4)
>     7: %3:SI=bswap([%2:DI])
> 
> and with this patch we bail out during LRA for
> 
>    18: %2:DI=`*.LANCHOR0'
>     7: %3:SI=bswap([%2:DI+0x4])
> 
> with
> 
> pr94740.c:11:1: error: insn does not satisfy its constraints:
>    11 | }
>       | ^
> (insn 7 18 13 2 (set (reg:SI 3 3 [orig:119 _3 ] [119])
>         (bswap:SI (mem/c:SI (plus:DI (reg:DI 2 2 [127])
>                     (const_int 4 [0x4])) [1 array[1]+0 S4 A32]))) 
> "pr94740.c":10:10 143 {bswapsi2_load}
>      (nil))
> 
> The insn definition is:
> 
> (define_insn "bswap<mode>2_load"
>   [(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
>         (bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
>   ""
>   "l<wd>brx %0,%y1"
>   [(set_attr "type" "load")])
> 
> and the important part of the constraint Z is
> 
> (define_special_predicate "indexed_or_indirect_address"
>   (and (match_test "REG_P (op)
>                     || (GET_CODE (op) == PLUS
>                         /* Omit testing REG_P (XEXP (op, 0)).  */
>                         && REG_P (XEXP (op, 1)))")
>        (match_operand 0 "address_operand")))
> 
> which doesn't accept offsettable addresses.  At this point I'm not
> entirely sure what the contract between targets and LRA is.  My reading
> so far was that due to goal_alt_offmemok[i] it is safe to use
> offsettable addresses.

goal_alt_offmemok[i] is derived from offmemok which is computed in
process_alt_operands() as e.g. here:

        case CT_MEMORY:
        case CT_RELAXED_MEMORY:
          if (MEM_P (op)
              && satisfies_memory_constraint_p (op, cn))
            win = true;
          else if (spilled_pseudo_p (op))
            win = true;

          /* If we didn't already win, we can reload constants
             via force_const_mem or put the pseudo value into
             memory, or make other memory by reloading the
             address like for 'o'.  */
          if (CONST_POOL_OK_P (mode, op)
              || MEM_P (op) || REG_P (op)
              /* We can restore the equiv insn by a
                 reload.  */
              || equiv_substition_p[nop])
            badop = false;
          constmemok = true;
          offmemok = true;
          break;

Here offmemok is set without explicitly checking whether the constraint
accepts offsettable memory or not.  So far this was no problem because
inside of

  if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))

every resulting address was a single register.  I'm a bit puzzled.
Either goal_alt_offmemok[i] doesn't mean that offsettable addresses are
ok, or another check is required before setting offmemok to true.  Any
thoughts?

Cheers,
Stefan


> 
> > 
> > LRA reuses the reload pseudo generated for one insn (please see usage of
> > curr_insn_input_reloads).  The problem in not reusing reload pseudo for the
> > above example is that because reload for anchor occurs from different
> > insns.  First LRA reloads [*.LANCHOR0] (insn 17 generated to satisfy reg
> > constraint in insn 12), then the anchor (from insn 17 ), and then reload
> > *.LANCHOR0 from 2nd op of insn 12.  But I'd not be worry that the reload
> > pseudos get different hard regs.  Knowing how assigning hard regs works in
> > LRA I see very small probability that the pseduos get different regs.
> > 
> > BTW I did not reproduce the testcase situation w/o -march options (the
> > anchor in this case already in a pseudo before RA).  But -march=z13
> > reproduces it.  So probably you need to add this option to the test.
> 
> Ahh right, I tested this patch on top of a private one where the test
> case is successful for many archs.  Currently it is failing because
> the last alternative of insn cmpdi_cct is rejected for vanilla and
> accepted for my private patch.  It will take me some time to polish up
> the private patch.  Thus, I think we have two options here.  Proceed
> with this patch without a test case, or wait until the other patch is
> ready.  I'm fine either way.
> 
> I was afraid of a flaky test and deliberately chose this test since it
> guarantees that the section anchor ends up in the very same reload
> register.  Maybe I should be more brave here ;-)
> 
> Thanks,
> Stefan
> 
> > 
> > So the patch (with -march=z13 or other one reproducing the situation as
> > additional option for the test) is ok for me. Thank you.
> > 
> > > ---
> > >   gcc/lra-constraints.cc                        | 30 +++++++++++++++++++
> > >   .../gcc.target/s390/section-anchors-4.c       | 25 ++++++++++++++++
> > >   2 files changed, 55 insertions(+)
> > >   create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > 
> > > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > > index ccd68efc956..6779dfee020 100644
> > > --- a/gcc/lra-constraints.cc
> > > +++ b/gcc/lra-constraints.cc
> > > @@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
> > >               new_reg = emit_inc (rclass, *loc,
> > >                                   /* This value does not matter for 
> > > MODIFY.  */
> > >                                   GET_MODE_SIZE (GET_MODE (op)));
> > > +   /* Try to pull out section anchors.  For example, instead of
> > > +      reloading an "entire" address like .LANCHOR42+offset only reload
> > > +      .LANCHOR42 and use the new reload register as the base register.
> > > +      This allows following optimizations to share section anchors and
> > > +      remove redundant loads.  */
> > > +   else if (GET_CODE (*loc) == CONST
> > > +            && GET_CODE (XEXP (*loc, 0)) == PLUS
> > > +            && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
> > > +            && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
> > > +            && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
> > > +            /* Some offsets are valid in conjunction with a symbol and
> > > +               invalid in conjunction with a register.  Thus, pull out
> > > +               the anchor only in case the offset is a valid anchor
> > > +               offset.  */
> > > +            && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > > +               >= targetm.min_anchor_offset
> > > +            && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > > +               <= targetm.max_anchor_offset)
> > > +     {
> > > +        rtx anchor = XEXP (XEXP (*loc, 0), 0);
> > > +        rtx offset = XEXP (XEXP (*loc, 0), 1);
> > > +
> > > +        if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
> > > +                            NULL, false, false,
> > > +                            "offsetable address", &new_reg))
> > > +           lra_emit_move (new_reg, anchor);
> > > +
> > > +         new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
> > > +         lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE 
> > > (op)));
> > > +     }
> > >             else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
> > >                                      NULL, false, false,
> > >                                      "offsetable address", &new_reg))
> > > diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c 
> > > b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > new file mode 100644
> > > index 00000000000..0b4cd081c61
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
> > > +/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
> > > +/* { dg-final { scan-rtl-dump 
> > > "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)"
> > >  "ira" } } */
> > > +/* { dg-final { scan-rtl-dump 
> > > "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)" 
> > > "reload" } } */
> > > +
> > > +/* Ensure that we get the same reload register for the second memory 
> > > operand.
> > > +   Prior LRA we have
> > > +
> > > +   55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
> > > +
> > > +   and after LRA
> > > +
> > > +   59: %r1:DI=`*.LANCHOR0'
> > > +   55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8])  */
> > > +
> > > +long x, y;
> > > +
> > > +long
> > > +foo (void)
> > > +{
> > > +  if (x == y)
> > > +    return 42;
> > > +  return 0;
> > > +}
> > 

Reply via email to