On Fri, May 08, 2026 at 08:23:46AM -0400, Vladimir Makarov wrote:
> 
> On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> > From: Stefan Schulze Frielinghaus <[email protected]>
> > 
> > Currently an "entire" address is reloaded even in cases where section
> > anchors are involved.  This makes it harder to share section anchors
> > which is the whole point of them.  For example, in cases where
> > offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> > .LANCHOR42 and replace the address with the resulting reload register
> > and the offset.  As a consequence subsequent passes only have to deal
> > with register equivalences in order to share section anchors.  For
> > example
> 
> 
> I thought how to fix this in another place as LRA is already too
> complicated.  It could be fixed in some machined-dependent pass or in
> split1.  Adding the pass is overkill and fix in split1 would be ok if the
> target would work with memory via few insns (e.g. only via load/store
> insns).  So probably LRA is the best place to fix this problem.
> 
> 
> > double x;
> > double y;
> > 
> > double foo ()
> > {
> >    return x + y;
> > }
> > 
> > With this patch, after LRA we have
> > 
> >     20: %r1:DI=`*.LANCHOR0'
> >     17: %f0:DF=[%r1:DI]
> >     19: %r1:DI=`*.LANCHOR0'
> >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > 
> > and after postreload
> > 
> >     20: %r1:DI=`*.LANCHOR0'
> >     17: %f0:DF=[%r1:DI]
> >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > 
> > Of course, this was a lucky case since both reloads referred to the very
> > same register which allows for trivial removal of insn 19.  At least in
> > cases like demonstrated by the new test section-anchors-4.c we are
> > guaranteed to re-use the reload for a single insn.
> > 
> > Before testing this patch for multiple targets, I'm wondering whether
> > there is even a way to re-use reloads during LRA across insns (like an
> > equiv) such that we wouldn't depend on subsequent passes?

Meanwhile successfully bootstrapped and regtested on

- aarch64-unknown-linux-gnu
- s390x-ibm-linux-gnu
- x86_64-pc-linux-gnu

For powerpc64le-unknown-linux-gnu there is a new test failure:

FAIL: gcc.target/powerpc/pr94740.c (internal compiler error: in 
extract_constrain_insn, at recog.cc:2795)

Previously the whole address was reloaded which means we ended up with:

   18: %2:DI=const(`*.LANCHOR0'+0x4)
    7: %3:SI=bswap([%2:DI])

and with this patch we bail out during LRA for

   18: %2:DI=`*.LANCHOR0'
    7: %3:SI=bswap([%2:DI+0x4])

with

pr94740.c:11:1: error: insn does not satisfy its constraints:
   11 | }
      | ^
(insn 7 18 13 2 (set (reg:SI 3 3 [orig:119 _3 ] [119])
        (bswap:SI (mem/c:SI (plus:DI (reg:DI 2 2 [127])
                    (const_int 4 [0x4])) [1 array[1]+0 S4 A32]))) 
"pr94740.c":10:10 143 {bswapsi2_load}
     (nil))

The insn definition is:

(define_insn "bswap<mode>2_load"
  [(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
        (bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
  ""
  "l<wd>brx %0,%y1"
  [(set_attr "type" "load")])

and the important part of the constraint Z is

(define_special_predicate "indexed_or_indirect_address"
  (and (match_test "REG_P (op)
                    || (GET_CODE (op) == PLUS
                        /* Omit testing REG_P (XEXP (op, 0)).  */
                        && REG_P (XEXP (op, 1)))")
       (match_operand 0 "address_operand")))

which doesn't accept offsettable addresses.  At this point I'm not
entirely sure what the contract between targets and LRA is.  My reading
so far was that due to goal_alt_offmemok[i] it is safe to use
offsettable addresses.

> 
> LRA reuses the reload pseudo generated for one insn (please see usage of
> curr_insn_input_reloads).  The problem in not reusing reload pseudo for the
> above example is that because reload for anchor occurs from different
> insns.  First LRA reloads [*.LANCHOR0] (insn 17 generated to satisfy reg
> constraint in insn 12), then the anchor (from insn 17 ), and then reload
> *.LANCHOR0 from 2nd op of insn 12.  But I'd not be worry that the reload
> pseudos get different hard regs.  Knowing how assigning hard regs works in
> LRA I see very small probability that the pseduos get different regs.
> 
> BTW I did not reproduce the testcase situation w/o -march options (the
> anchor in this case already in a pseudo before RA).  But -march=z13
> reproduces it.  So probably you need to add this option to the test.

Ahh right, I tested this patch on top of a private one where the test
case is successful for many archs.  Currently it is failing because
the last alternative of insn cmpdi_cct is rejected for vanilla and
accepted for my private patch.  It will take me some time to polish up
the private patch.  Thus, I think we have two options here.  Proceed
with this patch without a test case, or wait until the other patch is
ready.  I'm fine either way.

I was afraid of a flaky test and deliberately chose this test since it
guarantees that the section anchor ends up in the very same reload
register.  Maybe I should be more brave here ;-)

Thanks,
Stefan

> 
> So the patch (with -march=z13 or other one reproducing the situation as
> additional option for the test) is ok for me. Thank you.
> 
> > ---
> >   gcc/lra-constraints.cc                        | 30 +++++++++++++++++++
> >   .../gcc.target/s390/section-anchors-4.c       | 25 ++++++++++++++++
> >   2 files changed, 55 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > 
> > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > index ccd68efc956..6779dfee020 100644
> > --- a/gcc/lra-constraints.cc
> > +++ b/gcc/lra-constraints.cc
> > @@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
> >         new_reg = emit_inc (rclass, *loc,
> >                             /* This value does not matter for MODIFY.  */
> >                             GET_MODE_SIZE (GET_MODE (op)));
> > +     /* Try to pull out section anchors.  For example, instead of
> > +        reloading an "entire" address like .LANCHOR42+offset only reload
> > +        .LANCHOR42 and use the new reload register as the base register.
> > +        This allows following optimizations to share section anchors and
> > +        remove redundant loads.  */
> > +     else if (GET_CODE (*loc) == CONST
> > +              && GET_CODE (XEXP (*loc, 0)) == PLUS
> > +              && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
> > +              && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
> > +              && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
> > +              /* Some offsets are valid in conjunction with a symbol and
> > +                 invalid in conjunction with a register.  Thus, pull out
> > +                 the anchor only in case the offset is a valid anchor
> > +                 offset.  */
> > +              && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > +                 >= targetm.min_anchor_offset
> > +              && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > +                 <= targetm.max_anchor_offset)
> > +       {
> > +          rtx anchor = XEXP (XEXP (*loc, 0), 0);
> > +          rtx offset = XEXP (XEXP (*loc, 0), 1);
> > +
> > +          if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
> > +                              NULL, false, false,
> > +                              "offsetable address", &new_reg))
> > +             lra_emit_move (new_reg, anchor);
> > +
> > +           new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
> > +           lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE 
> > (op)));
> > +       }
> >       else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
> >                                NULL, false, false,
> >                                "offsetable address", &new_reg))
> > diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c 
> > b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > new file mode 100644
> > index 00000000000..0b4cd081c61
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
> > +/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
> > +/* { dg-final { scan-rtl-dump 
> > "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)"
> >  "ira" } } */
> > +/* { dg-final { scan-rtl-dump 
> > "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)" 
> > "reload" } } */
> > +
> > +/* Ensure that we get the same reload register for the second memory 
> > operand.
> > +   Prior LRA we have
> > +
> > +   55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
> > +
> > +   and after LRA
> > +
> > +   59: %r1:DI=`*.LANCHOR0'
> > +   55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8])  */
> > +
> > +long x, y;
> > +
> > +long
> > +foo (void)
> > +{
> > +  if (x == y)
> > +    return 42;
> > +  return 0;
> > +}
> 

Reply via email to