On Fri, May 08, 2026 at 08:23:46AM -0400, Vladimir Makarov wrote:
>
> On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> > From: Stefan Schulze Frielinghaus <[email protected]>
> >
> > Currently an "entire" address is reloaded even in cases where section
> > anchors are involved. This makes it harder to share section anchors
> > which is the whole point of them. For example, in cases where
> > offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> > .LANCHOR42 and replace the address with the resulting reload register
> > and the offset. As a consequence subsequent passes only have to deal
> > with register equivalences in order to share section anchors. For
> > example
>
>
> I thought how to fix this in another place as LRA is already too
> complicated. It could be fixed in some machined-dependent pass or in
> split1. Adding the pass is overkill and fix in split1 would be ok if the
> target would work with memory via few insns (e.g. only via load/store
> insns). So probably LRA is the best place to fix this problem.
>
>
> > double x;
> > double y;
> >
> > double foo ()
> > {
> > return x + y;
> > }
> >
> > With this patch, after LRA we have
> >
> > 20: %r1:DI=`*.LANCHOR0'
> > 17: %f0:DF=[%r1:DI]
> > 19: %r1:DI=`*.LANCHOR0'
> > 12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> >
> > and after postreload
> >
> > 20: %r1:DI=`*.LANCHOR0'
> > 17: %f0:DF=[%r1:DI]
> > 12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> >
> > Of course, this was a lucky case since both reloads referred to the very
> > same register which allows for trivial removal of insn 19. At least in
> > cases like demonstrated by the new test section-anchors-4.c we are
> > guaranteed to re-use the reload for a single insn.
> >
> > Before testing this patch for multiple targets, I'm wondering whether
> > there is even a way to re-use reloads during LRA across insns (like an
> > equiv) such that we wouldn't depend on subsequent passes?
Meanwhile successfully bootstrapped and regtested on
- aarch64-unknown-linux-gnu
- s390x-ibm-linux-gnu
- x86_64-pc-linux-gnu
For powerpc64le-unknown-linux-gnu there is a new test failure:
FAIL: gcc.target/powerpc/pr94740.c (internal compiler error: in
extract_constrain_insn, at recog.cc:2795)
Previously the whole address was reloaded which means we ended up with:
18: %2:DI=const(`*.LANCHOR0'+0x4)
7: %3:SI=bswap([%2:DI])
and with this patch we bail out during LRA for
18: %2:DI=`*.LANCHOR0'
7: %3:SI=bswap([%2:DI+0x4])
with
pr94740.c:11:1: error: insn does not satisfy its constraints:
11 | }
| ^
(insn 7 18 13 2 (set (reg:SI 3 3 [orig:119 _3 ] [119])
(bswap:SI (mem/c:SI (plus:DI (reg:DI 2 2 [127])
(const_int 4 [0x4])) [1 array[1]+0 S4 A32])))
"pr94740.c":10:10 143 {bswapsi2_load}
(nil))
The insn definition is:
(define_insn "bswap<mode>2_load"
[(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
(bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
""
"l<wd>brx %0,%y1"
[(set_attr "type" "load")])
and the important part of the constraint Z is
(define_special_predicate "indexed_or_indirect_address"
(and (match_test "REG_P (op)
|| (GET_CODE (op) == PLUS
/* Omit testing REG_P (XEXP (op, 0)). */
&& REG_P (XEXP (op, 1)))")
(match_operand 0 "address_operand")))
which doesn't accept offsettable addresses. At this point I'm not
entirely sure what the contract between targets and LRA is. My reading
so far was that due to goal_alt_offmemok[i] it is safe to use
offsettable addresses.
>
> LRA reuses the reload pseudo generated for one insn (please see usage of
> curr_insn_input_reloads). The problem in not reusing reload pseudo for the
> above example is that because reload for anchor occurs from different
> insns. First LRA reloads [*.LANCHOR0] (insn 17 generated to satisfy reg
> constraint in insn 12), then the anchor (from insn 17 ), and then reload
> *.LANCHOR0 from 2nd op of insn 12. But I'd not be worry that the reload
> pseudos get different hard regs. Knowing how assigning hard regs works in
> LRA I see very small probability that the pseduos get different regs.
>
> BTW I did not reproduce the testcase situation w/o -march options (the
> anchor in this case already in a pseudo before RA). But -march=z13
> reproduces it. So probably you need to add this option to the test.
Ahh right, I tested this patch on top of a private one where the test
case is successful for many archs. Currently it is failing because
the last alternative of insn cmpdi_cct is rejected for vanilla and
accepted for my private patch. It will take me some time to polish up
the private patch. Thus, I think we have two options here. Proceed
with this patch without a test case, or wait until the other patch is
ready. I'm fine either way.
I was afraid of a flaky test and deliberately chose this test since it
guarantees that the section anchor ends up in the very same reload
register. Maybe I should be more brave here ;-)
Thanks,
Stefan
>
> So the patch (with -march=z13 or other one reproducing the situation as
> additional option for the test) is ok for me. Thank you.
>
> > ---
> > gcc/lra-constraints.cc | 30 +++++++++++++++++++
> > .../gcc.target/s390/section-anchors-4.c | 25 ++++++++++++++++
> > 2 files changed, 55 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
> >
> > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > index ccd68efc956..6779dfee020 100644
> > --- a/gcc/lra-constraints.cc
> > +++ b/gcc/lra-constraints.cc
> > @@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
> > new_reg = emit_inc (rclass, *loc,
> > /* This value does not matter for MODIFY. */
> > GET_MODE_SIZE (GET_MODE (op)));
> > + /* Try to pull out section anchors. For example, instead of
> > + reloading an "entire" address like .LANCHOR42+offset only reload
> > + .LANCHOR42 and use the new reload register as the base register.
> > + This allows following optimizations to share section anchors and
> > + remove redundant loads. */
> > + else if (GET_CODE (*loc) == CONST
> > + && GET_CODE (XEXP (*loc, 0)) == PLUS
> > + && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
> > + && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
> > + && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
> > + /* Some offsets are valid in conjunction with a symbol and
> > + invalid in conjunction with a register. Thus, pull out
> > + the anchor only in case the offset is a valid anchor
> > + offset. */
> > + && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > + >= targetm.min_anchor_offset
> > + && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > + <= targetm.max_anchor_offset)
> > + {
> > + rtx anchor = XEXP (XEXP (*loc, 0), 0);
> > + rtx offset = XEXP (XEXP (*loc, 0), 1);
> > +
> > + if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
> > + NULL, false, false,
> > + "offsetable address", &new_reg))
> > + lra_emit_move (new_reg, anchor);
> > +
> > + new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
> > + lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE
> > (op)));
> > + }
> > else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
> > NULL, false, false,
> > "offsetable address", &new_reg))
> > diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > new file mode 100644
> > index 00000000000..0b4cd081c61
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
> > +/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
> > +/* { dg-final { scan-rtl-dump
> > "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)"
> > "ira" } } */
> > +/* { dg-final { scan-rtl-dump
> > "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)"
> > "reload" } } */
> > +
> > +/* Ensure that we get the same reload register for the second memory
> > operand.
> > + Prior LRA we have
> > +
> > + 55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
> > +
> > + and after LRA
> > +
> > + 59: %r1:DI=`*.LANCHOR0'
> > + 55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8]) */
> > +
> > +long x, y;
> > +
> > +long
> > +foo (void)
> > +{
> > + if (x == y)
> > + return 42;
> > + return 0;
> > +}
>