On Mon, May 11, 2026 at 09:40:15PM +0200, Stefan Schulze Frielinghaus wrote:
> On Fri, May 08, 2026 at 08:23:46AM -0400, Vladimir Makarov wrote:
> >
> > On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> > > From: Stefan Schulze Frielinghaus <[email protected]>
> > >
> > > Currently an "entire" address is reloaded even in cases where section
> > > anchors are involved. This makes it harder to share section anchors
> > > which is the whole point of them. For example, in cases where
> > > offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> > > .LANCHOR42 and replace the address with the resulting reload register
> > > and the offset. As a consequence subsequent passes only have to deal
> > > with register equivalences in order to share section anchors. For
> > > example
> >
> >
> > I thought how to fix this in another place as LRA is already too
> > complicated. It could be fixed in some machined-dependent pass or in
> > split1. Adding the pass is overkill and fix in split1 would be ok if the
> > target would work with memory via few insns (e.g. only via load/store
> > insns). So probably LRA is the best place to fix this problem.
> >
> >
> > > double x;
> > > double y;
> > >
> > > double foo ()
> > > {
> > > return x + y;
> > > }
> > >
> > > With this patch, after LRA we have
> > >
> > > 20: %r1:DI=`*.LANCHOR0'
> > > 17: %f0:DF=[%r1:DI]
> > > 19: %r1:DI=`*.LANCHOR0'
> > > 12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > >
> > > and after postreload
> > >
> > > 20: %r1:DI=`*.LANCHOR0'
> > > 17: %f0:DF=[%r1:DI]
> > > 12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > >
> > > Of course, this was a lucky case since both reloads referred to the very
> > > same register which allows for trivial removal of insn 19. At least in
> > > cases like demonstrated by the new test section-anchors-4.c we are
> > > guaranteed to re-use the reload for a single insn.
> > >
> > > Before testing this patch for multiple targets, I'm wondering whether
> > > there is even a way to re-use reloads during LRA across insns (like an
> > > equiv) such that we wouldn't depend on subsequent passes?
>
> Meanwhile successfully bootstrapped and regtested on
>
> - aarch64-unknown-linux-gnu
> - s390x-ibm-linux-gnu
> - x86_64-pc-linux-gnu
>
> For powerpc64le-unknown-linux-gnu there is a new test failure:
>
> FAIL: gcc.target/powerpc/pr94740.c (internal compiler error: in
> extract_constrain_insn, at recog.cc:2795)
>
> Previously the whole address was reloaded which means we ended up with:
>
> 18: %2:DI=const(`*.LANCHOR0'+0x4)
> 7: %3:SI=bswap([%2:DI])
>
> and with this patch we bail out during LRA for
>
> 18: %2:DI=`*.LANCHOR0'
> 7: %3:SI=bswap([%2:DI+0x4])
>
> with
>
> pr94740.c:11:1: error: insn does not satisfy its constraints:
> 11 | }
> | ^
> (insn 7 18 13 2 (set (reg:SI 3 3 [orig:119 _3 ] [119])
> (bswap:SI (mem/c:SI (plus:DI (reg:DI 2 2 [127])
> (const_int 4 [0x4])) [1 array[1]+0 S4 A32])))
> "pr94740.c":10:10 143 {bswapsi2_load}
> (nil))
>
> The insn definition is:
>
> (define_insn "bswap<mode>2_load"
> [(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
> (bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
> ""
> "l<wd>brx %0,%y1"
> [(set_attr "type" "load")])
>
> and the important part of the constraint Z is
>
> (define_special_predicate "indexed_or_indirect_address"
> (and (match_test "REG_P (op)
> || (GET_CODE (op) == PLUS
> /* Omit testing REG_P (XEXP (op, 0)). */
> && REG_P (XEXP (op, 1)))")
> (match_operand 0 "address_operand")))
>
> which doesn't accept offsettable addresses. At this point I'm not
> entirely sure what the contract between targets and LRA is. My reading
> so far was that due to goal_alt_offmemok[i] it is safe to use
> offsettable addresses.
goal_alt_offmemok[i] is derived from offmemok which is computed in
process_alt_operands() as e.g. here:
case CT_MEMORY:
case CT_RELAXED_MEMORY:
if (MEM_P (op)
&& satisfies_memory_constraint_p (op, cn))
win = true;
else if (spilled_pseudo_p (op))
win = true;
/* If we didn't already win, we can reload constants
via force_const_mem or put the pseudo value into
memory, or make other memory by reloading the
address like for 'o'. */
if (CONST_POOL_OK_P (mode, op)
|| MEM_P (op) || REG_P (op)
/* We can restore the equiv insn by a
reload. */
|| equiv_substition_p[nop])
badop = false;
constmemok = true;
offmemok = true;
break;
Here offmemok is set without explicitly checking whether the constraint
accepts offsettable memory or not. So far this was no problem because
inside of
if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))
every resulting address was a single register. I'm a bit puzzled.
Either goal_alt_offmemok[i] doesn't mean that offsettable addresses are
ok, or another check is required before setting offmemok to true. Any
thoughts?
Cheers,
Stefan
>
> >
> > LRA reuses the reload pseudo generated for one insn (please see usage of
> > curr_insn_input_reloads). The problem in not reusing reload pseudo for the
> > above example is that because reload for anchor occurs from different
> > insns. First LRA reloads [*.LANCHOR0] (insn 17 generated to satisfy reg
> > constraint in insn 12), then the anchor (from insn 17 ), and then reload
> > *.LANCHOR0 from 2nd op of insn 12. But I'd not be worry that the reload
> > pseudos get different hard regs. Knowing how assigning hard regs works in
> > LRA I see very small probability that the pseduos get different regs.
> >
> > BTW I did not reproduce the testcase situation w/o -march options (the
> > anchor in this case already in a pseudo before RA). But -march=z13
> > reproduces it. So probably you need to add this option to the test.
>
> Ahh right, I tested this patch on top of a private one where the test
> case is successful for many archs. Currently it is failing because
> the last alternative of insn cmpdi_cct is rejected for vanilla and
> accepted for my private patch. It will take me some time to polish up
> the private patch. Thus, I think we have two options here. Proceed
> with this patch without a test case, or wait until the other patch is
> ready. I'm fine either way.
>
> I was afraid of a flaky test and deliberately chose this test since it
> guarantees that the section anchor ends up in the very same reload
> register. Maybe I should be more brave here ;-)
>
> Thanks,
> Stefan
>
> >
> > So the patch (with -march=z13 or other one reproducing the situation as
> > additional option for the test) is ok for me. Thank you.
> >
> > > ---
> > > gcc/lra-constraints.cc | 30 +++++++++++++++++++
> > > .../gcc.target/s390/section-anchors-4.c | 25 ++++++++++++++++
> > > 2 files changed, 55 insertions(+)
> > > create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > >
> > > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > > index ccd68efc956..6779dfee020 100644
> > > --- a/gcc/lra-constraints.cc
> > > +++ b/gcc/lra-constraints.cc
> > > @@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
> > > new_reg = emit_inc (rclass, *loc,
> > > /* This value does not matter for
> > > MODIFY. */
> > > GET_MODE_SIZE (GET_MODE (op)));
> > > + /* Try to pull out section anchors. For example, instead of
> > > + reloading an "entire" address like .LANCHOR42+offset only reload
> > > + .LANCHOR42 and use the new reload register as the base register.
> > > + This allows following optimizations to share section anchors and
> > > + remove redundant loads. */
> > > + else if (GET_CODE (*loc) == CONST
> > > + && GET_CODE (XEXP (*loc, 0)) == PLUS
> > > + && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
> > > + && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
> > > + && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
> > > + /* Some offsets are valid in conjunction with a symbol and
> > > + invalid in conjunction with a register. Thus, pull out
> > > + the anchor only in case the offset is a valid anchor
> > > + offset. */
> > > + && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > > + >= targetm.min_anchor_offset
> > > + && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > > + <= targetm.max_anchor_offset)
> > > + {
> > > + rtx anchor = XEXP (XEXP (*loc, 0), 0);
> > > + rtx offset = XEXP (XEXP (*loc, 0), 1);
> > > +
> > > + if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
> > > + NULL, false, false,
> > > + "offsetable address", &new_reg))
> > > + lra_emit_move (new_reg, anchor);
> > > +
> > > + new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
> > > + lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE
> > > (op)));
> > > + }
> > > else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
> > > NULL, false, false,
> > > "offsetable address", &new_reg))
> > > diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > new file mode 100644
> > > index 00000000000..0b4cd081c61
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
> > > +/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
> > > +/* { dg-final { scan-rtl-dump
> > > "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)"
> > > "ira" } } */
> > > +/* { dg-final { scan-rtl-dump
> > > "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)"
> > > "reload" } } */
> > > +
> > > +/* Ensure that we get the same reload register for the second memory
> > > operand.
> > > + Prior LRA we have
> > > +
> > > + 55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
> > > +
> > > + and after LRA
> > > +
> > > + 59: %r1:DI=`*.LANCHOR0'
> > > + 55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8]) */
> > > +
> > > +long x, y;
> > > +
> > > +long
> > > +foo (void)
> > > +{
> > > + if (x == y)
> > > + return 42;
> > > + return 0;
> > > +}
> >