On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
From: Stefan Schulze Frielinghaus <[email protected]>
Currently an "entire" address is reloaded even in cases where section
anchors are involved. This makes it harder to share section anchors
which is the whole point of them. For example, in cases where
offsetable MEMs are ok do not reload .LANCHOR42+offset but only
.LANCHOR42 and replace the address with the resulting reload register
and the offset. As a consequence subsequent passes only have to deal
with register equivalences in order to share section anchors. For
example
I thought how to fix this in another place as LRA is already too
complicated. It could be fixed in some machined-dependent pass or in
split1. Adding the pass is overkill and fix in split1 would be ok if
the target would work with memory via few insns (e.g. only via
load/store insns). So probably LRA is the best place to fix this problem.
double x;
double y;
double foo ()
{
return x + y;
}
With this patch, after LRA we have
20: %r1:DI=`*.LANCHOR0'
17: %f0:DF=[%r1:DI]
19: %r1:DI=`*.LANCHOR0'
12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
and after postreload
20: %r1:DI=`*.LANCHOR0'
17: %f0:DF=[%r1:DI]
12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
Of course, this was a lucky case since both reloads referred to the very
same register which allows for trivial removal of insn 19. At least in
cases like demonstrated by the new test section-anchors-4.c we are
guaranteed to re-use the reload for a single insn.
Before testing this patch for multiple targets, I'm wondering whether
there is even a way to re-use reloads during LRA across insns (like an
equiv) such that we wouldn't depend on subsequent passes?
LRA reuses the reload pseudo generated for one insn (please see usage of
curr_insn_input_reloads). The problem in not reusing reload pseudo for
the above example is that because reload for anchor occurs from
different insns. First LRA reloads [*.LANCHOR0] (insn 17 generated to
satisfy reg constraint in insn 12), then the anchor (from insn 17 ), and
then reload *.LANCHOR0 from 2nd op of insn 12. But I'd not be worry
that the reload pseudos get different hard regs. Knowing how assigning
hard regs works in LRA I see very small probability that the pseduos get
different regs.
BTW I did not reproduce the testcase situation w/o -march options (the
anchor in this case already in a pseudo before RA). But -march=z13
reproduces it. So probably you need to add this option to the test.
So the patch (with -march=z13 or other one reproducing the situation as
additional option for the test) is ok for me. Thank you.
---
gcc/lra-constraints.cc | 30 +++++++++++++++++++
.../gcc.target/s390/section-anchors-4.c | 25 ++++++++++++++++
2 files changed, 55 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index ccd68efc956..6779dfee020 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
new_reg = emit_inc (rclass, *loc,
/* This value does not matter for MODIFY. */
GET_MODE_SIZE (GET_MODE (op)));
+ /* Try to pull out section anchors. For example, instead of
+ reloading an "entire" address like .LANCHOR42+offset only reload
+ .LANCHOR42 and use the new reload register as the base register.
+ This allows following optimizations to share section anchors and
+ remove redundant loads. */
+ else if (GET_CODE (*loc) == CONST
+ && GET_CODE (XEXP (*loc, 0)) == PLUS
+ && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
+ && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
+ && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
+ /* Some offsets are valid in conjunction with a symbol and
+ invalid in conjunction with a register. Thus, pull out
+ the anchor only in case the offset is a valid anchor
+ offset. */
+ && INTVAL (XEXP (XEXP (*loc, 0), 1))
+ >= targetm.min_anchor_offset
+ && INTVAL (XEXP (XEXP (*loc, 0), 1))
+ <= targetm.max_anchor_offset)
+ {
+ rtx anchor = XEXP (XEXP (*loc, 0), 0);
+ rtx offset = XEXP (XEXP (*loc, 0), 1);
+
+ if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
+ NULL, false, false,
+ "offsetable address", &new_reg))
+ lra_emit_move (new_reg, anchor);
+
+ new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
+ lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE
(op)));
+ }
else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
NULL, false, false,
"offsetable address", &new_reg))
diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c
b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
new file mode 100644
index 00000000000..0b4cd081c61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
+/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
+/* { dg-final { scan-rtl-dump
"%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)"
"ira" } } */
+/* { dg-final { scan-rtl-dump
"%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)"
"reload" } } */
+
+/* Ensure that we get the same reload register for the second memory operand.
+ Prior LRA we have
+
+ 55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
+
+ and after LRA
+
+ 59: %r1:DI=`*.LANCHOR0'
+ 55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8]) */
+
+long x, y;
+
+long
+foo (void)
+{
+ if (x == y)
+ return 42;
+ return 0;
+}