================
@@ -1248,18 +1249,62 @@ void InlineSpiller::spillAroundUses(Register Reg) {
    // Create a new virtual register for spill/fill.
    // FIXME: Infer regclass from instruction alone.
-    Register NewVReg = Edit->createFrom(Reg);
+
+    unsigned SubReg = 0;
+    LaneBitmask CoveringLanes = LaneBitmask::getNone();
+    // If subreg liveness is enabled, identify the subreg use(s) to attempt a
+    // subreg reload. Skip if the instruction also defines the register.
+    // For copy bundles, get the covering lane masks.
+    if (MRI.subRegLivenessEnabled() && !RI.Writes) {
+      for (auto [MI, OpIdx] : Ops) {
+        const MachineOperand &MO = MI->getOperand(OpIdx);
+        assert(MO.isReg() && MO.getReg() == Reg);
+        if (MO.isUse()) {
+          SubReg = MO.getSubReg();
+          if (SubReg)
+            CoveringLanes |= TRI.getSubRegIndexLaneMask(SubReg);
+        }
+      }
+    }
+
+    if (MI.isBundled() && CoveringLanes.any()) {
+      CoveringLanes = LaneBitmask(bit_ceil(CoveringLanes.getAsInteger()) - 1);
+      // Obtain the covering subregister index, including any missing indices
+      // within the identified small range. Although this may be suboptimal
+      // due to gaps in the subregisters that are not part of the copy bundle,
+      // it is beneficial when components outside this range of the original
+      // tuple can be completely skipped from the reload.
+      SubReg = TRI.getSubRegIdxFromLaneMask(CoveringLanes);
+    }
+
+    // If the target doesn't support subreg reload, fall back to restoring
+    // the full tuple.
+    if (SubReg && !TRI.shouldEnableSubRegReload(SubReg))
+      SubReg = 0;
+
+    const TargetRegisterClass *OrigRC = MRI.getRegClass(Reg);
+    const TargetRegisterClass *NewRC =
+        SubReg ? TRI.getSubRegisterClass(OrigRC, SubReg) : nullptr;
----------------
cdevadas wrote:
The subreg reload brings two advantages.
1. Currently, when a tuple is reloaded, the full tuple becomes live at the
reload point, even if only a subset of its components is actually needed. On
targets like AMDGPU, this creates difficulties later, when the reload
pseudo-instruction is expanded into individual reload operations, because the
unused or undefined subregisters still appear live. They are often patched
with ad hoc fixups, such as inserting implicit-def or implicit operands for
the unneeded tuple components, to avoid miscompilations. The subreg reload
fixes this broken liveness info for partial uses of tuples chosen for
spilling: it avoids introducing spurious undef subregs and eliminates the
need for such hacky post-RA workarounds.
2. Trimming down the reloaded registers really helps the allocation: instead
of the full tuple, we ensure RA reloads only the relevant subregs. The
contrast is sketched below.
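
Roughly, for an AMDGPU-like 128-bit tuple of which only `sub0_sub1` is read
after the spill, the difference looks something like this (placeholder
opcodes, register classes and operands, not the exact output of this patch):

```
; today: the whole tuple is restored and becomes live at the reload point,
; even though sub2/sub3 carry no meaningful value; post-RA expansion then has
; to patch the dead lanes with implicit-def/implicit operands
%reload:vreg_128 = SI_SPILL_V128_RESTORE %stack.0, ...
%use:vreg_64 = COPY %reload.sub0_sub1

; with the subreg reload, only the covering subreg is restored into a
; narrower class, so nothing outside sub0_sub1 ever becomes live
%reload:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, ...
%use:vreg_64 = COPY %reload
```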
It is not clear to me how RA would see `%reload = INSERT_SUBREG undef, ..`
(the form you suggested); we might lose the two advantages I mentioned here.
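
For comparison, I read the suggested form as something like this (same
placeholder opcodes/classes; schematic rather than verifier-clean MIR):

```
%restored:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, ...
%reload:vreg_128 = INSERT_SUBREG undef %tuple:vreg_128, %restored, %subreg.sub0_sub1
%use:vreg_64 = COPY %reload.sub0_sub1
```

The INSERT_SUBREG result is still a full-width vreg, so RA presumably still
has to keep a vreg_128 live at the reload point, which is what concerns me
about the second advantage.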
https://github.com/llvm/llvm-project/pull/175002