On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Ping > > Richard Sandiford <richard.sandif...@arm.com> writes: > > One of the effects of the function_abi series is to make -fipa-ra > > work for partially call-clobbered registers. E.g. if a call preserves > > only the low 32 bits of a register R, we handled the partial clobber > > separately from -fipa-ra, and so treated the upper bits of R as > > clobbered even if we knew that the target function doesn't touch R. > > > > "Fixing" this caused problems for the vzeroupper handling on x86. > > The pass that inserts the vzerouppers assumes that no 256-bit or 512-bit > > values are live across a call unless the call takes a 256-bit or 512-bit > > argument: > > > > /* Needed mode is set to AVX_U128_CLEAN if there are > > no 256bit or 512bit modes used in function arguments. */ > > > > This implicitly relies on: > > > > /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The only ABI that > > saves SSE registers across calls is Win64 (thus no need to check the > > current ABI here), and with AVX enabled Win64 only guarantees that > > the low 16 bytes are saved. */ > > > > static bool > > ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED, > > unsigned int regno, machine_mode mode) > > { > > return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16; > > } > > > > The comment suggests that this code is only needed for Win64 and that > > not testing for Win64 is just a simplification. But in practice it was > > needed for correctness on GNU/Linux and other targets too, since without > > it the RA would be able to keep 256-bit and 512-bit values in SSE > > registers across calls that are known not to clobber them. > > > > This patch conservatively treats calls as AVX_U128_ANY if the RA can see > > that some SSE registers are not touched by a call. There are then no > > regressions if the ix86_hard_regno_call_part_clobbered check is disabled > > for GNU/Linux (not something we should do, was just for testing). > > > > If in fact we want -fipa-ra to pretend that all functions clobber > > SSE registers above 128 bits, it'd certainly be possible to arrange > > that. But IMO that would be an optimisation decision, whereas what > > the patch is fixing is a correctness decision. So I think we should > > have this check even so. > > 2019-09-25 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > * config/i386/i386.c: Include function-abi.h. > (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY > if they preserve some 256-bit or 512-bit SSE registers.
OK. Thanks, Uros. > > Index: gcc/config/i386/i386.c > =================================================================== > --- gcc/config/i386/i386.c 2019-09-25 16:47:48.000000000 +0100 > +++ gcc/config/i386/i386.c 2019-09-25 16:47:49.089962608 +0100 > @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1 > #include "i386-builtins.h" > #include "i386-expand.h" > #include "i386-features.h" > +#include "function-abi.h" > > /* This file should be included last. */ > #include "target-def.h" > @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins > } > } > > + /* If the function is known to preserve some SSE registers, > + RA and previous passes can legitimately rely on that for > + modes wider than 256 bits. It's only safe to issue a > + vzeroupper if all SSE registers are clobbered. */ > + const function_abi &abi = insn_callee_abi (insn); > + if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS], > + abi.mode_clobbers (V4DImode))) > + return AVX_U128_ANY; > + > return AVX_U128_CLEAN; > } >