On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Ping
>
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > One of the effects of the function_abi series is to make -fipa-ra
> > work for partially call-clobbered registers.  E.g. if a call preserves
> > only the low 32 bits of a register R, we handled the partial clobber
> > separately from -fipa-ra, and so treated the upper bits of R as
> > clobbered even if we knew that the target function doesn't touch R.
> >
> > "Fixing" this caused problems for the vzeroupper handling on x86.
> > The pass that inserts the vzerouppers assumes that no 256-bit or 512-bit
> > values are live across a call unless the call takes a 256-bit or 512-bit
> > argument:
> >
> >       /* Needed mode is set to AVX_U128_CLEAN if there are
> >        no 256bit or 512bit modes used in function arguments. */
> >
> > This implicitly relies on:
> >
> > /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The only ABI that
> >    saves SSE registers across calls is Win64 (thus no need to check the
> >    current ABI here), and with AVX enabled Win64 only guarantees that
> >    the low 16 bytes are saved.  */
> >
> > static bool
> > ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> >                                    unsigned int regno, machine_mode mode)
> > {
> >   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
> > }
> >
> > The comment suggests that this code is only needed for Win64 and that
> > not testing for Win64 is just a simplification.  But in practice it was
> > needed for correctness on GNU/Linux and other targets too, since without
> > it the RA would be able to keep 256-bit and 512-bit values in SSE
> > registers across calls that are known not to clobber them.
> >
> > This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> > that some SSE registers are not touched by a call.  There are then no
> > regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> > for GNU/Linux (not something we should do, was just for testing).
> >
> > If in fact we want -fipa-ra to pretend that all functions clobber
> > SSE registers above 128 bits, it'd certainly be possible to arrange
> > that.  But IMO that would be an optimisation decision, whereas what
> > the patch is fixing is a correctness decision.  So I think we should
> > have this check even so.
>
> 2019-09-25  Richard Sandiford  <richard.sandif...@arm.com>
>
> gcc/
>         * config/i386/i386.c: Include function-abi.h.
>         (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
>         if they preserve some 256-bit or 512-bit SSE registers.

OK.

Thanks,
Uros.

>
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      2019-09-25 16:47:48.000000000 +0100
> +++ gcc/config/i386/i386.c      2019-09-25 16:47:49.089962608 +0100
> @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
>  #include "i386-builtins.h"
>  #include "i386-expand.h"
>  #include "i386-features.h"
> +#include "function-abi.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
>             }
>         }
>
> +      /* If the function is known to preserve some SSE registers,
> +        RA and previous passes can legitimately rely on that for
> +        modes wider than 256 bits.  It's only safe to issue a
> +        vzeroupper if all SSE registers are clobbered.  */
> +      const function_abi &abi = insn_callee_abi (insn);
> +      if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
> +                                 abi.mode_clobbers (V4DImode)))
> +       return AVX_U128_ANY;
> +
>        return AVX_U128_CLEAN;
>      }
>

Reply via email to