Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

will schmidt via Gcc-patches Thu, 10 Mar 2022 08:45:13 -0800

On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059


Hi,

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.
> 
specification. 
(ok :-)


> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't 
> do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.
> 

inlining.


> This code removes the -mpower8-fusion and -mpower8-fusion-sign options, and
> only enables power8 fusion if we are tuning for a power8.  Power8 sign fusion
> is only enabled if we are tuning for a power8 and we have -O3 optimization or
> higher.
> 
> I left the options -mno-power8-fusion and -mno-power8-fusion-sign in 
> rs6000.opt
> and they don't issue a warning.  If the user explicitly used -mpower8-fusion 
> or
> -mpower8-fusion-sign, then they will get a warning that the swtich has been
> removed.
> 

switch


> Similarly, I left in the pragma target and attribute target support for the
> fusion options, but they don't do anything now.  This is because I believe the
> customer who encountered this problem now is explicitly setting the
> no-power8-fusion option in the pragma or attribute to avoid the warning.
> 
> I have tested this on the following systems, and they all bootstraps fine and
> there were no regressions in the test suite:
> 
>     big endian power8 (both 64-bit and 32-bit)
>     little endian power9
>     little endian power10
> 

ok.

> Can I check this patch into the current master branch for GCC and after a
> cooling period check in the patch to the GCC 11 and GCC 10 branches.  The
> customer is currently using GCC 10.
> 
> 2022-03-09   Michael Meissner  <meiss...@linux.ibm.com>
> 
> gcc/
>       PR target/102059
>       * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>       (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>       (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.

ok

>       * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>       Delete code that set the power8 fusion options automatically.
>       (rs6000_opt_masks): Allow #pragma target and attribute target to set
>       power8-fusion and power8-fusion-sign, but these no longer represent
>       options that the user can set.
>       (rs6000_print_options_internal): Skip printing nop options.

ok


>       * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>       (TARGET_P8_FUSION_SIGN): Likewise.
>       (MASK_P8_FUSION): Delete.

ok


>       * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>       ignore the no form and warn that the option was removed for the regular
>       form.
>       (-mpower8-fusion-sign): Likewise.

ok

>       * doc/invoke.texi (RS/6000 and PowerPC Options): Delete -mpower8-fusion
>       and -mpower8-fusion-sign.

This change removes the -mpower8-fusion and -mno-power8-fusion options,
There is not a direct reference to -mpower8-fusion-sign in the change
here.  It may be an implied removal, but not immediately obvious to me.


> 
> gcc/testsuite/
>       PR target/102059
>       * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>       * gcc.dg/lto/pr102059-2_0.c: Likewise.
>       * gcc.target/powerpc/pr102059-3.c: Likewise.
>       * gcc.target/powerpc/pr102059-4.c: New test.

ok

> ---
>  gcc/config/rs6000/rs6000-cpus.def             | 22 +++------
>  gcc/config/rs6000/rs6000.cc                   | 49 +++++--------------
>  gcc/config/rs6000/rs6000.h                    | 14 +++++-
>  gcc/config/rs6000/rs6000.opt                  | 19 +++++--
>  gcc/doc/invoke.texi                           | 13 +----
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c       |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c       |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +++++++++
>  9 files changed, 75 insertions(+), 71 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..a05b2d8c41a 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -43,9 +43,7 @@
>                                | OPTION_MASK_ALTIVEC                  \
>                                | OPTION_MASK_VSX)
> 
> -/* For now, don't provide an embedded version of ISA 2.07.  Do not set power8
> -   fusion here, instead set it in rs6000.cc if we are tuning for a power8
> -   system.  */
> +/* For now, don't provide an embedded version of ISA 2.07.  */

ok.  (as far as removing the comment, I'm not clear what the remaining
comment is telling me, but thats outside of the scope of this patch).


>  #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER                   \
>                                | OPTION_MASK_P8_VECTOR                \
>                                | OPTION_MASK_CRYPTO                   \
> @@ -54,19 +52,14 @@
>                                | OPTION_MASK_QUAD_MEMORY              \
>                                | OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
> -/* ISA masks setting fusion options.  */
> -#define OTHER_FUSION_MASKS   (OPTION_MASK_P8_FUSION                  \
> -                              | OPTION_MASK_P8_FUSION_SIGN)
> -
ok
>  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
>     FLOAT128_HW here until we are ready to make -mfloat128 on by default.  */
> -#define ISA_3_0_MASKS_SERVER ((ISA_2_7_MASKS_SERVER                  \
> -                               | OPTION_MASK_ISEL                    \
> -                               | OPTION_MASK_MODULO                  \
> -                               | OPTION_MASK_P9_MINMAX               \
> -                               | OPTION_MASK_P9_MISC                 \
> -                               | OPTION_MASK_P9_VECTOR)              \
> -                              & ~OTHER_FUSION_MASKS)
> +#define ISA_3_0_MASKS_SERVER (ISA_2_7_MASKS_SERVER                   \
> +                              | OPTION_MASK_ISEL                     \
> +                              | OPTION_MASK_MODULO                   \
> +                              | OPTION_MASK_P9_MINMAX                \
> +                              | OPTION_MASK_P9_MISC                  \
> +                              | OPTION_MASK_P9_VECTOR)
> 

Whitespace change.  I assume this fixes an indentation glitch. 


>  /* Support for the IEEE 128-bit floating point hardware requires a lot of the
>     VSX instructions that are part of ISA 3.0.  */
> @@ -140,7 +133,6 @@
>                                | OPTION_MASK_MODULO                   \
>                                | OPTION_MASK_MULHW                    \
>                                | OPTION_MASK_NO_UPDATE                \
> -                              | OPTION_MASK_P8_FUSION                \
>                                | OPTION_MASK_P8_VECTOR                \
>                                | OPTION_MASK_P9_MINMAX                \
>                                | OPTION_MASK_P9_MISC                  \

ok.

> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 3afe78f5d04..46dad54a066 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4040,41 +4040,6 @@ rs6000_option_override_internal (bool global_init_p)
>        && optimize_function_for_speed_p (cfun))
>      rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
> 
> -  /* Enable power8 fusion if we are tuning for power8, even if we aren't
> -     generating power8 instructions.  Power9 does not optimize power8 fusion
> -     cases.  */
> -  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
> -    {
> -      if (processor_target_table[tune_index].processor == PROCESSOR_POWER8)
> -     rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> -      else
> -     rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> -    }
> -
> -  /* Setting additional fusion flags turns on base fusion.  */
> -  if (!TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN)
> -    {
> -      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
> -     {
> -       if (TARGET_P8_FUSION_SIGN)
> -         error ("%qs requires %qs", "-mpower8-fusion-sign",
> -                "-mpower8-fusion");
> -
> -       rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> -     }
> -      else
> -     rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> -    }
> -
> -  /* Power8 does not fuse sign extended loads with the addis.  If we are
> -     optimizing at high levels for speed, convert a sign extended load into a
> -     zero extending load, and an explicit sign extension.  */
> -  if (TARGET_P8_FUSION
> -      && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
> -      && optimize_function_for_speed_p (cfun)
> -      && optimize >= 3)
> -    rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
> -


ok

>    /* ISA 3.0 vector instructions include ISA 2.07.  */
>    if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR)
>      {
> @@ -23983,8 +23948,6 @@ static struct rs6000_opt_mask const 
> rs6000_opt_masks[] =
>    { "pcrel-opt",             OPTION_MASK_PCREL_OPT,          false, true  },
>    { "popcntb",                       OPTION_MASK_POPCNTB,            false, 
> true  },
>    { "popcntd",                       OPTION_MASK_POPCNTD,            false, 
> true  },
> -  { "power8-fusion",         OPTION_MASK_P8_FUSION,          false, true  },
> -  { "power8-fusion-sign",    OPTION_MASK_P8_FUSION_SIGN,     false, true  },
>    { "power8-vector",         OPTION_MASK_P8_VECTOR,          false, true  },
>    { "power9-minmax",         OPTION_MASK_P9_MINMAX,          false, true  },
>    { "power9-misc",           OPTION_MASK_P9_MISC,            false, true  },
> @@ -24024,6 +23987,14 @@ static struct rs6000_opt_mask const 
> rs6000_opt_masks[] =
>  #endif
>    { "soft-float",            OPTION_MASK_SOFT_FLOAT,         false, false },
>    { "string",                        0,                              false, 
> false },
> +
> +  /* Power8 fusion options were removed, but ignore using them in #pragma and
> +     attribute target.  Users may have used these options to suppress errors 
> if
> +     they declare an inline function to be specifically power8 and the 
> function
> +     was included by power9 or power10 which turned off the power8 fusion
> +     support.  */

I'd be tempted to clarify 'ignore' with some form of 'continue to
allow', but i'm not certain if that would be helpful. 
After reading
further, ..  The comment in the rs6000.opt change below flows better to
me.

> +  { "power8-fusion",         0,                              false, true  },
> +  { "power8-fusion-sign",    0,                              false, true  },
>  };


ok



> 
>  /* Builtin mask mapping for printing the flags.  */
> @@ -24660,6 +24631,10 @@ rs6000_print_options_internal (FILE *file,
>        HOST_WIDE_INT mask = opts[i].mask;
>        size_t len = comma_len + prefix_len + strlen (name);
> 
> +      /* Don't print NOP options.  */
> +      if (!mask)
> +     continue;
> +

ok

>        if (!invert)
>       {
>         if ((flags & mask) == 0)
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 17af314416c..4ae45ff822f 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -504,6 +504,19 @@ extern int rs6000_vector_align[];
>  #define TARGET_MINMAX        (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT         
> \
>                        && (TARGET_P9_MINMAX || !flag_trapping_math))
> 
> +/* Power8 has special fusion operations that are enabled if we are tuning for
> +   power8.  This used to be settable with an option (-mpower8-fusion), but 
> that
> +   option has been removed.  */
> +#define TARGET_P8_FUSION     (rs6000_tune == PROCESSOR_POWER8)
> +
> +/* Power8 fusion does not fuse loads with sign extends.  If we are doing 
> higher
> +   optimization levels, split loads with sign extension to loads with zero
> +   extension and an explicit sign extend operation, so that the zero 
> extending
> +   load can be fused.  */
> +#define TARGET_P8_FUSION_SIGN        (TARGET_P8_FUSION                       
> \
> +                              && optimize_function_for_speed_p (cfun) \
> +                              && optimize >= 3)
> +

ok


>  /* In switching from using target_flags to using rs6000_isa_flags, the 
> options
>     machinery creates OPTION_MASK_<xxx> instead of MASK_<xxx>.  For now map
>     OPTION_MASK_<xxx> back into MASK_<xxx>.  */
> @@ -517,7 +530,6 @@ extern int rs6000_vector_align[];
>  #define MASK_FLOAT128_KEYWORD                OPTION_MASK_FLOAT128_KEYWORD
>  #define MASK_FLOAT128_HW             OPTION_MASK_FLOAT128_HW
>  #define MASK_FPRND                   OPTION_MASK_FPRND
> -#define MASK_P8_FUSION                       OPTION_MASK_P8_FUSION
>  #define MASK_HARD_FLOAT                      OPTION_MASK_HARD_FLOAT
>  #define MASK_HTM                     OPTION_MASK_HTM
>  #define MASK_ISEL                    OPTION_MASK_ISEL

ok

> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 4931d781c4e..a817df1b448 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -474,13 +474,24 @@ Save the TOC in the prologue for indirect calls rather 
> than inline.
>  mvsx-timode
>  Target RejectNegative Undocumented Ignore
> 
> +; The -mpower8-fusion and -mpower8-fusion-sign options existed in the past, 
> but
> +; they have been removed.  Some users in the past needed to use the
> +; -mno-power8-fusion option when they had inline functions that were 
> specified
> +; as generating power8 code and the functions were included by power9 or
> +; power10 function.  Using -mno-power8-fusion prevented an error in this 
> case.
> +; We allow the -mno-power8-fusion and -mno-power8-fusion-sign options 
> without a
> +; warning.


Ok. 


> +mno-power8-fusion
> +Target RejectNegative Undocumented
> +
>  mpower8-fusion
> -Target Mask(P8_FUSION) Var(rs6000_isa_flags)
> -Fuse certain integer operations together for better performance on power8.
> +Target RejectNegative Undocumented WarnRemoved
> +
> +mno-power8-fusion-sign
> +Target RejectNegative Undocumented
> 
>  mpower8-fusion-sign
> -Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
> -Allow sign extension in fusion operations.
> +Target RejectNegative Undocumented WarnRemoved
> 

ok


>  mpower8-vector
>  Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a0fa5e1cf43..cf7e4e22de6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1255,8 +1255,7 @@ See RS/6000 and PowerPC Options.
>  -mveclibabi=@var{type}  -mfriz  -mno-friz @gol
>  -mpointers-to-nested-functions  -mno-pointers-to-nested-functions @gol
>  -msave-toc-indirect  -mno-save-toc-indirect @gol
> --mpower8-fusion  -mno-mpower8-fusion  -mpower8-vector  -mno-power8-vector 
> @gol
> --mcrypto  -mno-crypto  -mhtm  -mno-htm @gol
> +-mpower8-vector  -mno-power8-vector -mcrypto  -mno-crypto  -mhtm  -mno-htm 
> @gol
>  -mquad-memory  -mno-quad-memory @gol
>  -mquad-memory-atomic  -mno-quad-memory-atomic @gol
>  -mcompat-align-parm  -mno-compat-align-parm @gol

ok.   (Assuming the combining of the two lines onto one is kosher. )


> @@ -28080,7 +28079,7 @@ following options:
>  -mpopcntb  -mpopcntd  -mpowerpc64 @gol
>  -mpowerpc-gpopt  -mpowerpc-gfxopt @gol
>  -mmulhw  -mdlmzb  -mmfpgpr  -mvsx @gol
> --mcrypto  -mhtm  -mpower8-fusion  -mpower8-vector @gol
> +-mcrypto  -mhtm  -mpower8-vector @gol
>  -mquad-memory  -mquad-memory-atomic  -mfloat128 @gol
>  -mfloat128-hardware -mprefixed -mpcrel -mmma @gol
>  -mrop-protect}
> @@ -28195,14 +28194,6 @@ Enable (disable) the use of the built-in functions 
> that allow direct
>  access to the Hardware Transactional Memory (HTM) instructions that
>  were added in version 2.07 of the PowerPC ISA.
> 
> -@item -mpower8-fusion
> -@itemx -mno-power8-fusion
> -@opindex mpower8-fusion
> -@opindex mno-power8-fusion
> -Generate code that keeps (does not keeps) some integer operations
> -adjacent so that the instructions can be fused together on power8 and
> -later processors.
> -


ok

>  @item -mpower8-vector
>  @itemx -mno-power8-vector
>  @opindex mpower8-vector
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> index e1004f5cfbf..b215b701097 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> @@ -1,7 +1,7 @@
>  /* { dg-lto-do link } */
>  /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
>  /* -Wno-attributes suppresses always_inline warnings.  */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes 
> -mno-power8-fusion" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */
> 
>  int __attribute__ ((always_inline))
>  foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> index ebb0a7b8fa1..ffab23ab7e1 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> @@ -1,6 +1,6 @@
>  /* { dg-lto-do link } */
>  /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -flto 
> -fdump-ipa-inline" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -fdump-ipa-inline" } } 
> */
> 
>  int
>  foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c 
> b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> index 21c982d93f0..0cb3a4cb9f9 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion 
> -fdump-tree-einline-optimized" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -fdump-tree-einline-optimized" } */
> 
>  /* Like pr102059-1.c, to verify the inlining still happens
>     even without always_inline attribute.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c 
> b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> new file mode 100644
> index 00000000000..5fe66f8af4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-require-effective-target power10_ok } */
> +
> +/* Verify that power10 can explicity include functions compiled for power8.
> +   The issue was -mcpu=power8 enables -mpower8-fusion, but -mcpu=power9 or
> +   -mcpu=power10 do not set power8-fusion by default.  Thus when doing this
> +   compilation, they would get an error that the inline function failed in 
> its
> +   inlining due to having incompatible options.  */
> +
> +static inline int __attribute__ ((always_inline,target("cpu=power8")))
> +foo (int *b)
> +{
> +  *b += 10;
> +  return *b;
> +}
> +
> +int
> +bar (int *a)
> +{
> +  *a = foo (a);
> +  return 0;
> +}
> -- 
> 2.35.1
> 


lgtm, 
thanks,
-Will

Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

Reply via email to