Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

will schmidt via Gcc-patches Wed, 20 Apr 2022 14:20:41 -0700

On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059
> 
> This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
>



Hi, 
No comments on code, a few comments about the comments below.



> The splitting of signed halfword and word loads into unsigned load and
> sign extension is now suppressed with -Os, but it is done normally if we
> are not optimizing for space.

I see references to TARGET_P8_FUSION_SIGN in the patch below, and some
removal of old code.  I assume this describes the implementation that
remains.  

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.

specification.  :-)

> 
> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't 
> do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.

inlining. 


> 
> This code makes the -mpower8-fusion option a nop.  It is accepted without
> warning, but it does nothing.  Power8 fusion is only enabled if we are tuning
> for a power8.
> 
> The undocumented -mpower8-fusion-sign option is also made into a nop.
> 
> I left in the pragma target and attribute target support for power8-fusion, 
> but
> using it doesn't do anything now.  This is because I told the customer who
> encountered this problem that one solution was to add an explicit
> no-power8-fusion option in their target pragma or attribute to work around the
> problem.
> 
> I have tested this patch on a little endian power10 system.  I have tested
> previous versions on little endian power9 and big endian power8 systems.
> Can I apply this patch to the master branch?
> 
> If it is accepted, I will produce a similar patch for back porting to GCC 11
> and GCC 10.
> 
> 2022-04-12   Michael Meissner  <meiss...@linux.ibm.com>
> 
> gcc/
>       PR target/102059
>       * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>       (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>       (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.
>       * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>       Delete code that set the power8 fusion options automatically.
>       (rs6000_opt_masks): Allow #pragma target and attribute target
>       power8-fusion option for backwards compatibility.
>       (rs6000_print_options_internal): Skip printing backward
>       compatibility options that are just ignored.
>       * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>       (TARGET_P8_FUSION_SIGN): Likewise.
>       (MASK_P8_FUSION): Delete.
>       * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>       ignore it completely.
>       (-mpower8-fusion-sign): Likewise.
>       * doc/invoke.texi (RS/6000 and PowerPC Options): Delete
>       -mpower8-fusion.
> 
> gcc/testsuite/
>       PR target/102059
>       * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>       * gcc.dg/lto/pr102059-2_0.c: Likewise.
>       * gcc.target/powerpc/pr102059-3.c: Likewise.
>       * gcc.target/powerpc/pr102059-4.c: New test.
> ---
>  gcc/config/rs6000/rs6000-cpus.def             | 18 +++----
>  gcc/config/rs6000/rs6000.cc                   | 49 +++++--------------
>  gcc/config/rs6000/rs6000.h                    | 13 ++++-
>  gcc/config/rs6000/rs6000.opt                  |  8 +--
>  gcc/doc/invoke.texi                           | 13 +----
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c       |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c       |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +++++++++
>  9 files changed, 62 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..d913a3d6b73 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -54,19 +54,14 @@
>                                | OPTION_MASK_QUAD_MEMORY              \
>                                | OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
> -/* ISA masks setting fusion options.  */
> -#define OTHER_FUSION_MASKS   (OPTION_MASK_P8_FUSION                  \
> -                              | OPTION_MASK_P8_FUSION_SIGN)
> -
>  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
>     FLOAT128_HW here until we are ready to make -mfloat128 on by default.  */
> -#define ISA_3_0_MASKS_SERVER ((ISA_2_7_MASKS_SERVER                  \
> -                               | OPTION_MASK_ISEL                    \
> -                               | OPTION_MASK_MODULO                  \
> -                               | OPTION_MASK_P9_MINMAX               \
> -                               | OPTION_MASK_P9_MISC                 \
> -                               | OPTION_MASK_P9_VECTOR)              \
> -                              & ~OTHER_FUSION_MASKS)
> +#define ISA_3_0_MASKS_SERVER (ISA_2_7_MASKS_SERVER                   \
> +                              | OPTION_MASK_ISEL                     \
> +                              | OPTION_MASK_MODULO                   \
> +                              | OPTION_MASK_P9_MINMAX                \
> +                              | OPTION_MASK_P9_MISC                  \
> +                              | OPTION_MASK_P9_VECTOR)
> 
>  /* Support for the IEEE 128-bit floating point hardware requires a lot of the
>     VSX instructions that are part of ISA 3.0.  */
> @@ -140,7 +135,6 @@
>                                | OPTION_MASK_MODULO                   \
>                                | OPTION_MASK_MULHW                    \
>                                | OPTION_MASK_NO_UPDATE                \
> -                              | OPTION_MASK_P8_FUSION                \
>                                | OPTION_MASK_P8_VECTOR                \
>                                | OPTION_MASK_P9_MINMAX                \
>                                | OPTION_MASK_P9_MISC                  \
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index ceaddafd33b..269c7510e3e 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4040,41 +4040,6 @@ rs6000_option_override_internal (bool global_init_p)
>        && optimize_function_for_speed_p (cfun))
>      rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
> 
> -  /* Enable power8 fusion if we are tuning for power8, even if we aren't
> -     generating power8 instructions.  Power9 does not optimize power8 fusion
> -     cases.  */
> -  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
> -    {
> -      if (processor_target_table[tune_index].processor == PROCESSOR_POWER8)
> -     rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> -      else
> -     rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> -    }
> -
> -  /* Setting additional fusion flags turns on base fusion.  */
> -  if (!TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN)
> -    {
> -      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
> -     {
> -       if (TARGET_P8_FUSION_SIGN)
> -         error ("%qs requires %qs", "-mpower8-fusion-sign",
> -                "-mpower8-fusion");
> -
> -       rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> -     }
> -      else
> -     rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> -    }
> -
> -  /* Power8 does not fuse sign extended loads with the addis.  If we are
> -     optimizing at high levels for speed, convert a sign extended load into a
> -     zero extending load, and an explicit sign extension.  */
> -  if (TARGET_P8_FUSION
> -      && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
> -      && optimize_function_for_speed_p (cfun)
> -      && optimize >= 3)
> -    rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
> -
>    /* ISA 3.0 vector instructions include ISA 2.07.  */
>    if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR)
>      {
> @@ -24010,8 +23975,6 @@ static struct rs6000_opt_mask const 
> rs6000_opt_masks[] =
>    { "pcrel-opt",             OPTION_MASK_PCREL_OPT,          false, true  },
>    { "popcntb",                       OPTION_MASK_POPCNTB,            false, 
> true  },
>    { "popcntd",                       OPTION_MASK_POPCNTD,            false, 
> true  },
> -  { "power8-fusion",         OPTION_MASK_P8_FUSION,          false, true  },
> -  { "power8-fusion-sign",    OPTION_MASK_P8_FUSION_SIGN,     false, true  },
>    { "power8-vector",         OPTION_MASK_P8_VECTOR,          false, true  },
>    { "power9-minmax",         OPTION_MASK_P9_MINMAX,          false, true  },
>    { "power9-misc",           OPTION_MASK_P9_MISC,            false, true  },
> @@ -24051,6 +24014,13 @@ static struct rs6000_opt_mask const 
> rs6000_opt_masks[] =
>  #endif
>    { "soft-float",            OPTION_MASK_SOFT_FLOAT,         false, false },
>    { "string",                        0,                              false, 
> false },
> +
> +  /* The Power8 fusion option was removed.  We ignore using it in #pragma and
> +     attribute target.  Users may have used the options to suppress errors if
> +     they declare an inline function to be specifically power8 and the 
> function
> +     was included by power9 or power10 which turned off the power8 fusion
> +     support.  */

"was removed" is a good detail for history, but possibly not required
here?    Maybe stl "The Power8 fusion option is a no-op".  ?


> +  { "power8-fusion",         0,                              false, true  },
>  };
> 
>  /* Builtin mask mapping for printing the flags.  */
> @@ -24687,6 +24657,11 @@ rs6000_print_options_internal (FILE *file,
>        HOST_WIDE_INT mask = opts[i].mask;
>        size_t len = comma_len + prefix_len + strlen (name);
> 
> +      /* Don't print options that exist for backwards compatibility, but are
> +      ignored now like -mpower8-fusion.  */
> +      if (!mask)
> +     continue;

I'm not sure calling out the -mpower8-fusion option here is necessary. 
perhaps s/ignored now like -mpower8-fusion/masked out./



> +
>        if (!invert)
>       {
>         if ((flags & mask) == 0)
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 523256a5c9d..52a29e1c702 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -504,6 +504,18 @@ extern int rs6000_vector_align[];
>  #define TARGET_MINMAX        (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT         
> \
>                        && (TARGET_P9_MINMAX || !flag_trapping_math))
> 
> +/* Power8 has special fusion operations that are enabled if we are tuning for
> +   power8.  This used to be settable with an option (-mpower8-fusion), but 
> that
> +   option has been removed.  */

s/has been removed/is now ignored/  ?


> +#define TARGET_P8_FUSION     (rs6000_tune == PROCESSOR_POWER8)
> +
> +/* Power8 fusion does not fuse loads with sign extends.  If we are not
> +   optimizing for space, split loads with sign extension to loads with zero
> +   extension and an explicit sign extend operation, so that the zero 
> extending
> +   load can be fused.  */
> +#define TARGET_P8_FUSION_SIGN        (TARGET_P8_FUSION                       
> \
> +                              && !optimize_function_for_size_p (cfun))
> +
>  /* In switching from using target_flags to using rs6000_isa_flags, the 
> options
>     machinery creates OPTION_MASK_<xxx> instead of MASK_<xxx>.  For now map
>     OPTION_MASK_<xxx> back into MASK_<xxx>.  */
> @@ -517,7 +529,6 @@ extern int rs6000_vector_align[];
>  #define MASK_FLOAT128_KEYWORD                OPTION_MASK_FLOAT128_KEYWORD
>  #define MASK_FLOAT128_HW             OPTION_MASK_FLOAT128_HW
>  #define MASK_FPRND                   OPTION_MASK_FPRND
> -#define MASK_P8_FUSION                       OPTION_MASK_P8_FUSION
>  #define MASK_HARD_FLOAT                      OPTION_MASK_HARD_FLOAT
>  #define MASK_HTM                     OPTION_MASK_HTM
>  #define MASK_ISEL                    OPTION_MASK_ISEL
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 4931d781c4e..6c4caf4c9ee 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -474,13 +474,13 @@ Save the TOC in the prologue for indirect calls rather 
> than inline.
>  mvsx-timode
>  Target RejectNegative Undocumented Ignore
> 
> +; The -mpower8-fusion and -mpower8-fusion-sign options existed in the past, 
> but
> +; they are ignored now.

If the comment is necessary (it may be redundant with the Ignore
attribute in place)  I suggest  s/existed in the past, but they//

>  mpower8-fusion
> -Target Mask(P8_FUSION) Var(rs6000_isa_flags)
> -Fuse certain integer operations together for better performance on power8.
> +Target Undocumented Ignore
> 
>  mpower8-fusion-sign
> -Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
> -Allow sign extension in fusion operations.
> +Target Undocumented Ignore
> 
>  mpower8-vector
>  Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1a51759e6e4..7c21779e3b2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1266,8 +1266,7 @@ See RS/6000 and PowerPC Options.
>  -mveclibabi=@var{type}  -mfriz  -mno-friz @gol
>  -mpointers-to-nested-functions  -mno-pointers-to-nested-functions @gol
>  -msave-toc-indirect  -mno-save-toc-indirect @gol
> --mpower8-fusion  -mno-mpower8-fusion  -mpower8-vector  -mno-power8-vector 
> @gol
> --mcrypto  -mno-crypto  -mhtm  -mno-htm @gol
> +-mpower8-vector  -mno-power8-vector -mcrypto  -mno-crypto  -mhtm  -mno-htm 
> @gol
>  -mquad-memory  -mno-quad-memory @gol
>  -mquad-memory-atomic  -mno-quad-memory-atomic @gol
>  -mcompat-align-parm  -mno-compat-align-parm @gol
> @@ -28355,7 +28354,7 @@ following options:
>  -mpopcntb  -mpopcntd  -mpowerpc64 @gol
>  -mpowerpc-gpopt  -mpowerpc-gfxopt @gol
>  -mmulhw  -mdlmzb  -mmfpgpr  -mvsx @gol
> --mcrypto  -mhtm  -mpower8-fusion  -mpower8-vector @gol
> +-mcrypto  -mhtm  -mpower8-vector @gol
>  -mquad-memory  -mquad-memory-atomic  -mfloat128 @gol
>  -mfloat128-hardware -mprefixed -mpcrel -mmma @gol
>  -mrop-protect}
> @@ -28470,14 +28469,6 @@ Enable (disable) the use of the built-in functions 
> that allow direct
>  access to the Hardware Transactional Memory (HTM) instructions that
>  were added in version 2.07 of the PowerPC ISA.
> 
> -@item -mpower8-fusion
> -@itemx -mno-power8-fusion
> -@opindex mpower8-fusion
> -@opindex mno-power8-fusion
> -Generate code that keeps (does not keeps) some integer operations
> -adjacent so that the instructions can be fused together on power8 and
> -later processors.
> -
>  @item -mpower8-vector
>  @itemx -mno-power8-vector
>  @opindex mpower8-vector
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> index e1004f5cfbf..b215b701097 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> @@ -1,7 +1,7 @@
>  /* { dg-lto-do link } */
>  /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
>  /* -Wno-attributes suppresses always_inline warnings.  */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes 
> -mno-power8-fusion" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */
> 
>  int __attribute__ ((always_inline))
>  foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> index ebb0a7b8fa1..ffab23ab7e1 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> @@ -1,6 +1,6 @@
>  /* { dg-lto-do link } */
>  /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -flto 
> -fdump-ipa-inline" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -fdump-ipa-inline" } } 
> */
> 
>  int
>  foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c 
> b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> index 21c982d93f0..0cb3a4cb9f9 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion 
> -fdump-tree-einline-optimized" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -fdump-tree-einline-optimized" } */
> 
>  /* Like pr102059-1.c, to verify the inlining still happens
>     even without always_inline attribute.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c 
> b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> new file mode 100644
> index 00000000000..5fe66f8af4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-require-effective-target power10_ok } */

I think for coverage, this test could actually target power9_ok, but
sufficient as is. 


lgtm,
thanks
-Will


> +
> +/* Verify that power10 can explicity include functions compiled for power8.
> +   The issue was -mcpu=power8 enables -mpower8-fusion, but -mcpu=power9 or
> +   -mcpu=power10 do not set power8-fusion by default.  Thus when doing this
> +   compilation, they would get an error that the inline function failed in 
> its
> +   inlining due to having incompatible options.  */
> +
> +static inline int __attribute__ ((always_inline,target("cpu=power8")))
> +foo (int *b)
> +{
> +  *b += 10;
> +  return *b;
> +}
> +
> +int
> +bar (int *a)
> +{
> +  *a = foo (a);
> +  return 0;
> +}
> -- 
> 2.35.1
> 
>

Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

Reply via email to