On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote: > Eliminate power8 fusion options, use power8 tuning, PR target/102059 > > This is V4 of the patch. Compared to V3 of the patch, GCC will just > ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign. >
Hi, No comments on code, a few comments about the comments below. > The splitting of signed halfword and word loads into unsigned load and > sign extension is now suppressed with -Os, but it is done normally if we > are not optimizing for space. I see references to TARGET_P8_FUSION_SIGN in the patch below, and some removal of old code. I assume this describes the implementation that remains. > > The power8 fusion support used to be set automatically when -mcpu=power8 or > -mtune=power8 was used, and it was cleared for other cpu's. However, if you > used the target attribute or target #pragma to change the default cpu type or > tuning, you would get an error that a target specifiction option mismatch > occurred. specification. :-) > > This occurred because the rs6000_can_inline_p function just compares the ISA > bits between the called inline function and the caller. If the ISA flags of > the called function is not a subset of the ISA flags of the caller, we won't > do > the inlinging. When a power9 or power10 function inlines a function that is > explicitly compiled for power8, the power8 function has the power8 fusion bits > set and the power9 or power10 functions do not have the fusion bits set. inlining. > > This code makes the -mpower8-fusion option a nop. It is accepted without > warning, but it does nothing. Power8 fusion is only enabled if we are tuning > for a power8. > > The undocumented -mpower8-fusion-sign option is also made into a nop. > > I left in the pragma target and attribute target support for power8-fusion, > but > using it doesn't do anything now. This is because I told the customer who > encountered this problem that one solution was to add an explicit > no-power8-fusion option in their target pragma or attribute to work around the > problem. > > I have tested this patch on a little endian power10 system. I have tested > previous versions on little endian power9 and big endian power8 systems. > Can I apply this patch to the master branch? > > If it is accepted, I will produce a similar patch for back porting to GCC 11 > and GCC 10. > > 2022-04-12 Michael Meissner <meiss...@linux.ibm.com> > > gcc/ > PR target/102059 > * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete. > (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks. > (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION. > * config/rs6000/rs6000.cc (rs6000_option_override_internal): > Delete code that set the power8 fusion options automatically. > (rs6000_opt_masks): Allow #pragma target and attribute target > power8-fusion option for backwards compatibility. > (rs6000_print_options_internal): Skip printing backward > compatibility options that are just ignored. > * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro. > (TARGET_P8_FUSION_SIGN): Likewise. > (MASK_P8_FUSION): Delete. > * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but > ignore it completely. > (-mpower8-fusion-sign): Likewise. > * doc/invoke.texi (RS/6000 and PowerPC Options): Delete > -mpower8-fusion. > > gcc/testsuite/ > PR target/102059 > * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion. > * gcc.dg/lto/pr102059-2_0.c: Likewise. > * gcc.target/powerpc/pr102059-3.c: Likewise. > * gcc.target/powerpc/pr102059-4.c: New test. > --- > gcc/config/rs6000/rs6000-cpus.def | 18 +++---- > gcc/config/rs6000/rs6000.cc | 49 +++++-------------- > gcc/config/rs6000/rs6000.h | 13 ++++- > gcc/config/rs6000/rs6000.opt | 8 +-- > gcc/doc/invoke.texi | 13 +---- > gcc/testsuite/gcc.dg/lto/pr102059-1_0.c | 2 +- > gcc/testsuite/gcc.dg/lto/pr102059-2_0.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +++++++++ > 9 files changed, 62 insertions(+), 68 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c > > diff --git a/gcc/config/rs6000/rs6000-cpus.def > b/gcc/config/rs6000/rs6000-cpus.def > index 963947f6939..d913a3d6b73 100644 > --- a/gcc/config/rs6000/rs6000-cpus.def > +++ b/gcc/config/rs6000/rs6000-cpus.def > @@ -54,19 +54,14 @@ > | OPTION_MASK_QUAD_MEMORY \ > | OPTION_MASK_QUAD_MEMORY_ATOMIC) > > -/* ISA masks setting fusion options. */ > -#define OTHER_FUSION_MASKS (OPTION_MASK_P8_FUSION \ > - | OPTION_MASK_P8_FUSION_SIGN) > - > /* Add ISEL back into ISA 3.0, since it is supposed to be a win. Do not add > FLOAT128_HW here until we are ready to make -mfloat128 on by default. */ > -#define ISA_3_0_MASKS_SERVER ((ISA_2_7_MASKS_SERVER \ > - | OPTION_MASK_ISEL \ > - | OPTION_MASK_MODULO \ > - | OPTION_MASK_P9_MINMAX \ > - | OPTION_MASK_P9_MISC \ > - | OPTION_MASK_P9_VECTOR) \ > - & ~OTHER_FUSION_MASKS) > +#define ISA_3_0_MASKS_SERVER (ISA_2_7_MASKS_SERVER \ > + | OPTION_MASK_ISEL \ > + | OPTION_MASK_MODULO \ > + | OPTION_MASK_P9_MINMAX \ > + | OPTION_MASK_P9_MISC \ > + | OPTION_MASK_P9_VECTOR) > > /* Support for the IEEE 128-bit floating point hardware requires a lot of the > VSX instructions that are part of ISA 3.0. */ > @@ -140,7 +135,6 @@ > | OPTION_MASK_MODULO \ > | OPTION_MASK_MULHW \ > | OPTION_MASK_NO_UPDATE \ > - | OPTION_MASK_P8_FUSION \ > | OPTION_MASK_P8_VECTOR \ > | OPTION_MASK_P9_MINMAX \ > | OPTION_MASK_P9_MISC \ > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index ceaddafd33b..269c7510e3e 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -4040,41 +4040,6 @@ rs6000_option_override_internal (bool global_init_p) > && optimize_function_for_speed_p (cfun)) > rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; > > - /* Enable power8 fusion if we are tuning for power8, even if we aren't > - generating power8 instructions. Power9 does not optimize power8 fusion > - cases. */ > - if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)) > - { > - if (processor_target_table[tune_index].processor == PROCESSOR_POWER8) > - rs6000_isa_flags |= OPTION_MASK_P8_FUSION; > - else > - rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION; > - } > - > - /* Setting additional fusion flags turns on base fusion. */ > - if (!TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN) > - { > - if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION) > - { > - if (TARGET_P8_FUSION_SIGN) > - error ("%qs requires %qs", "-mpower8-fusion-sign", > - "-mpower8-fusion"); > - > - rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION; > - } > - else > - rs6000_isa_flags |= OPTION_MASK_P8_FUSION; > - } > - > - /* Power8 does not fuse sign extended loads with the addis. If we are > - optimizing at high levels for speed, convert a sign extended load into a > - zero extending load, and an explicit sign extension. */ > - if (TARGET_P8_FUSION > - && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN) > - && optimize_function_for_speed_p (cfun) > - && optimize >= 3) > - rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN; > - > /* ISA 3.0 vector instructions include ISA 2.07. */ > if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR) > { > @@ -24010,8 +23975,6 @@ static struct rs6000_opt_mask const > rs6000_opt_masks[] = > { "pcrel-opt", OPTION_MASK_PCREL_OPT, false, true }, > { "popcntb", OPTION_MASK_POPCNTB, false, > true }, > { "popcntd", OPTION_MASK_POPCNTD, false, > true }, > - { "power8-fusion", OPTION_MASK_P8_FUSION, false, true }, > - { "power8-fusion-sign", OPTION_MASK_P8_FUSION_SIGN, false, true }, > { "power8-vector", OPTION_MASK_P8_VECTOR, false, true }, > { "power9-minmax", OPTION_MASK_P9_MINMAX, false, true }, > { "power9-misc", OPTION_MASK_P9_MISC, false, true }, > @@ -24051,6 +24014,13 @@ static struct rs6000_opt_mask const > rs6000_opt_masks[] = > #endif > { "soft-float", OPTION_MASK_SOFT_FLOAT, false, false }, > { "string", 0, false, > false }, > + > + /* The Power8 fusion option was removed. We ignore using it in #pragma and > + attribute target. Users may have used the options to suppress errors if > + they declare an inline function to be specifically power8 and the > function > + was included by power9 or power10 which turned off the power8 fusion > + support. */ "was removed" is a good detail for history, but possibly not required here? Maybe stl "The Power8 fusion option is a no-op". ? > + { "power8-fusion", 0, false, true }, > }; > > /* Builtin mask mapping for printing the flags. */ > @@ -24687,6 +24657,11 @@ rs6000_print_options_internal (FILE *file, > HOST_WIDE_INT mask = opts[i].mask; > size_t len = comma_len + prefix_len + strlen (name); > > + /* Don't print options that exist for backwards compatibility, but are > + ignored now like -mpower8-fusion. */ > + if (!mask) > + continue; I'm not sure calling out the -mpower8-fusion option here is necessary. perhaps s/ignored now like -mpower8-fusion/masked out./ > + > if (!invert) > { > if ((flags & mask) == 0) > diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h > index 523256a5c9d..52a29e1c702 100644 > --- a/gcc/config/rs6000/rs6000.h > +++ b/gcc/config/rs6000/rs6000.h > @@ -504,6 +504,18 @@ extern int rs6000_vector_align[]; > #define TARGET_MINMAX (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT > \ > && (TARGET_P9_MINMAX || !flag_trapping_math)) > > +/* Power8 has special fusion operations that are enabled if we are tuning for > + power8. This used to be settable with an option (-mpower8-fusion), but > that > + option has been removed. */ s/has been removed/is now ignored/ ? > +#define TARGET_P8_FUSION (rs6000_tune == PROCESSOR_POWER8) > + > +/* Power8 fusion does not fuse loads with sign extends. If we are not > + optimizing for space, split loads with sign extension to loads with zero > + extension and an explicit sign extend operation, so that the zero > extending > + load can be fused. */ > +#define TARGET_P8_FUSION_SIGN (TARGET_P8_FUSION > \ > + && !optimize_function_for_size_p (cfun)) > + > /* In switching from using target_flags to using rs6000_isa_flags, the > options > machinery creates OPTION_MASK_<xxx> instead of MASK_<xxx>. For now map > OPTION_MASK_<xxx> back into MASK_<xxx>. */ > @@ -517,7 +529,6 @@ extern int rs6000_vector_align[]; > #define MASK_FLOAT128_KEYWORD OPTION_MASK_FLOAT128_KEYWORD > #define MASK_FLOAT128_HW OPTION_MASK_FLOAT128_HW > #define MASK_FPRND OPTION_MASK_FPRND > -#define MASK_P8_FUSION OPTION_MASK_P8_FUSION > #define MASK_HARD_FLOAT OPTION_MASK_HARD_FLOAT > #define MASK_HTM OPTION_MASK_HTM > #define MASK_ISEL OPTION_MASK_ISEL > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt > index 4931d781c4e..6c4caf4c9ee 100644 > --- a/gcc/config/rs6000/rs6000.opt > +++ b/gcc/config/rs6000/rs6000.opt > @@ -474,13 +474,13 @@ Save the TOC in the prologue for indirect calls rather > than inline. > mvsx-timode > Target RejectNegative Undocumented Ignore > > +; The -mpower8-fusion and -mpower8-fusion-sign options existed in the past, > but > +; they are ignored now. If the comment is necessary (it may be redundant with the Ignore attribute in place) I suggest s/existed in the past, but they// > mpower8-fusion > -Target Mask(P8_FUSION) Var(rs6000_isa_flags) > -Fuse certain integer operations together for better performance on power8. > +Target Undocumented Ignore > > mpower8-fusion-sign > -Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags) > -Allow sign extension in fusion operations. > +Target Undocumented Ignore > > mpower8-vector > Target Mask(P8_VECTOR) Var(rs6000_isa_flags) > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 1a51759e6e4..7c21779e3b2 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -1266,8 +1266,7 @@ See RS/6000 and PowerPC Options. > -mveclibabi=@var{type} -mfriz -mno-friz @gol > -mpointers-to-nested-functions -mno-pointers-to-nested-functions @gol > -msave-toc-indirect -mno-save-toc-indirect @gol > --mpower8-fusion -mno-mpower8-fusion -mpower8-vector -mno-power8-vector > @gol > --mcrypto -mno-crypto -mhtm -mno-htm @gol > +-mpower8-vector -mno-power8-vector -mcrypto -mno-crypto -mhtm -mno-htm > @gol > -mquad-memory -mno-quad-memory @gol > -mquad-memory-atomic -mno-quad-memory-atomic @gol > -mcompat-align-parm -mno-compat-align-parm @gol > @@ -28355,7 +28354,7 @@ following options: > -mpopcntb -mpopcntd -mpowerpc64 @gol > -mpowerpc-gpopt -mpowerpc-gfxopt @gol > -mmulhw -mdlmzb -mmfpgpr -mvsx @gol > --mcrypto -mhtm -mpower8-fusion -mpower8-vector @gol > +-mcrypto -mhtm -mpower8-vector @gol > -mquad-memory -mquad-memory-atomic -mfloat128 @gol > -mfloat128-hardware -mprefixed -mpcrel -mmma @gol > -mrop-protect} > @@ -28470,14 +28469,6 @@ Enable (disable) the use of the built-in functions > that allow direct > access to the Hardware Transactional Memory (HTM) instructions that > were added in version 2.07 of the PowerPC ISA. > > -@item -mpower8-fusion > -@itemx -mno-power8-fusion > -@opindex mpower8-fusion > -@opindex mno-power8-fusion > -Generate code that keeps (does not keeps) some integer operations > -adjacent so that the instructions can be fused together on power8 and > -later processors. > - > @item -mpower8-vector > @itemx -mno-power8-vector > @opindex mpower8-vector > diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c > b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c > index e1004f5cfbf..b215b701097 100644 > --- a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c > +++ b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c > @@ -1,7 +1,7 @@ > /* { dg-lto-do link } */ > /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */ > /* -Wno-attributes suppresses always_inline warnings. */ > -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes > -mno-power8-fusion" } } */ > +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */ > > int __attribute__ ((always_inline)) > foo1 (int *b) > diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c > b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c > index ebb0a7b8fa1..ffab23ab7e1 100644 > --- a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c > +++ b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c > @@ -1,6 +1,6 @@ > /* { dg-lto-do link } */ > /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */ > -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -flto > -fdump-ipa-inline" } } */ > +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -fdump-ipa-inline" } } > */ > > int > foo1 (int *b) > diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c > b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c > index 21c982d93f0..0cb3a4cb9f9 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion > -fdump-tree-einline-optimized" } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8 -fdump-tree-einline-optimized" } */ > > /* Like pr102059-1.c, to verify the inlining still happens > even without always_inline attribute. */ > diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c > b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c > new file mode 100644 > index 00000000000..5fe66f8af4b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ > +/* { dg-require-effective-target power10_ok } */ I think for coverage, this test could actually target power9_ok, but sufficient as is. lgtm, thanks -Will > + > +/* Verify that power10 can explicity include functions compiled for power8. > + The issue was -mcpu=power8 enables -mpower8-fusion, but -mcpu=power9 or > + -mcpu=power10 do not set power8-fusion by default. Thus when doing this > + compilation, they would get an error that the inline function failed in > its > + inlining due to having incompatible options. */ > + > +static inline int __attribute__ ((always_inline,target("cpu=power8"))) > +foo (int *b) > +{ > + *b += 10; > + return *b; > +} > + > +int > +bar (int *a) > +{ > + *a = foo (a); > + return 0; > +} > -- > 2.35.1 > >