RE: [PATCH v3] aarch64: Fine-grained policies to control ldp-stp formation.

Kyrylo Tkachov Tue, 26 Sep 2023 01:35:51 -0700


> -----Original Message-----
> From: Andrew Pinski <pins...@gmail.com>
> Sent: Monday, September 25, 2023 9:05 PM
> To: Philipp Tomsich <philipp.toms...@vrull.eu>
> Cc: Manos Anagnostakis <manos.anagnosta...@vrull.eu>; gcc-
> patc...@gcc.gnu.org; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Subject: Re: [PATCH v3] aarch64: Fine-grained policies to control ldp-stp
> formation.
> 
> On Mon, Sep 25, 2023 at 12:59 PM Philipp Tomsich
> <philipp.toms...@vrull.eu> wrote:
> >
> > On Mon, 25 Sept 2023 at 21:54, Andrew Pinski <pins...@gmail.com> wrote:
> > >
> > > On Mon, Sep 25, 2023 at 12:50 PM Manos Anagnostakis
> > > <manos.anagnosta...@vrull.eu> wrote:
> > > >
> > > > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> > > > to provide the requested behaviour for handling ldp and stp:
> > > >
> > > >   /* Allow the tuning structure to disable LDP instruction formation
> > > >      from combining instructions (e.g., in peephole2).
> > > >      TODO: Implement fine-grained tuning control for LDP and STP:
> > > >            1. control policies for load and store separately;
> > > >            2. support the following policies:
> > > >               - default (use what is in the tuning structure)
> > > >               - always
> > > >               - never
> > > >               - aligned (only if the compiler can prove that the
> > > >                 load will be aligned to 2 * element_size)  */
> > > >
> > > > It provides two new and concrete target-specific command-line parameters
> > > > -param=aarch64-ldp-policy= and -param=aarch64-stp-policy=
> > > > to give the ability to control load and store policies separately as
> > > > stated in part 1 of the TODO.
> > > >
> > > > The accepted values for both parameters are:
> > > > - 0: Use the policy of the tuning structure (default).
> > > > - 1: Emit ldp/stp regardless of alignment.
> > > > - 2: Do not emit ldp/stp.
> > > > - 3: In order to emit ldp/stp, first check if the load/store will
> > > >   be aligned to 2 * element_size.
> > >
> > > Instead of a number, would it make sense to use a string
> > > (ENUM) for this param?
> > > Also I think using --param is a bad idea if it is going to be
> > > documented in the user manual.
> > > Maybe a -m option should be used instead.
> >
> > See https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631283.html
> > for the discussion triggering the change from -m... to --param and the
> > change to using a number instead of a string.
> 
> That is the opposite of the current GCC practice across all targets.
> Things like this should be consistent and if one target decides to do
> it different, then maybe it should NOT.
> Anyways we should document the correct coding style for options so we
> don't have these back and forths again.


My rationale for having this as a param rather than an -m* option is that
this is just an override for a codegen heuristic that the compiler should be
getting correct on its own when used by a normal user.
Having a way to force an explicit LDP/STP policy can be useful for testing
the compiler and for some power user experimentation, but I wouldn't
want to see it make its way into any user makefiles.

Good point on having it accept an enum; a string argument is definitely
more readable.
Thanks,
Kyrill
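
To illustrate the string-valued form, here is a minimal standalone sketch.
It is not code from the patch: pair_policy and parse_pair_policy are
hypothetical names, and the four spellings simply follow the policies listed
in the TODO (default, always, never, aligned).  In GCC itself the mapping
would more likely be declared with Enum/EnumValue records in aarch64.opt
rather than parsed by hand.

```cpp
#include <cstring>
#include <optional>

// Hypothetical stand-in for the per-direction policy enums in the patch.
enum class pair_policy { ALWAYS, NEVER, ALIGNED };

// Map the string ARG to a policy.  "default" yields TUNING_DEFAULT, i.e.
// whatever the tuning structure already specifies.  An unrecognised
// spelling yields an empty optional so the caller can report an error.
static std::optional<pair_policy>
parse_pair_policy (const char *arg, pair_policy tuning_default)
{
  if (std::strcmp (arg, "default") == 0)
    return tuning_default;
  if (std::strcmp (arg, "always") == 0)
    return pair_policy::ALWAYS;
  if (std::strcmp (arg, "never") == 0)
    return pair_policy::NEVER;
  if (std::strcmp (arg, "aligned") == 0)
    return pair_policy::ALIGNED;
  return std::nullopt;
}
```

With this shape, "default" needs no special case at the call site, and a
typo on the command line is rejected rather than silently ignored.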

> 
> 
> Thanks,
> Andrew
> 
> >
> > Thanks,
> > Philipp.
> >
> > >
> > > Thanks,
> > > Andrew
> > >
> > > >
> > > > gcc/ChangeLog:
> > > >         * config/aarch64/aarch64-protos.h (struct tune_params): Add
> > > >         appropriate enums for the policies.
> > > >         * config/aarch64/aarch64-tuning-flags.def
> > > >         (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> > > >         options.
> > > >         * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> > > >         function to parse ldp-policy parameter.
> > > >         (aarch64_parse_stp_policy): New function to parse stp-policy parameter.
> > > >         (aarch64_override_options_internal): Call parsing functions.
> > > >         (aarch64_operands_ok_for_ldpstp): Add parameter-value check and
> > > >         alignment check and remove superseded ones.
> > > >         (aarch64_operands_adjust_ok_for_ldpstp): Add parameter-value check and
> > > >         alignment check and remove superseded ones.
> > > >         * config/aarch64/aarch64.opt: Add options.
> > > >         * doc/invoke.texi: Document the parameters accordingly.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >         * gcc.target/aarch64/ampere1-no_ldp_combine.c: Removed.
> > > >         * gcc.target/aarch64/ldp_aligned.c: New test.
> > > >         * gcc.target/aarch64/ldp_always.c: New test.
> > > >         * gcc.target/aarch64/ldp_never.c: New test.
> > > >         * gcc.target/aarch64/stp_aligned.c: New test.
> > > >         * gcc.target/aarch64/stp_always.c: New test.
> > > >         * gcc.target/aarch64/stp_never.c: New test.
> > > >
> > > > Signed-off-by: Manos Anagnostakis <manos.anagnosta...@vrull.eu>
> > > > ---
> > > > Changes in v3:
> > > >         - Changed command-line options to target-specific parameters
> > > >           and documented them accordingly in doc/invoke.texi.
> > > >         - Removed ampere1-no_ldp_combine.c test as superseded.
> > > >
> > > >  gcc/config/aarch64/aarch64-protos.h           |  24 ++
> > > >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
> > > >  gcc/config/aarch64/aarch64.cc                 | 215 +++++++++++++-----
> > > >  gcc/config/aarch64/aarch64.opt                |   8 +
> > > >  gcc/doc/invoke.texi                           |  30 +++
> > > >  .../aarch64/ampere1-no_ldp_combine.c          |  11 -
> > > >  .../gcc.target/aarch64/ldp_aligned.c          |  66 ++++++
> > > >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 ++++++
> > > >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 ++++++
> > > >  .../gcc.target/aarch64/stp_aligned.c          |  60 +++++
> > > >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +++++
> > > >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +++++
> > > >  12 files changed, 600 insertions(+), 74 deletions(-)
> > > >  delete mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
> > > >
> > > > diff --git a/gcc/config/aarch64/aarch64-protos.h
> b/gcc/config/aarch64/aarch64-protos.h
> > > > index 70303d6fd95..be1d73490ed 100644
> > > > --- a/gcc/config/aarch64/aarch64-protos.h
> > > > +++ b/gcc/config/aarch64/aarch64-protos.h
> > > > @@ -568,6 +568,30 @@ struct tune_params
> > > >    /* Place prefetch struct pointer at the end to enable type checking
> > > >       errors when tune_params misses elements (e.g., from erroneous
> merges).  */
> > > >    const struct cpu_prefetch_tune *prefetch;
> > > > +/* An enum specifying how to handle load pairs using a fine-grained
> policy:
> > > > +   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
> > > > +   to at least double the alignment of the type.
> > > > +   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
> > > > +   - LDP_POLICY_NEVER: Do not emit ldp.  */
> > > > +
> > > > +  enum aarch64_ldp_policy_model
> > > > +  {
> > > > +    LDP_POLICY_ALIGNED,
> > > > +    LDP_POLICY_ALWAYS,
> > > > +    LDP_POLICY_NEVER
> > > > +  } ldp_policy_model;
> > > > +/* An enum specifying how to handle store pairs using a fine-grained
> policy:
> > > > +   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
> > > > +   to at least double the alignment of the type.
> > > > +   - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
> > > > +   - STP_POLICY_NEVER: Do not emit stp.  */
> > > > +
> > > > +  enum aarch64_stp_policy_model
> > > > +  {
> > > > +    STP_POLICY_ALIGNED,
> > > > +    STP_POLICY_ALWAYS,
> > > > +    STP_POLICY_NEVER
> > > > +  } stp_policy_model;
> > > >  };
> > > >
> > > >  /* Classifies an address.
> > > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > > index 52112ba7c48..774568e9106 100644
> > > > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > > > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > > @@ -30,11 +30,6 @@
> > > >
> > > >  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs",
> RENAME_FMA_REGS)
> > > >
> > > > -/* Don't create non-8 byte aligned load/store pair.  That is if the
> > > > -two load/stores are not at least 8 byte aligned don't create load/store
> > > > -pairs.   */
> > > > -AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw",
> SLOW_UNALIGNED_LDPW)
> > > > -
> > > >  /* Some of the optional shift to some arthematic instructions are
> > > >     considered cheap.  Logical shift left <=4 with or without a
> > > >     zero extend are considered cheap.  Sign extend; non logical shift 
> > > > left
> > > > @@ -44,9 +39,6 @@ AARCH64_EXTRA_TUNING_OPTION
> ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> > > >  /* Disallow load/store pair instructions on Q-registers.  */
> > > >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
> NO_LDP_STP_QREGS)
> > > >
> > > > -/* Disallow load-pair instructions to be formed in combine/peephole.
> */
> > > > -AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine",
> NO_LDP_COMBINE)
> > > > -
> > > >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
> RENAME_LOAD_REGS)
> > > >
> > > >  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants",
> CSE_SVE_VL_CONSTANTS)
> > > > diff --git a/gcc/config/aarch64/aarch64.cc
> b/gcc/config/aarch64/aarch64.cc
> > > > index 219c4ee6d4c..9eeb5469cf9 100644
> > > > --- a/gcc/config/aarch64/aarch64.cc
> > > > +++ b/gcc/config/aarch64/aarch64.cc
> > > > @@ -1357,7 +1357,9 @@ static const struct tune_params
> generic_tunings =
> > > >       Neoverse V1.  It does not have a noticeable effect on A64FX and
> should
> > > >       have at most a very minor effect on SVE2 cores.  */
> > > >    (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),   /* tune_flags.
> */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params cortexa35_tunings =
> > > > @@ -1391,7 +1393,9 @@ static const struct tune_params
> cortexa35_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params cortexa53_tunings =
> > > > @@ -1425,7 +1429,9 @@ static const struct tune_params
> cortexa53_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params cortexa57_tunings =
> > > > @@ -1459,7 +1465,9 @@ static const struct tune_params
> cortexa57_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),        /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params cortexa72_tunings =
> > > > @@ -1493,7 +1501,9 @@ static const struct tune_params
> cortexa72_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params cortexa73_tunings =
> > > > @@ -1527,7 +1537,9 @@ static const struct tune_params
> cortexa73_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >
> > > > @@ -1562,7 +1574,9 @@ static const struct tune_params
> exynosm1_tunings =
> > > >    48,  /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
> > > >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> > > > -  &exynosm1_prefetch_tune
> > > > +  &exynosm1_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params thunderxt88_tunings =
> > > > @@ -1594,8 +1608,10 @@ static const struct tune_params
> thunderxt88_tunings =
> > > >    2,   /* min_div_recip_mul_df.  */
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_OFF,     /* autoprefetcher_model.  */
> > > > -  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),    /* tune_flags.
> */
> > > > -  &thunderxt88_prefetch_tune
> > > > +  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > +  &thunderxt88_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALIGNED,     /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALIGNED      /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params thunderx_tunings =
> > > > @@ -1627,9 +1643,10 @@ static const struct tune_params
> thunderx_tunings =
> > > >    2,   /* min_div_recip_mul_df.  */
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_OFF,     /* autoprefetcher_model.  */
> > > > -  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW
> > > > -   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
> > > > -  &thunderx_prefetch_tune
> > > > +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),     /* tune_flags.  */
> > > > +  &thunderx_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALIGNED,     /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALIGNED      /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params tsv110_tunings =
> > > > @@ -1663,7 +1680,9 @@ static const struct tune_params
> tsv110_tunings =
> > > >    0,    /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
> > > > -  &tsv110_prefetch_tune
> > > > +  &tsv110_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params xgene1_tunings =
> > > > @@ -1696,7 +1715,9 @@ static const struct tune_params
> xgene1_tunings =
> > > >    17,  /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_OFF,     /* autoprefetcher_model.  */
> > > >    (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),       /* tune_flags.  */
> > > > -  &xgene1_prefetch_tune
> > > > +  &xgene1_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params emag_tunings =
> > > > @@ -1729,7 +1750,9 @@ static const struct tune_params emag_tunings
> =
> > > >    17,  /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_OFF,     /* autoprefetcher_model.  */
> > > >    (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),       /* tune_flags.  */
> > > > -  &xgene1_prefetch_tune
> > > > +  &xgene1_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params qdf24xx_tunings =
> > > > @@ -1763,7 +1786,9 @@ static const struct tune_params
> qdf24xx_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
> > > > -  &qdf24xx_prefetch_tune
> > > > +  &qdf24xx_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  /* Tuning structure for the Qualcomm Saphira core.  Default to falkor
> values
> > > > @@ -1799,7 +1824,9 @@ static const struct tune_params
> saphira_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),           /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params thunderx2t99_tunings =
> > > > @@ -1833,7 +1860,9 @@ static const struct tune_params
> thunderx2t99_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > -  &thunderx2t99_prefetch_tune
> > > > +  &thunderx2t99_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params thunderx3t110_tunings =
> > > > @@ -1867,7 +1896,9 @@ static const struct tune_params
> thunderx3t110_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > -  &thunderx3t110_prefetch_tune
> > > > +  &thunderx3t110_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params neoversen1_tunings =
> > > > @@ -1900,7 +1931,9 @@ static const struct tune_params
> neoversen1_tunings =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),     /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params ampere1_tunings =
> > > > @@ -1936,8 +1969,10 @@ static const struct tune_params
> ampere1_tunings =
> > > >    2,   /* min_div_recip_mul_df.  */
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > > -  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
> > > > -  &ampere1_prefetch_tune
> > > > +  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > +  &ampere1_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALIGNED,     /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALIGNED      /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params ampere1a_tunings =
> > > > @@ -1974,8 +2009,10 @@ static const struct tune_params
> ampere1a_tunings =
> > > >    2,   /* min_div_recip_mul_df.  */
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > > -  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
> > > > -  &ampere1_prefetch_tune
> > > > +  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > +  &ampere1_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALIGNED,     /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALIGNED      /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
> > > > @@ -2156,7 +2193,9 @@ static const struct tune_params
> neoversev1_tunings =
> > > >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> > > >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> > > >     | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const sve_vec_cost neoverse512tvb_sve_vector_cost =
> > > > @@ -2293,7 +2332,9 @@ static const struct tune_params
> neoverse512tvb_tunings =
> > > >    (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> > > >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> > > >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),    /*
> tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
> > > > @@ -2483,7 +2524,9 @@ static const struct tune_params
> neoversen2_tunings =
> > > >     | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> > > >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> > > >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),    /*
> tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
> > > > @@ -2673,7 +2716,9 @@ static const struct tune_params
> neoversev2_tunings =
> > > >     | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> > > >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> > > >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),    /*
> tune_flags.  */
> > > > -  &generic_prefetch_tune
> > > > +  &generic_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  static const struct tune_params a64fx_tunings =
> > > > @@ -2706,7 +2751,9 @@ static const struct tune_params a64fx_tunings
> =
> > > >    0,   /* max_case_values.  */
> > > >    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.
> */
> > > >    (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> > > > -  &a64fx_prefetch_tune
> > > > +  &a64fx_prefetch_tune,
> > > > +  tune_params::LDP_POLICY_ALWAYS,      /* ldp_policy_model.  */
> > > > +  tune_params::STP_POLICY_ALWAYS       /* stp_policy_model.  */
> > > >  };
> > > >
> > > >  /* Support for fine-grained override of the tuning structures.  */
> > > > @@ -17819,6 +17866,34 @@ aarch64_parse_tune (const char
> *to_parse, const struct processor **res)
> > > >    return AARCH_PARSE_INVALID_ARG;
> > > >  }
> > > >
> > > > +/* Parse a command-line -param=aarch64-ldp-policy= parameter.
> VALUE is
> > > > +   the value of the parameter.  */
> > > > +
> > > > +static void
> > > > +aarch64_parse_ldp_policy (const unsigned int value, struct
> tune_params* tune)
> > > > +{
> > > > +  if (value == 1)
> > > > +    tune->ldp_policy_model = tune_params::LDP_POLICY_ALWAYS;
> > > > +  else if (value == 2)
> > > > +    tune->ldp_policy_model = tune_params::LDP_POLICY_NEVER;
> > > > +  else if (value == 3)
> > > > +    tune->ldp_policy_model = tune_params::LDP_POLICY_ALIGNED;
> > > > +}
> > > > +
> > > > +/* Parse a command-line -param=aarch64-stp-policy= parameter.
> VALUE is
> > > > +   the value of the parameter.  */
> > > > +
> > > > +static void
> > > > +aarch64_parse_stp_policy (const unsigned int value, struct
> tune_params* tune)
> > > > +{
> > > > +  if (value == 1)
> > > > +    tune->stp_policy_model = tune_params::STP_POLICY_ALWAYS;
> > > > +  else if (value == 2)
> > > > +    tune->stp_policy_model = tune_params::STP_POLICY_NEVER;
> > > > +  else if (value == 3)
> > > > +    tune->stp_policy_model = tune_params::STP_POLICY_ALIGNED;
> > > > +}
> > > > +
> > > >  /* Parse TOKEN, which has length LENGTH to see if it is an option
> > > >     described in FLAG.  If it is, return the index bit for that fusion 
> > > > type.
> > > >     If not, error (printing OPTION_NAME) and return zero.  */
> > > > @@ -18167,6 +18242,12 @@ aarch64_override_options_internal (struct
> gcc_options *opts)
> > > >      aarch64_parse_override_string (opts-
> >x_aarch64_override_tune_string,
> > > >                                    &aarch64_tune_params);
> > > >
> > > > +  aarch64_parse_ldp_policy (aarch64_ldp_policy,
> > > > +                             &aarch64_tune_params);
> > > > +
> > > > +  aarch64_parse_stp_policy (aarch64_stp_policy,
> > > > +                             &aarch64_tune_params);
> > > > +
> > > >    /* This target defaults to strict volatile bitfields.  */
> > > >    if (opts->x_flag_strict_volatile_bitfields < 0 && 
> > > > abi_version_at_least
> (2))
> > > >      opts->x_flag_strict_volatile_bitfields = 1;
> > > > @@ -26468,18 +26549,14 @@ aarch64_operands_ok_for_ldpstp (rtx
> *operands, bool load,
> > > >    enum reg_class rclass_1, rclass_2;
> > > >    rtx mem_1, mem_2, reg_1, reg_2;
> > > >
> > > > -  /* Allow the tuning structure to disable LDP instruction formation
> > > > -     from combining instructions (e.g., in peephole2).
> > > > -     TODO: Implement fine-grained tuning control for LDP and STP:
> > > > -          1. control policies for load and store separately;
> > > > -          2. support the following policies:
> > > > -             - default (use what is in the tuning structure)
> > > > -             - always
> > > > -             - never
> > > > -             - aligned (only if the compiler can prove that the
> > > > -               load will be aligned to 2 * element_size)  */
> > > > -  if (load && (aarch64_tune_params.extra_tuning_flags
> > > > -              & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
> > > > +  /* If we have LDP_POLICY_NEVER, reject the load pair.  */
> > > > +  if (load
> > > > +      && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_NEVER)
> > > > +    return false;
> > > > +
> > > > +  /* If we have STP_POLICY_NEVER, reject the store pair.  */
> > > > +  if (!load
> > > > +      && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_NEVER)
> > > >      return false;
> > > >
> > > >    if (load)
> > > > @@ -26506,13 +26583,22 @@ aarch64_operands_ok_for_ldpstp (rtx
> *operands, bool load,
> > > >    if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
> > > >      return false;
> > > >
> > > > -  /* If we have SImode and slow unaligned ldp,
> > > > -     check the alignment to be at least 8 byte. */
> > > > -  if (mode == SImode
> > > > -      && (aarch64_tune_params.extra_tuning_flags
> > > > -          & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
> > > > -      && !optimize_size
> > > > -      && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
> > > > +  /* If we have LDP_POLICY_ALIGNED,
> > > > +     do not emit the load pair unless the alignment is checked to be
> > > > +     at least double the alignment of the type.  */
> > > > +  if (load
> > > > +      && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_ALIGNED
> > > > +      && !optimize_function_for_size_p (cfun)
> > > > +      && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
> > > > +    return false;
> > > > +
> > > > +  /* If we have STP_POLICY_ALIGNED,
> > > > +     do not emit the store pair unless the alignment is checked to be
> > > > +     at least double the alignment of the type.  */
> > > > +  if (!load
> > > > +      && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_ALIGNED
> > > > +      && !optimize_function_for_size_p (cfun)
> > > > +      && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
> > > >      return false;
> > > >
> > > >    /* Check if the addresses are in the form of [base+offset].  */
> > > > @@ -26640,6 +26726,16 @@ aarch64_operands_adjust_ok_for_ldpstp
> (rtx *operands, bool load,
> > > >    HOST_WIDE_INT offvals[num_insns], msize;
> > > >    rtx mem[num_insns], reg[num_insns], base[num_insns],
> offset[num_insns];
> > > >
> > > > +  /* If we have LDP_POLICY_NEVER, reject the load pair.  */
> > > > +  if (load
> > > > +      && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_NEVER)
> > > > +    return false;
> > > > +
> > > > +  /* If we have STP_POLICY_NEVER, reject the store pair.  */
> > > > +  if (!load
> > > > +      && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_NEVER)
> > > > +    return false;
> > > > +
> > > >    if (load)
> > > >      {
> > > >        for (int i = 0; i < num_insns; i++)
> > > > @@ -26729,13 +26825,22 @@ aarch64_operands_adjust_ok_for_ldpstp
> (rtx *operands, bool load,
> > > >    if (offvals[0] % msize != offvals[2] % msize)
> > > >      return false;
> > > >
> > > > -  /* If we have SImode and slow unaligned ldp,
> > > > -     check the alignment to be at least 8 byte. */
> > > > -  if (mode == SImode
> > > > -      && (aarch64_tune_params.extra_tuning_flags
> > > > -         & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
> > > > -      && !optimize_size
> > > > -      && MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT)
> > > > +  /* If we have LDP_POLICY_ALIGNED,
> > > > +     do not emit the load pair unless the alignment is checked to be
> > > > +     at least double the alignment of the type.  */
> > > > +  if (load
> > > > +      && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_ALIGNED
> > > > +      && !optimize_function_for_size_p (cfun)
> > > > +      && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
> > > > +    return false;
> > > > +
> > > > +  /* If we have STP_POLICY_ALIGNED,
> > > > +     do not emit the store pair unless the alignment is checked to be
> > > > +     at least double the alignment of the type.  */
> > > > +  if (!load
> > > > +      && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_ALIGNED
> > > > +      && !optimize_function_for_size_p (cfun)
> > > > +      && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
> > > >      return false;
> > > >
> > > >    return true;
> > > > diff --git a/gcc/config/aarch64/aarch64.opt
> b/gcc/config/aarch64/aarch64.opt
> > > > index 4a0580435a8..f61e3f968d4 100644
> > > > --- a/gcc/config/aarch64/aarch64.opt
> > > > +++ b/gcc/config/aarch64/aarch64.opt
> > > > @@ -337,3 +337,11 @@ Constant memset size in bytes from which to
> start using MOPS sequence.
> > > >  -param=aarch64-vect-unroll-limit=
> > > >  Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
> > > >  Limit how much the autovectorizer may unroll a loop.
> > > > +
> > > > +-param=aarch64-ldp-policy=
> > > > > +Target Joined UInteger Var(aarch64_ldp_policy) Init(0) IntegerRange(0, 3) Param
> > > > +Fine-grained policy for load pairs.
> > > > +
> > > > +-param=aarch64-stp-policy=
> > > > > +Target Joined UInteger Var(aarch64_stp_policy) Init(0) IntegerRange(0, 3) Param
> > > > +Fine-grained policy for store pairs.
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index 146b40414b0..000dd5541f4 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -16508,6 +16508,36 @@ Use both Advanced SIMD and SVE.  Prefer
> SVE when the costs are deemed equal.
> > > >  @end table
> > > >  The default value is 0.
> > > >
> > > > +@item aarch64-ldp-policy
> > > > > +Fine-grained policy for load pairs.  Accepts values from 0 to 3, inclusive.
> > > > +@table @samp
> > > > +@item 0
> > > > +Use the policy of the tuning structure.
> > > > +@item 1
> > > > +Emit ldp regardless of alignment.
> > > > +@item 2
> > > > +Do not emit ldp.
> > > > +@item 3
> > > > > +Emit ldp only if the source pointer is aligned to at least double the alignment
> > > > +of the type.
> > > > +@end table
> > > > +The default value is 0.
> > > > +
> > > > +@item aarch64-stp-policy
> > > > > +Fine-grained policy for store pairs.  Accepts values from 0 to 3, inclusive.
> > > > +@table @samp
> > > > +@item 0
> > > > +Use the policy of the tuning structure.
> > > > +@item 1
> > > > +Emit stp regardless of alignment.
> > > > +@item 2
> > > > +Do not emit stp.
> > > > +@item 3
> > > > > +Emit stp only if the source pointer is aligned to at least double the alignment
> > > > +of the type.
> > > > +@end table
> > > > +The default value is 0.
> > > > +
> > > >  @item aarch64-loop-vect-issue-rate-niters
> > > >  The tuning for some AArch64 CPUs tries to take both latencies and issue
> > > >  rates into account when deciding whether a loop should be vectorized
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> > > > deleted file mode 100644
> > > > index bc871f4481d..00000000000
> > > > --- a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> > > > +++ /dev/null
> > > > @@ -1,11 +0,0 @@
> > > > -/* { dg-options "-O3 -mtune=ampere1" } */
> > > > -
> > > > -long
> > > > -foo (long a[])
> > > > -{
> > > > -  return a[0] + a[1];
> > > > -}
> > > > -
> > > > -/* We should see two ldrs instead of one ldp. */
> > > > -/* { dg-final { scan-assembler {\tldr\t} } } */
> > > > -/* { dg-final { scan-assembler-not {\tldp\t} } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> > > > new file mode 100644
> > > > index 00000000000..8e43faab70d
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> > > > @@ -0,0 +1,66 @@
> > > > +/* { dg-options "-O2 --param=aarch64-ldp-policy=3 -mcpu=generic" } */
> > > > +
> > > > +#include <stdlib.h>
> > > > +#include <stdint.h>
> > > > +
> > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > +
> > > > +#define LDP_TEST_ALIGNED(TYPE) \
> > > > +TYPE ldp_aligned_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    a_0 = arr[0]; \
> > > > +    a_1 = arr[1]; \
> > > > +    return a_0 + a_1; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_UNALIGNED(TYPE) \
> > > > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a_0 = a[0]; \
> > > > +    a_1 = a[1]; \
> > > > +    return a_0 + a_1; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    a_0 = arr[100]; \
> > > > +    a_1 = arr[101]; \
> > > > +    a_2 = arr[102]; \
> > > > +    a_3 = arr[103]; \
> > > > +    a_4 = arr[110]; \
> > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a_0 = a[100]; \
> > > > +    a_1 = a[101]; \
> > > > +    a_2 = a[102]; \
> > > > +    a_3 = a[103]; \
> > > > +    a_4 = a[110]; \
> > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > +}
> > > > +
> > > > +LDP_TEST_ALIGNED(int32_t);
> > > > +LDP_TEST_ALIGNED(int64_t);
> > > > +LDP_TEST_ALIGNED(v4si);
> > > > +LDP_TEST_UNALIGNED(int32_t);
> > > > +LDP_TEST_UNALIGNED(int64_t);
> > > > +LDP_TEST_UNALIGNED(v4si);
> > > > +LDP_TEST_ADJUST_ALIGNED(int32_t);
> > > > +LDP_TEST_ADJUST_ALIGNED(int64_t);
> > > > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > +
> > > > > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_always.c b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> > > > new file mode 100644
> > > > index 00000000000..532ca607565
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> > > > @@ -0,0 +1,66 @@
> > > > +/* { dg-options "-O2 --param=aarch64-ldp-policy=1 -mcpu=generic" } */
> > > > +
> > > > +#include <stdlib.h>
> > > > +#include <stdint.h>
> > > > +
> > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > +
> > > > +#define LDP_TEST_ALIGNED(TYPE) \
> > > > +TYPE ldp_aligned_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    a_0 = arr[0]; \
> > > > +    a_1 = arr[1]; \
> > > > +    return a_0 + a_1; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_UNALIGNED(TYPE) \
> > > > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a_0 = a[0]; \
> > > > +    a_1 = a[1]; \
> > > > +    return a_0 + a_1; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    a_0 = arr[100]; \
> > > > +    a_1 = arr[101]; \
> > > > +    a_2 = arr[102]; \
> > > > +    a_3 = arr[103]; \
> > > > +    a_4 = arr[110]; \
> > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a_0 = a[100]; \
> > > > +    a_1 = a[101]; \
> > > > +    a_2 = a[102]; \
> > > > +    a_3 = a[103]; \
> > > > +    a_4 = a[110]; \
> > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > +}
> > > > +
> > > > +LDP_TEST_ALIGNED(int32_t);
> > > > +LDP_TEST_ALIGNED(int64_t);
> > > > +LDP_TEST_ALIGNED(v4si);
> > > > +LDP_TEST_UNALIGNED(int32_t);
> > > > +LDP_TEST_UNALIGNED(int64_t);
> > > > +LDP_TEST_UNALIGNED(v4si);
> > > > +LDP_TEST_ADJUST_ALIGNED(int32_t);
> > > > +LDP_TEST_ADJUST_ALIGNED(int64_t);
> > > > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > +
> > > > > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_never.c b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> > > > new file mode 100644
> > > > index 00000000000..b39941c18d7
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> > > > @@ -0,0 +1,66 @@
> > > > +/* { dg-options "-O2 --param=aarch64-ldp-policy=2 -mcpu=generic" } */
> > > > +
> > > > +#include <stdlib.h>
> > > > +#include <stdint.h>
> > > > +
> > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > +
> > > > +#define LDP_TEST_ALIGNED(TYPE) \
> > > > +TYPE ldp_aligned_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    a_0 = arr[0]; \
> > > > +    a_1 = arr[1]; \
> > > > +    return a_0 + a_1; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_UNALIGNED(TYPE) \
> > > > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a_0 = a[0]; \
> > > > +    a_1 = a[1]; \
> > > > +    return a_0 + a_1; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    a_0 = arr[100]; \
> > > > +    a_1 = arr[101]; \
> > > > +    a_2 = arr[102]; \
> > > > +    a_3 = arr[103]; \
> > > > +    a_4 = arr[110]; \
> > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > +}
> > > > +
> > > > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a_0 = a[100]; \
> > > > +    a_1 = a[101]; \
> > > > +    a_2 = a[102]; \
> > > > +    a_3 = a[103]; \
> > > > +    a_4 = a[110]; \
> > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > +}
> > > > +
> > > > +LDP_TEST_ALIGNED(int32_t);
> > > > +LDP_TEST_ALIGNED(int64_t);
> > > > +LDP_TEST_ALIGNED(v4si);
> > > > +LDP_TEST_UNALIGNED(int32_t);
> > > > +LDP_TEST_UNALIGNED(int64_t);
> > > > +LDP_TEST_UNALIGNED(v4si);
> > > > +LDP_TEST_ADJUST_ALIGNED(int32_t);
> > > > +LDP_TEST_ADJUST_ALIGNED(int64_t);
> > > > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > +
> > > > > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_aligned.c b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> > > > new file mode 100644
> > > > index 00000000000..01f294bb090
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> > > > @@ -0,0 +1,60 @@
> > > > +/* { dg-options "-O2 --param=aarch64-stp-policy=3 -mcpu=generic" } */
> > > > +
> > > > +#include <stdlib.h>
> > > > +#include <stdint.h>
> > > > +
> > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > +
> > > > +#define STP_TEST_ALIGNED(TYPE) \
> > > > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    arr[0] = x; \
> > > > +    arr[1] = x; \
> > > > +    return arr; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_UNALIGNED(TYPE) \
> > > > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a[0] = x; \
> > > > +    a[1] = x; \
> > > > +    return a; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    arr[100] = x; \
> > > > +    arr[101] = x; \
> > > > +    arr[102] = x; \
> > > > +    arr[103] = x; \
> > > > +    return arr; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a[100] = x; \
> > > > +    a[101] = x; \
> > > > +    a[102] = x; \
> > > > +    a[103] = x; \
> > > > +    return a; \
> > > > +}
> > > > +
> > > > +STP_TEST_ALIGNED(int32_t);
> > > > +STP_TEST_ALIGNED(int64_t);
> > > > +STP_TEST_ALIGNED(v4si);
> > > > +STP_TEST_UNALIGNED(int32_t);
> > > > +STP_TEST_UNALIGNED(int64_t);
> > > > +STP_TEST_UNALIGNED(v4si);
> > > > +STP_TEST_ADJUST_ALIGNED(int32_t);
> > > > +STP_TEST_ADJUST_ALIGNED(int64_t);
> > > > +STP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > +STP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > +
> > > > > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_always.c b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> > > > new file mode 100644
> > > > index 00000000000..cedb461b5b2
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> > > > @@ -0,0 +1,60 @@
> > > > +/* { dg-options "-O2 --param=aarch64-stp-policy=1 -mcpu=generic" } */
> > > > +
> > > > +#include <stdlib.h>
> > > > +#include <stdint.h>
> > > > +
> > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > +
> > > > +#define STP_TEST_ALIGNED(TYPE) \
> > > > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    arr[0] = x; \
> > > > +    arr[1] = x; \
> > > > +    return arr; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_UNALIGNED(TYPE) \
> > > > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a[0] = x; \
> > > > +    a[1] = x; \
> > > > +    return a; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    arr[100] = x; \
> > > > +    arr[101] = x; \
> > > > +    arr[102] = x; \
> > > > +    arr[103] = x; \
> > > > +    return arr; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a[100] = x; \
> > > > +    a[101] = x; \
> > > > +    a[102] = x; \
> > > > +    a[103] = x; \
> > > > +    return a; \
> > > > +}
> > > > +
> > > > +STP_TEST_ALIGNED(int32_t);
> > > > +STP_TEST_ALIGNED(int64_t);
> > > > +STP_TEST_ALIGNED(v4si);
> > > > +STP_TEST_UNALIGNED(int32_t);
> > > > +STP_TEST_UNALIGNED(int64_t);
> > > > +STP_TEST_UNALIGNED(v4si);
> > > > +STP_TEST_ADJUST_ALIGNED(int32_t);
> > > > +STP_TEST_ADJUST_ALIGNED(int64_t);
> > > > +STP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > +STP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > +
> > > > > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_never.c b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> > > > new file mode 100644
> > > > index 00000000000..ddde658f807
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> > > > @@ -0,0 +1,60 @@
> > > > +/* { dg-options "-O2 --param=aarch64-stp-policy=2 -mcpu=generic" } */
> > > > +
> > > > +#include <stdlib.h>
> > > > +#include <stdint.h>
> > > > +
> > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > +
> > > > +#define STP_TEST_ALIGNED(TYPE) \
> > > > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    arr[0] = x; \
> > > > +    arr[1] = x; \
> > > > +    return arr; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_UNALIGNED(TYPE) \
> > > > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a[0] = x; \
> > > > +    a[1] = x; \
> > > > +    return a; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    arr[100] = x; \
> > > > +    arr[101] = x; \
> > > > +    arr[102] = x; \
> > > > +    arr[103] = x; \
> > > > +    return arr; \
> > > > +}
> > > > +
> > > > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > +    TYPE *a = arr+1; \
> > > > +    a[100] = x; \
> > > > +    a[101] = x; \
> > > > +    a[102] = x; \
> > > > +    a[103] = x; \
> > > > +    return a; \
> > > > +}
> > > > +
> > > > +STP_TEST_ALIGNED(int32_t);
> > > > +STP_TEST_ALIGNED(int64_t);
> > > > +STP_TEST_ALIGNED(v4si);
> > > > +STP_TEST_UNALIGNED(int32_t);
> > > > +STP_TEST_UNALIGNED(int64_t);
> > > > +STP_TEST_UNALIGNED(v4si);
> > > > +STP_TEST_ADJUST_ALIGNED(int32_t);
> > > > +STP_TEST_ADJUST_ALIGNED(int64_t);
> > > > +STP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > +STP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > +
> > > > > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> > > > +
> > > > --
> > > > 2.40.1
> > > >
