Hello Alex:

On 08/05/24 6:18 pm, Alex Coplan wrote:
> Hi Ajit,
> 
> Sorry for the long delay in reviewing this.
> 
> This is really getting there now.  I've left a few more comments below.
> 
> Apart from minor style things, the main remaining issues are mostly
> around comments.  It's important to have good clear comments for
> functions with the parameters (and return value, if any) clearly
> described.  See https://www.gnu.org/prep/standards/standards.html#Comments
> 
> Note that this now needs a little rebasing, too.
> 

Done.

> On 21/04/2024 13:22, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are addressed and changes are made to transform_for_base
>> function as per consensus.
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> Bootstrapped on aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>>
>> aarch64: Preparatory patch to place target independent and
>> dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> 2024-04-21  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
>>
>> gcc/ChangeLog:
>>
>>      * config/aarch64/aarch64-ldp-fusion.cc: Place target
>>      independent and dependent changed code
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 484 +++++++++++++++--------
>>  1 file changed, 325 insertions(+), 159 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 365dcf48b22..83a917e1d20 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,6 +138,189 @@ struct alt_base
>>    poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int &budget) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const = 0;
>> +  virtual void advance () = 0;
>> +};
>> +
>> +// Forward declaration to be used inside the aarch64_pair_fusion class.
>> +bool ldp_operand_mode_ok_p (machine_mode mode);
>> +rtx aarch64_destructure_load_pair (rtx regs[2], rtx pattern);
>> +rtx aarch64_destructure_store_pair (rtx regs[2], rtx pattern);
>> +rtx aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>> +                        bool load_p);
> 
> I don't think we want to change the linkage of these, they should be kept
> static.
> 
>> +enum class writeback{
> 
> Nit: space before '{'
> 
>> +  WRITEBACK_PAIR_P,
>> +  WRITEBACK
>> +};
> 
> We're going to want some more descriptive names here.  How about
> EXISTING and ALL?  Note that the WRITEBACK_ prefix isn't necessary as
> you're using an enum class, so uses of the enumerators need to be
> prefixed with writeback:: anyway.  A comment describing the usage of the
> enum as well as comments above the enumerators describing their
> interpretation would be good.
> 

Done.
>> +
>> +struct pair_fusion {
>> +
> 
> Nit: excess blank line.
> 
>> +  pair_fusion ()
>> +  {
>> +    calculate_dominance_info (CDI_DOMINATORS);
>> +    df_analyze ();
>> +    crtl->ssa = new rtl_ssa::function_info (cfun);
>> +  };
> 
> Can we have one blank line between the virtual functions, please?  I
> think that would be more readable now that there are comments above each
> of them.
> 

Done.

>> +  // Return true if GPR is FP or SIMD accesses, passed
>> +  // with GPR reg_op rtx, machine mode and load_p.
> 
> It's slightly awkward trying to document this without the parameter
> names, but I can see that you're omitting them to avoid unused parameter
> warnings.  One option would be to introduce names in the comment as you
> go.  How about this instead:
> 
> // Given:
> // - an rtx REG_OP, the non-memory operand in a load/store insn,
> // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> // - a boolean LOAD_P (true iff the insn is a load), then:
> // return true if the access should be considered an FP/SIMD access.
> // Such accesses are segregated from GPR accesses, since we only want to
> // form pairs for accesses that use the same register file.
> 

Done.
>> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
>> +  {
>> +    return false;
>> +  }
>> +  // Return true if pair operand mode is ok. Passed with
>> +  // machine mode.
> 
> Could you use something closer to the comment that is already above
> ldp_operand_mode_ok_p?  The purpose of this predicate is really to test
> the following: "is it a good idea (for optimization) to form paired
> accesses with this operand mode at this stage in compilation?"
> 

Done.
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +  // Return true if reg operand is ok, passed with load_p,
>> +  // reg_op rtx and machine mode.
> 
> Can you try to use the parameter names in the comment where possible?
> The comment needs to be a bit more precise, what does "ok" mean here?
> How about:
> 
> // Return true iff REG_OP is a suitable register operand for a paired
> // memory access, where LOAD_P is true if we're asking about loads and
> // false for stores.  MEM_MODE gives the mode of the operand.
> 
> Also ...
> 
>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +                                  machine_mode mem_mode) = 0;
> 
> ... can we use just mode (instead of mem_mode) as the parameter name
> here?  We're only using the mem mode because the non-memory operand
> might be a const_int, in which case we need to infer the mode of the
> register operand from the mode of the memory operand.  But really it's
> just the mode of REG_OP we're passing here.
> 

Done.
>> +  // Return alias check limit.
>> +  virtual int pair_mem_alias_check_limit () = 0;
> 
> Can you extend this comment with something like:  "This is needed to
> avoid unbounded quadratic behaviour when performing alias analysis."
> 
>> +  // Return true if there is writeback opportunities. Passed
>> +  // with enum writeback.
> 
> Not exactly.  This returns true if we should try to handle writeback
> opportunities (not whether there are any).
> 
>> +  virtual bool handle_writeback_opportunities (enum writeback wback) = 0 ;
>> +  // Return true if mem ok ldp stp policy model passed with
>> +  // rtx mem, load_p and machine mode.
> 
> This doesn't parse.  Please can you write full grammatically-correct
> sentences in comments?
> 
Done.

> I can't see a need to pass the mode separately here (the current code
> just passes GET_MODE (first_mem)), so I think we only need two parameters
> here: first_mem and load_p.  Let's rename s/first_mem/base_mem/ here, too.
> 
> Then the comment can read something like:
> 
> // Given BASE_MEM, the mem from the lower candidate access for a pair,
> // and LOAD_P (true if the access is a load), check if we should proceed
> // to form the pair given the target's code generation policy on
> // paired accesses.
> 
>> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> +                                    machine_mode mode) = 0;
>> +  // Gen load store mem pair. Return load store rtx passed
>> +  // with arguments load store pattern, writeback rtx and
>> +  // load_p.
> 

Done.
> As above, let's use the formal parameter names here.  This comment could
> also do with more detail.  How about:
> 
> // Generate the pattern for a paired access.  PATS gives the patterns
> // for the individual memory accesses (which by this point must share a
> // common base register).  If WRITEBACK is non-NULL, then this rtx
> // describes the update to the base register that should be performed by
> // the resulting insn.  LOAD_P is true iff the accesses are loads.
> 
>> +  virtual rtx gen_mem_pair (rtx *pats, rtx writeback,
>> +                        bool load_p) = 0;
> 
> Nit on naming: how about just gen_pair?  It is clear from being a member
> function of the pass what sort of pair we are talking about.
> 
>> +  // Return true if memory writeback can be promoted, passed
>> +  // with insn, rtx pattern and load_p. load_p is set by this
>> +  // hook.
>> +  virtual bool pair_mem_promote_writeback_p (insn_info *, rtx, bool &)
>> +  {
>> +     return false;
>> +  }
> 
> I think we should change the interface here, see the comments in
> ldp_fusion_bb below.
> 
>> +  // Return true if we track loads.
> 
> I think this reads better if you say "if we _should_ track loads".
> Likewise for the store case below.
> 
Done.
>> +  virtual bool track_loads_p ()
>> +  {
>> +    return true;
>> +  }
>> +  // Return true if we track stores.
>> +  virtual bool track_stores_p ()
>> +  {
>> +    return true;
>> +  }
>> +  // Return true if offset is out of range.
>> +  virtual bool pair_mem_out_of_range_p (HOST_WIDE_INT off) = 0;
>> +  // Return destructure pair. Passed with rtx reg, insn pattern
>> +  // and load_p.
>> +  virtual rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p) = 0;
> 
> So firstly this shouldn't be prefixed with gen_ as there is no RTL generation
> going on, it's just pulling apart an existing pair insn.
> 
> On the comment: again you should use the formal parameter names and describe
> the purpose of each parameter.
> 

Done.
>> +  // Return writeback pair. Passed with rtx writeback effect, mem rtx
>> +  // regs rtx and load_p.
>> +  virtual rtx gen_writeback_pair (rtx wb_effect, rtx mem,
>> +                              rtx regs[2], bool load_p) = 0;
> 
> I realise this is pre-existing, but: I think we want a more descriptive name
> for this function.  The key bit of the comment above
> aarch64_gen_writeback_pair is:
> 
> // This is used when promoting existing non-writeback pairs to writeback
> // variants.
> 
> so how about gen_promote_writeback_pair instead?  Including that part of
> the comment above would be good, too.  I wasn't sure on a first glance
> how this function differed from your gen_mem_pair function, which can
> also generate writeback pairs.
> 

Done.
>> +  void ldp_fusion_bb (bb_info *bb);
>> +  insn_info * find_trailing_add (insn_info *insns[2],
> 
> Style nit: no space after *.  Also this should probably be marked inline
> as we want the linkage kept within this file.  Likewise with the next
> two functions.
> 
>> +                             const insn_range_info &pair_range,
>> +                             int initial_writeback,
>> +                             rtx *writeback_effect,
>> +                             def_info **add_def,
>> +                             def_info *base_def,
>> +                             poly_int64 initial_offset,
>> +                             unsigned access_size);
>> +  int get_viable_bases (insn_info *insns[2],
>> +                    vec<base_cand> &base_cands,
>> +                    rtx cand_mems[2],
>> +                    unsigned access_size,
>> +                    bool reversed);
>> +  void do_alias_analysis (insn_info *alias_hazards[4],
>> +                      alias_walker *walkers[4],
>> +                      bool load_p);
>> +  void run ();
>> +  ~pair_fusion()
>> +  {
>> +    if (crtl->ssa->perform_pending_updates ())
>> +      cleanup_cfg (0);
>> +
>> +    free_dominance_info (CDI_DOMINATORS);
>> +
>> +    delete crtl->ssa;
>> +    crtl->ssa = nullptr;
>> +  }
>> +};
>> +
>> +void
>> +pair_fusion::run ()
> 
> This function could do with a comment.
> 

Done.
>> +{
>> +  if (!track_loads_p () && !track_stores_p ())
>> +    return;
>> +
>> +  for (auto bb : crtl->ssa->bbs ())
>> +    ldp_fusion_bb (bb);
>> +}
>> +
>> +struct aarch64_pair_fusion : public pair_fusion
>> +{
>> +  bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> +                bool load_p) override final
>> +  {
>> +    return reload_completed
>> +      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> +      : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> +     && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> +  }
>> +  bool pair_mem_promote_writeback_p (insn_info *insn, rtx pat, bool 
>> &load_p);
> 
> All of these other virtual overrides should be marked with the
> override final keywords, too.
> 

Done.
>> +  bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> +                            machine_mode mode)
>> +  {
>> +    return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> +                                                 load_p,
>> +                                                 mode);
>> +  }
>> +  bool pair_operand_mode_ok_p (machine_mode mode)
>> +  {
>> +    return ldp_operand_mode_ok_p (mode);
>> +  }
> 
> Here you can just rename the static function ldp_operand_mode_ok_p
> to aarch64_pair_fusion::pair_operand_mode_ok_p.  No need for the
> wrapper function.
> 
>> +  rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
>> +  bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +                          machine_mode mem_mode)
>> +  {
>> +    return (load_p
>> +         ? aarch64_ldp_reg_operand (reg_op, mem_mode)
>> +         : aarch64_stp_reg_operand (reg_op, mem_mode));
>> +  }
>> +  int pair_mem_alias_check_limit ()
>> +  {
>> +    return aarch64_ldp_alias_check_limit;
>> +  }
>> +  bool handle_writeback_opportunities (enum writeback wback)
> 
> Just a suggestion: you could call this parameter "which" as it is
> talking about which opportunities we want to handle.
> 

Done.
>> +  {
>> +    if (wback == writeback::WRITEBACK_PAIR_P)
>> +      return aarch64_ldp_writeback > 1;
>> +    else
>> +      return aarch64_ldp_writeback;
>> +  }
>> +  bool track_loads_p ()
>> +  {
>> +    return
>> +      aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> 
> Style nit: I _think_ this would be better formatted with the LHS of the
> expression on the same line as the return keyword, then the != on a new
> line.  I.e.:
> 
> return aarch64_tune_params.ldp_policy_model
>        != AARCH64_LDP_STP_POLICY_NEVER;
> 

Done.
>> +  }
>> +  bool track_stores_p ()
>> +  {
>> +    return
>> +      aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> 
> Likewise.
> 

Done.
>> +  }
>> +  bool pair_mem_out_of_range_p (HOST_WIDE_INT off)
>> +  {
>> +    return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
>> +  }
> 
> When I said (in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648878.html):
> 
>> Please can you invert the condition and fix up callers appropriately?
> 
> I was expecting you to keep the function named pair_mem_in_range_p and invert
> the condition, then update callers to introduce negations.
> 
> I think it's more natural to have the function test for the positive condition
> (is this offset in range) and then callers negate it if they want to check if
> an offset is out of range.  That matches existing practice elsewhere.
> 

Done.
>> +  rtx gen_writeback_pair (rtx wb_effect, rtx mem, rtx regs[2], bool load_p)
>> +  {
>> +    return aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
>> +  }
> 

Done.
> Just rename the static function aarch64_gen_writeback_pair to
> aarch64_pair_fusion::gen_writeback_pair, then there is no need for this
> wrapper.
> 
>> +  rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p);
>> +};
>> +
>>  // State used by the pass for a given basic block.
>>  struct ldp_bb_info
>>  {
>> @@ -160,8 +343,11 @@ struct ldp_bb_info
>>  
>>    static const size_t obstack_alignment = sizeof (void *);
>>    bb_info *m_bb;
>> +  pair_fusion *m_pass;
> 
> I realise this is pre-existing, but can m_bb and m_pass both be private
> members?
> 

Done.
>>  
>> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> +  ldp_bb_info (bb_info *bb,
>> +                  pair_fusion *d) : m_bb (bb),
>> +                  m_pass (d), m_emitted_tombstone (false)
> 
> I think the preferred style is to have the parameters on the same line
> and the initializer list (starting with ':') on a new line.
> 
>>    {
>>      obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>>                              obstack_alignment, obstack_chunk_alloc,
>> @@ -177,10 +363,28 @@ struct ldp_bb_info
>>      bitmap_obstack_release (&m_bitmap_obstack);
>>        }
>>    }
>> -
> 
> Any need for this whitespace change?
> 

Done.
>>    inline void track_access (insn_info *, bool load, rtx mem);
>>    inline void transform ();
>>    inline void cleanup_tombstones ();
>> +  inline void merge_pairs (insn_list_t &, insn_list_t &,
>> +                bool load_p, unsigned access_size);
>> +  inline void transform_for_base (int load_size, access_group &group);
>> +
>> +  inline bool try_fuse_pair (bool load_p, unsigned access_size,
>> +                         insn_info *i1, insn_info *i2);
>> +
>> +  inline bool fuse_pair (bool load_p, unsigned access_size,
>> +              int writeback,
>> +              insn_info *i1, insn_info *i2,
>> +              base_cand &base,
>> +              const insn_range_info &move_range);
>> +
>> +  inline void track_tombstone (int uid);
>> +
>> +  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> +
>> +  template<typename Map>
>> +    void traverse_base_map (Map &map);
> 
> What's the reason for changing the access of these functions?
> Can they be kept private?
> 
Done.
>>  
>>  private:
>>    obstack m_obstack;
>> @@ -191,27 +395,60 @@ private:
>>    bool m_emitted_tombstone;
>>  
>>    inline splay_tree_node<access_record *> *node_alloc (access_record *);
>> +};
>>  
>> -  template<typename Map>
>> -  inline void traverse_base_map (Map &map);
>> -  inline void transform_for_base (int load_size, access_group &group);
>> -
>> -  inline void merge_pairs (insn_list_t &, insn_list_t &,
>> -                       bool load_p, unsigned access_size);
>> +bool
>> +aarch64_pair_fusion::pair_mem_promote_writeback_p (insn_info *insn, rtx pat,
>> +                                               bool &load_p)
>> +{
>> +  if (reload_completed
>> +      && aarch64_ldp_writeback > 1
>> +      && GET_CODE (pat) == PARALLEL
>> +      && XVECLEN (pat, 0) == 2)
>> +    {
>> +      auto rti = insn->rtl ();
>> +      const auto attr = get_attr_ldpstp (rti);
>> +      if (attr == LDPSTP_NONE)
>> +    return false;
>>  
>> -  inline bool try_fuse_pair (bool load_p, unsigned access_size,
>> -                         insn_info *i1, insn_info *i2);
>> +      load_p = (attr == LDPSTP_LDP);
>> +      gcc_checking_assert (load_p || attr == LDPSTP_STP);
>> +      return true;
>> +    }
>> +  return false;
>> +}
>>  
>> -  inline bool fuse_pair (bool load_p, unsigned access_size,
>> -                     int writeback,
>> -                     insn_info *i1, insn_info *i2,
>> -                     base_cand &base,
>> -                     const insn_range_info &move_range);
>> +rtx
>> +aarch64_pair_fusion::gen_destructure_pair (rtx regs[2], rtx rti, bool 
>> load_p)
>> +{
>> +  if (load_p)
>> +    return aarch64_destructure_load_pair (regs, rti);
>> +  else
>> +    return aarch64_destructure_store_pair (regs, rti);
>> +}
>>  
>> -  inline void track_tombstone (int uid);
>> +rtx
>> +aarch64_pair_fusion::gen_mem_pair (rtx *pats,
>> +              rtx writeback,
>> +              bool load_p)
>> +{
>> +  rtx pair_pat;
>>  
>> -  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> -};
>> +  if (writeback)
>> +    {
>> +      auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
>> +      return  gen_rtx_PARALLEL (VOIDmode, patvec);
>> +    }
>> +  else if (load_p)
>> +    return aarch64_gen_load_pair (XEXP (pats[0], 0),
>> +                              XEXP (pats[1], 0),
>> +                              XEXP (pats[0], 1));
>> +  else
>> +    return aarch64_gen_store_pair (XEXP (pats[0], 0),
>> +                               XEXP (pats[0], 1),
>> +                               XEXP (pats[1], 1));
>> +  return pair_pat;
>> +}
>>  
>>  splay_tree_node<access_record *> *
>>  ldp_bb_info::node_alloc (access_record *access)
>> @@ -312,7 +549,7 @@ any_post_modify_p (rtx x)
>>  
>>  // Return true if we should consider forming ldp/stp insns from memory
>>  // accesses with operand mode MODE at this stage in compilation.
>> -static bool
>> +bool
>>  ldp_operand_mode_ok_p (machine_mode mode)
>>  {
>>    const bool allow_qregs
>> @@ -412,9 +649,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx 
>> mem, lfs_fields lfs)
>>    const machine_mode mem_mode = GET_MODE (mem);
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>  
>> -  // Punt on misaligned offsets.  LDP/STP instructions require offsets to 
>> be a
>> -  // multiple of the access size, and we believe that misaligned offsets on
>> -  // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL 
>> bases.
>> +  // Punt on misaligned offsets.  Paired memory access instructions require
>> +  // offsets to be a multiple of the access size, and we believe that
>> +  // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
>> +  // offsets w.r.t. RTL bases.
>>    if (!multiple_p (offset, mem_size))
>>      return false;
>>  
>> @@ -438,10 +676,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx 
>> mem, lfs_fields lfs)
>>  }
>>  
>>  // Main function to begin pair discovery.  Given a memory access INSN,
>> -// determine whether it could be a candidate for fusing into an ldp/stp,
>> -// and if so, track it in the appropriate data structure for this basic
>> -// block.  LOAD_P is true if the access is a load, and MEM is the mem
>> -// rtx that occurs in INSN.
>> +// determine whether it could be a candidate for fusing into a paired
>> +// access, and if so, track it in the appropriate data structure for
>> +// this basic block.  LOAD_P is true if the access is a load, and MEM
>> +//  is the mem rtx that occurs in INSN.
>>  void
>>  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>  {
>> @@ -449,35 +687,26 @@ ldp_bb_info::track_access (insn_info *insn, bool 
>> load_p, rtx mem)
>>    if (MEM_VOLATILE_P (mem))
>>      return;
>>  
>> -  // Ignore writeback accesses if the param says to do so.
>> -  if (!aarch64_ldp_writeback
>> +  // Ignore writeback accesses if the hook says to do so.
>> +  if (!m_pass->handle_writeback_opportunities (writeback::WRITEBACK)
>>        && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
>>      return;
>>  
>>    const machine_mode mem_mode = GET_MODE (mem);
>> -  if (!ldp_operand_mode_ok_p (mem_mode))
>> +  if (!m_pass->pair_operand_mode_ok_p (mem_mode))
>>      return;
>>  
>>    rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
>>  
>> -  // Ignore the access if the register operand isn't suitable for ldp/stp.
>> -  if (load_p
>> -      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
>> -      : !aarch64_stp_reg_operand (reg_op, mem_mode))
>> +  if (!m_pass->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
>>      return;
>> -
> 
> Let's keep this newline.
> 
>>    // We want to segregate FP/SIMD accesses from GPR accesses.
>>    //
>>    // Before RA, we use the modes, noting that stores of constant zero
>>    // operands use GPRs (even in non-integer modes).  After RA, we use
>>    // the hard register numbers.
> 
> The second para of this comment would probably be best moved to the aarch64
> implementation of the hook (with the first sentence duplicated in both 
> places).
> 
>> -  const bool fpsimd_op_p
>> -    = reload_completed
>> -    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> -    : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> -       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> -
>> -  // Note ldp_operand_mode_ok_p already rejected VL modes.
>> +  const bool fpsimd_op_p = m_pass->fpsimd_op_p (reg_op, mem_mode, load_p);
> 
> Might be more readable with a blank line inserted here.
>
Done.
 
>> +  // Note pair_operand_mode_ok_p already rejected VL modes.
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>    const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
>>  
>> @@ -506,8 +735,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, 
>> rtx mem)
>>    // elimination offset pre-RA, we should postpone forming pairs on such
>>    // accesses until after RA.
>>    //
>> -  // As it stands, addresses with offsets in range for LDR but not
>> -  // in range for LDP/STP are currently reloaded inefficiently,
>> +  // As it stands, addresses in range for an individual load/store but not
>> +  // for a paired access are currently reloaded inefficiently,
>>    // ending up with a separate base register for each pair.
>>    //
>>    // In theory LRA should make use of
>> @@ -519,8 +748,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, 
>> rtx mem)
>>    // that calls targetm.legitimize_address_displacement.
>>    //
>>    // So for now, it's better to punt when we can't be sure that the
>> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
>> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
>> +  // offset is in range for paired access.  Out-of-range cases can then be
>> +  // handled after RA by the out-of-range PAIR MEM  peepholes.  Eventually, 
>> it
> 
> Does this part really apply to rs6000?  Do you have peephole2 patterns for 
> lxvp?
> If not you may want to caveat this part as being specific to aarch64.  E.g. 
> "On
> aarch64, out-of-range cases ..."
> 
>>    // would be nice to handle known out-of-range opportunities in the
>>    // pass itself (for stack accesses, this would be in the post-RA pass).
>>    if (!reload_completed
>> @@ -573,8 +802,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, 
>> rtx mem)
>>      gcc_unreachable (); // Base defs should be unique.
>>      }
>>  
>> -  // Punt on misaligned offsets.  LDP/STP require offsets to be a multiple 
>> of
>> -  // the access size.
>> +  // Punt on misaligned offsets.  Paired memory accesses require offsets
>> +  // to be a multiple of the access size.
>>    if (!multiple_p (mem_off, mem_size))
>>      return;
>>  
>> @@ -1207,8 +1436,8 @@ extract_writebacks (bool load_p, rtx pats[2], int 
>> changed)
>>  // base register.  If there is one, we choose the first such update after
>>  // PAIR_DST that is still in the same BB as our pair.  We return the new 
>> def in
>>  // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
>> -static insn_info *
>> -find_trailing_add (insn_info *insns[2],
>> +insn_info *
>> +pair_fusion::find_trailing_add (insn_info *insns[2],
>>                 const insn_range_info &pair_range,
>>                 int initial_writeback,
>>                 rtx *writeback_effect,
>> @@ -1286,7 +1515,7 @@ find_trailing_add (insn_info *insns[2],
>>  
>>    off_hwi /= access_size;
>>  
>> -  if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
>> +  if (pair_mem_out_of_range_p (off_hwi))
>>      return nullptr;
>>  
>>    auto dump_prefix = [&]()
>> @@ -1800,7 +2029,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      {
>>        if (dump_file)
>>      fprintf (dump_file,
>> -             "  ldp: i%d has wb but subsequent i%d has non-wb "
>> +             "  load pair: i%d has wb but subsequent i%d has non-wb "
>>               "update of base (r%d), dropping wb\n",
>>               insns[0]->uid (), insns[1]->uid (), base_regno);
>>        gcc_assert (writeback_effect);
>> @@ -1823,7 +2052,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>  
>>    // If either of the original insns had writeback, but the resulting pair 
>> insn
>> -  // does not (can happen e.g. in the ldp edge case above, or if the 
>> writeback
>> +  // does not (can happen e.g. in the load pair edge case above, or if the 
>> writeback
>>    // effects cancel out), then drop the def(s) of the base register as
>>    // appropriate.
>>    //
>> @@ -1842,7 +2071,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    // update of the base register and try and fold it in to make this into a
>>    // writeback pair.
>>    insn_info *trailing_add = nullptr;
>> -  if (aarch64_ldp_writeback > 1
>> +  if (m_pass->handle_writeback_opportunities (writeback::WRITEBACK_PAIR_P)
>>        && !writeback_effect
>>        && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
>>                                       XEXP (pats[0], 0), nullptr)
>> @@ -1850,7 +2079,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>                                           XEXP (pats[1], 0), nullptr))))
>>      {
>>        def_info *add_def;
>> -      trailing_add = find_trailing_add (insns, move_range, writeback,
>> +      trailing_add = m_pass->find_trailing_add (insns, move_range, 
>> writeback,
>>                                      &writeback_effect,
>>                                      &add_def, base.def, offsets[0],
>>                                      access_size);
>> @@ -1863,14 +2092,14 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>  
>>    // Now that we know what base mem we're going to use, check if it's OK
>> -  // with the ldp/stp policy.
>> +  // with the pair mem policy.
>>    rtx first_mem = XEXP (pats[0], load_p);
>> -  if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> -                                            load_p,
>> -                                            GET_MODE (first_mem)))
>> +  if (!m_pass->pair_mem_ok_with_policy (first_mem,
>> +                      load_p,
>> +                      GET_MODE (first_mem)))
> 
> The indentation looks off here.  I suppose when you drop the mode parameter
> (as mentioned above) this will all fit on one line anyway.
> 
>>      {
>>        if (dump_file)
>> -    fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
>> +    fprintf (dump_file, "punting on pair (%d,%d), pair mem  policy says 
>> no\n",
> 
> Nit: excess space after mem.
> 
Done.
>>               i1->uid (), i2->uid ());
>>        return false;
>>      }
>> @@ -1878,21 +2107,10 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    rtx reg_notes = combine_reg_notes (first, second, load_p);
>>  
>>    rtx pair_pat;
>> -  if (writeback_effect)
>> -    {
>> -      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
>> -      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> -    }
>> -  else if (load_p)
>> -    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> -                                  XEXP (pats[1], 0),
>> -                                  XEXP (pats[0], 1));
>> -  else
>> -    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> -                                   XEXP (pats[0], 1),
>> -                                   XEXP (pats[1], 1));
>>  
>> +  pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);
> 
> Please can you declare pair_pat on the same line it is set, so that we have:
> 
>   rtx pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);
> 
> 

Done.
>>    insn_change *pair_change = nullptr;
>> +
> 
> Spurious whitespace change?
> 
>>    auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
>>      rtx_insn *rti = change->insn ()->rtl ();
>>      validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> @@ -2133,15 +2351,6 @@ load_modified_by_store_p (insn_info *load,
>>    return false;
>>  }
>>  
>> -// Virtual base class for load/store walkers used in alias analysis.
>> -struct alias_walker
>> -{
>> -  virtual bool conflict_p (int &budget) const = 0;
>> -  virtual insn_info *insn () const = 0;
>> -  virtual bool valid () const = 0;
>> -  virtual void advance () = 0;
>> -};
>> -
>>  // Implement some common functionality used by both store_walker
>>  // and load_walker.
>>  template<bool reverse>
>> @@ -2259,13 +2468,13 @@ public:
>>  //
>>  // We try to maintain the invariant that if a walker becomes invalid, we
>>  // set its pointer to null.
>> -static void
>> -do_alias_analysis (insn_info *alias_hazards[4],
>> +void
>> +pair_fusion::do_alias_analysis (insn_info *alias_hazards[4],
>>                 alias_walker *walkers[4],
>>                 bool load_p)
>>  {
>>    const int n_walkers = 2 + (2 * !load_p);
>> -  int budget = aarch64_ldp_alias_check_limit;
>> +  int budget = pair_mem_alias_check_limit ();
>>  
>>    auto next_walker = [walkers,n_walkers](int current) -> int {
>>      for (int j = 1; j <= n_walkers; j++)
>> @@ -2350,8 +2559,8 @@ do_alias_analysis (insn_info *alias_hazards[4],
>>  //
>>  // Returns an integer where bit (1 << i) is set if INSNS[i] uses writeback
>>  // addressing.
>> -static int
>> -get_viable_bases (insn_info *insns[2],
>> +int
>> +pair_fusion::get_viable_bases (insn_info *insns[2],
>>                vec<base_cand> &base_cands,
>>                rtx cand_mems[2],
>>                unsigned access_size,
>> @@ -2397,7 +2606,7 @@ get_viable_bases (insn_info *insns[2],
>>        if (!is_lower)
>>      base_off--;
>>  
>> -      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
>> +      if (pair_mem_out_of_range_p (base_off))
>>      continue;
>>  
>>        use_info *use = find_access (insns[i]->uses (), REGNO (base));
>> @@ -2454,7 +2663,7 @@ get_viable_bases (insn_info *insns[2],
>>  }
>>  
>>  // Given two adjacent memory accesses of the same size, I1 and I2, try
>> -// and see if we can merge them into a ldp or stp.
>> +// and see if we can merge them into a paired access load and store.
> 
> Delete "load and store" here.
>

Done.
 
>>  //
>>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>>  // if the accesses are both loads, otherwise they are both stores.
>> @@ -2494,7 +2703,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned 
>> access_size,
>>      {
>>        if (dump_file)
>>      fprintf (dump_file,
>> -             "punting on ldp due to reg conflcits (%d,%d)\n",
>> +             "punting on pair mem load  due to reg conflcits (%d,%d)\n",
>>               insns[0]->uid (), insns[1]->uid ());
>>        return false;
>>      }
>> @@ -2512,7 +2721,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned 
>> access_size,
>>  
>>    auto_vec<base_cand, 2> base_cands (2);
>>  
>> -  int writeback = get_viable_bases (insns, base_cands, cand_mems,
>> +  int writeback = m_pass->get_viable_bases (insns, base_cands, cand_mems,
>>                                  access_size, reversed);
>>    if (base_cands.is_empty ())
>>      {
>> @@ -2641,7 +2850,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned 
>> access_size,
>>      walkers[1] = &backward_store_walker;
>>  
>>    if (load_p && (mem_defs[0] || mem_defs[1]))
>> -    do_alias_analysis (alias_hazards, walkers, load_p);
>> +    m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>>    else
>>      {
>>        // We want to find any loads hanging off the first store.
>> @@ -2650,7 +2859,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned 
>> access_size,
>>        load_walker<true> backward_load_walker (mem_defs[1], insns[1], 
>> insns[0]);
>>        walkers[2] = &forward_load_walker;
>>        walkers[3] = &backward_load_walker;
>> -      do_alias_analysis (alias_hazards, walkers, load_p);
>> +      m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>>        // Now consolidate hazards back down.
>>        if (alias_hazards[2]
>>        && (!alias_hazards[0] || (*alias_hazards[2] < *alias_hazards[0])))
>> @@ -2891,7 +3100,7 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>>  // merge_pairs.
>>  void
>>  ldp_bb_info::transform_for_base (int encoded_lfs,
>> -                             access_group &group)
>> +                                     access_group &group)
> 
> Looks like a bad change in indentation.
> 

Done.
>>  {
>>    const auto lfs = decode_lfs (encoded_lfs);
>>    const unsigned access_size = lfs.size;
>> @@ -2964,29 +3173,9 @@ ldp_bb_info::transform ()
>>    traverse_base_map (def_map);
>>  }
>>  
>> -static void
>> -ldp_fusion_init ()
>> -{
>> -  calculate_dominance_info (CDI_DOMINATORS);
>> -  df_analyze ();
>> -  crtl->ssa = new rtl_ssa::function_info (cfun);
>> -}
>> -
>> -static void
>> -ldp_fusion_destroy ()
>> -{
>> -  if (crtl->ssa->perform_pending_updates ())
>> -    cleanup_cfg (0);
>> -
>> -  free_dominance_info (CDI_DOMINATORS);
>> -
>> -  delete crtl->ssa;
>> -  crtl->ssa = nullptr;
>> -}
>> -
>>  // Given a load pair insn in PATTERN, unpack the insn, storing
>>  // the registers in REGS and returning the mem.
>> -static rtx
>> +rtx
> 
Done.
> Again, no need to change the linkage here.
> 
>>  aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
>>  {
>>    rtx mem = NULL_RTX;
>> @@ -3012,7 +3201,7 @@ aarch64_destructure_load_pair (rtx regs[2], rtx 
>> pattern)
>>  
>>  // Given a store pair insn in PATTERN, unpack the insn, storing
>>  // the register operands in REGS, and returning the mem.
>> -static rtx
>> +rtx
> 
> Likewise.

Done.
> 
>>  aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
>>  {
>>    rtx mem = XEXP (pattern, 0);
>> @@ -3030,7 +3219,7 @@ aarch64_destructure_store_pair (rtx regs[2], rtx 
>> pattern)
>>  //
>>  // This is used when promoting existing non-writeback pairs to writeback
>>  // variants.
>> -static rtx
>> +rtx
>>  aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>>                          bool load_p)
>>  {
>> @@ -3068,22 +3257,13 @@ aarch64_gen_writeback_pair (rtx wb_effect, rtx 
>> pair_mem, rtx regs[2],
>>  // the base register which we can fold in to make this pair use
>>  // a writeback addressing mode.
>>  static void
>> -try_promote_writeback (insn_info *insn)
>> +try_promote_writeback (insn_info *insn, bool load_p, pair_fusion *pass)
> 
> I think this would be better as a member function of pair_fusion.  Then
> there's no need to pass a pointer to the pass structure explicitly.
> 
>>  {
>> -  auto rti = insn->rtl ();
>> -  const auto attr = get_attr_ldpstp (rti);
>> -  if (attr == LDPSTP_NONE)
>> -    return;
>> -
>> -  bool load_p = (attr == LDPSTP_LDP);
>> -  gcc_checking_assert (load_p || attr == LDPSTP_STP);
>> -
>>    rtx regs[2];
>>    rtx mem = NULL_RTX;
>> -  if (load_p)
>> -    mem = aarch64_destructure_load_pair (regs, PATTERN (rti));
>> -  else
>> -    mem = aarch64_destructure_store_pair (regs, PATTERN (rti));
>> +
>> +  mem = pass->gen_destructure_pair (regs, PATTERN (insn->rtl ()), load_p);
>> +
>>    gcc_checking_assert (MEM_P (mem));
>>  
>>    poly_int64 offset;
>> @@ -3120,9 +3300,10 @@ try_promote_writeback (insn_info *insn)
>>    def_info *add_def;
>>    const insn_range_info pair_range (insn);
>>    insn_info *insns[2] = { nullptr, insn };
>> -  insn_info *trailing_add = find_trailing_add (insns, pair_range, 0, 
>> &wb_effect,
>> -                                           &add_def, base_def, offset,
>> -                                           access_size);
>> +  insn_info *trailing_add
>> +    = pass->find_trailing_add (insns, pair_range, 0, &wb_effect,
>> +                           &add_def, base_def, offset,
>> +                           access_size);
>>    if (!trailing_add)
>>      return;
>>  
>> @@ -3132,8 +3313,9 @@ try_promote_writeback (insn_info *insn)
>>    insn_change del_change (trailing_add, insn_change::DELETE);
>>    insn_change *changes[] = { &pair_change, &del_change };
>>  
>> -  rtx pair_pat = aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
>> -  validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> +  rtx pair_pat = pass->gen_writeback_pair (wb_effect, mem, regs, load_p);
>> +
> 
> No need for this blank line IMO.
> 
>> +  validate_unshare_change (insn->rtl(), &PATTERN (insn->rtl()), pair_pat, 
>> true);
>>  
>>    // The pair must gain the def of the base register from the add.
>>    pair_change.new_defs = insert_access (attempt,
>> @@ -3167,14 +3349,12 @@ try_promote_writeback (insn_info *insn)
>>  // for load/store candidates.  If running after RA, also try and promote
>>  // non-writeback pairs to use writeback addressing.  Then try to fuse
>>  // candidates into pairs.
>> -void ldp_fusion_bb (bb_info *bb)
>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
>>  {
>> -  const bool track_loads
>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> -  const bool track_stores
>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> +  const bool track_loads = track_loads_p ();
>> +  const bool track_stores = track_stores_p ();
>>  
>> -  ldp_bb_info bb_state (bb);
>> +  ldp_bb_info bb_state (bb, this);
>>  
>>    for (auto insn : bb->nondebug_insns ())
>>      {
>> @@ -3184,11 +3364,9 @@ void ldp_fusion_bb (bb_info *bb)
>>      continue;
>>  
>>        rtx pat = PATTERN (rti);
>> -      if (reload_completed
>> -      && aarch64_ldp_writeback > 1
>> -      && GET_CODE (pat) == PARALLEL
>> -      && XVECLEN (pat, 0) == 2)
>> -    try_promote_writeback (insn);
>> +      bool load_p;
>> +      if (pair_mem_promote_writeback_p (insn, pat, load_p))
>> +    try_promote_writeback (insn, load_p, this);
> 
> As above, if we make try_promote_writeback a member function of pair_fusion,
> the this pointer would be implicit, I think that would be a lot cleaner.
> 
> Also, sorry for the run-around, but I think the function
> pair_mem_promote_writeback_p is trying to do too many things at once.
> 
> I think we should keep these checks (making the latter suitable for generic
> code):
> 
>   if (reload_completed
>       && aarch64_ldp_writeback > 1
> 
> in ldp_fusion_bb, at which point pair_mem_promote_writeback_p just
> becomes a test for "is this insn a (non-writeback) paired memory
> access? (and if so, is it a load?)"
> 
> So ultimately I think we want the following in ldp_fusion_bb:
> 
>   bool load_p;
>   if (reload_completed
>       && handle_writeback_opportunities (writeback::ALL)
>       && pair_mem_insn_p (rti, load_p))
>     try_promote_writeback (insn, load_p);
> 
>   rtx pat = PATTERN (rti);
>   [...]
> 

Done.
> It probably makes most sense to pass just the rtx_insn * to
> pair_mem_insn_p instead of passing both an insn_info * and an rtx for
> the PATTERN.
> 
>>  
>>        if (GET_CODE (pat) != SET)
>>      continue;
>> @@ -3205,12 +3383,8 @@ void ldp_fusion_bb (bb_info *bb)
>>  
>>  void ldp_fusion ()
>>  {
>> -  ldp_fusion_init ();
>> -
>> -  for (auto bb : crtl->ssa->bbs ())
>> -    ldp_fusion_bb (bb);
>> -
>> -  ldp_fusion_destroy ();
>> +  aarch64_pair_fusion pass;
>> +  pass.run ();
>>  }
> 
> I think we mentioned this before, but since this is now a very simple
> two-line function, can we inline it into pass_ldp_fusion::execute?
> ISTM there's no need for the additional wrapper.
> 

Done.

Thanks & Regards
Ajit
>>  
>>  namespace {
>> @@ -3242,14 +3416,6 @@ public:
>>        if (!optimize || optimize_debug)
>>      return false;
>>  
>> -      // If the tuning policy says never to form ldps or stps, don't run
>> -      // the pass.
>> -      if ((aarch64_tune_params.ldp_policy_model
>> -       == AARCH64_LDP_STP_POLICY_NEVER)
>> -      && (aarch64_tune_params.stp_policy_model
>> -          == AARCH64_LDP_STP_POLICY_NEVER))
>> -    return false;
>> -
>>        if (reload_completed)
>>      return flag_aarch64_late_ldp_fusion;
>>        else
>> -- 
>> 2.39.3
>>
> 
> Thanks,
> Alex

Reply via email to