Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

Alex Coplan Mon, 20 May 2024 12:47:18 -0700

On 20/05/2024 18:44, Alex Coplan wrote:
> Hi Ajit,
> 
> On 20/05/2024 21:50, Ajit Agarwal wrote:
> > Hello Alex/Richard:
> > 
> > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> > to support multiple targets.
> > 
> > Common infrastructure of load store pair fusion is divided into
> > target independent and target dependent code.
> > 
> > Target independent code is structured in the following files.
> > gcc/pair-fusion.h
> > gcc/pair-fusion.cc
> > 
> > Target independent code is the Generic code with pure virtual
> > function to interface betwwen target independent and dependent
> > code.
> > 
> > Bootstrapped and regtested on aarch64-linux-gnu.
> > 
> > Thanks & Regards
> > Ajit
> > 
> > aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> > 
> > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> > to support multiple targets.
> > 
> > Common infrastructure of load store pair fusion is divided into
> > target independent and target dependent code.
> > 
> > Target independent code is structured in the following files.
> > gcc/pair-fusion.h
> > gcc/pair-fusion.cc
> > 
> > Target independent code is the Generic code with pure virtual
> > function to interface betwwen target independent and dependent
> > code.
> > 
> > 2024-05-20  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
> > 
> > gcc/ChangeLog:
> > 
> >     * pair-fusion.h: Generic header code for load store fusion
> 
> Insert "pair" before fusion?
> 
> >     that can be shared across different architectures.
> >     * pair-fusion.cc: Generic source code implementation for
> >     load store fusion that can be shared across different architectures.
> 
> Likewise.
> 
> >     * Makefile.in: Add new executable pair-fusion.o
> 
> It's not an executable but an object file.
> 
> >     * config/aarch64/aarch64-ldp-fusion.cc: Target specific
> >     code for load store fusion of aarch64.
> 
> I guess this should say something like: "Delete generic code and move it
> to pair-fusion.cc in the middle-end."
> 
> I've left some comments below on the header file.  The rest of the patch
> looks pretty good to me.  I tried diffing the original contents of
> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected.
>


<snip>

> > diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
> > new file mode 100644
> > index 00000000000..00f6d3e149a
> > --- /dev/null
> > +++ b/gcc/pair-fusion.h
> > @@ -0,0 +1,340 @@
> > +// Pair Mem fusion generic header file.
> > +// Copyright (C) 2024 Free Software Foundation, Inc.
> > +//
> > +// This file is part of GCC.
> > +//
> > +// GCC is free software; you can redistribute it and/or modify it
> > +// under the terms of the GNU General Public License as published by
> > +// the Free Software Foundation; either version 3, or (at your option)
> > +// any later version.
> > +//
> > +// GCC is distributed in the hope that it will be useful, but
> > +// WITHOUT ANY WARRANTY; without even the implied warranty of
> > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +// General Public License for more details.
> > +//
> > +// You should have received a copy of the GNU General Public License
> > +// along with GCC; see the file COPYING3.  If not see
> > +// <http://www.gnu.org/licenses/>.
> > +
> > +#define INCLUDE_ALGORITHM
> > +#define INCLUDE_FUNCTIONAL
> > +#define INCLUDE_LIST
> > +#define INCLUDE_TYPE_TRAITS
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "df.h"
> > +#include "rtl-iter.h"
> > +#include "rtl-ssa.h"
> 
> I'm not sure how desirable this is, but you might be able to
> forward-declare RTL-SSA types like this:
> 
> class def_info;
> class insn_info;
> class insn_range_info;
> 
> thus removing the need to include the header here, since the interface
> only refers to these types by pointer or reference.
> 
> Richard: please say if you'd prefer keeping the include.
> 
> > +#include "cfgcleanup.h"
> > +#include "tree-pass.h"
> > +#include "ordered-hash-map.h"
> > +#include "tree-dfa.h"
> > +#include "fold-const.h"
> > +#include "tree-hash-traits.h"
> > +#include "print-tree.h"
> > +#include "insn-attr.h"
> 
> I expect we don't need all of these includes here.  I think we should
> have the minimum necessary set of includes here and most of the includes
> should be in the *.cc files.
> 
> > +
> > +using namespace rtl_ssa;
> > +
> > +// We pack these fields (load_p, fpsimd_p, and size) into an integer
> > +// (LFS) which we use as part of the key into the main hash tables.
> > +//
> > +// The idea is that we group candidates together only if they agree on
> > +// the fields below.  Candidates that disagree on any of these
> > +// properties shouldn't be merged together.
> > +struct lfs_fields
> > +{
> > +  bool load_p;
> > +  bool fpsimd_p;
> > +  unsigned size;
> > +};
> 
> This struct shouldn't be needed in the header file (it should only be
> needed in pair-fusion.cc).  I can see that it's needed for
> pair_fusion_bb_info, but that shouldn't be in the header file either IMO.
> 
> > +
> > +using insn_list_t = std::list<insn_info *>;
> 
> Likewise, if you move pair_fusion_bb_info out of this header, these
> shouldn't be needed either.
> 
> > +using insn_iter_t = insn_list_t::iterator;
> 
> Pre-existing, but this looks to be completely dead/unused.  Please could
> you push the obvious patch to delete it from aarch64-ldp-fusion.cc
> first (assuming that passes testing)?
> 
> > +
> > +// Information about the accesses at a given offset from a particular
> > +// base.  Stored in an access_group, see below.
> > +struct access_record
> > +{
> > +  poly_int64 offset;
> > +  std::list<insn_info *> cand_insns;
> > +  std::list<access_record>::iterator place;
> > +
> > +  access_record (poly_int64 off) : offset (off) {}
> > +};
> 
> If you drop pair_fusion_bb_info from this header then this can go back
> in pair-fusion.cc.
> 
> > +
> > +// A group of accesses where adjacent accesses could be ldp/stp
> > +// candidates.  The splay tree supports efficient insertion,
> > +// while the list supports efficient iteration.
> > +struct access_group
> > +{
> > +  splay_tree<access_record *> tree;
> > +  std::list<access_record> list;
> > +
> > +  template<typename Alloc>
> > +  inline void track (Alloc node_alloc, poly_int64 offset, insn_info *insn);
> > +};
> 
> Likewise.
> 
> > +
> > +// Information about a potential base candidate, used in try_fuse_pair.
> > +// There may be zero, one, or two viable RTL bases for a given pair.
> > +struct base_cand
> > +{
> > +  // DEF is the def of the base register to be used by the pair.
> > +  def_info *def;
> > +
> > +  // FROM_INSN is -1 if the base candidate is already shared by both
> > +  // candidate insns.  Otherwise it holds the index of the insn from
> > +  // which the base originated.
> > +  //
> > +  // In the case that the base is shared, either DEF is already used
> > +  // by both candidate accesses, or both accesses see different versions
> > +  // of the same regno, in which case DEF is the def consumed by the
> > +  // first candidate access.
> > +  int from_insn;
> > +
> > +  // To form a pair, we do so by moving the first access down and the 
> > second
> > +  // access up.  To determine where to form the pair, and whether or not
> > +  // it is safe to form the pair, we track instructions which cannot be
> > +  // re-ordered past due to either dataflow or alias hazards.
> > +  //
> > +  // Since we allow changing the base used by an access, the choice of
> > +  // base can change which instructions act as re-ordering hazards for
> > +  // this pair (due to different dataflow).  We store the initial
> > +  // dataflow hazards for this choice of base candidate in HAZARDS.
> > +  //
> > +  // These hazards act as re-ordering barriers to each candidate insn
> > +  // respectively, in program order.
> > +  //
> > +  // Later on, when we take alias analysis into account, we narrow
> > +  // HAZARDS accordingly.
> > +  insn_info *hazards[2];
> > +
> > +  base_cand (def_info *def, int insn)
> > +    : def (def), from_insn (insn), hazards {nullptr, nullptr} {}
> > +
> > +  base_cand (def_info *def) : base_cand (def, -1) {}
> > +
> > +  // Test if this base candidate is viable according to HAZARDS.
> > +  bool viable () const
> > +  {
> > +    return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]);
> > +  }
> > +};
> > +
> > +// Information about an alternate base.  For a def_info D, it may
> > +// instead be expressed as D = BASE + OFFSET.
> > +struct alt_base
> > +{
> > +  def_info *base;
> > +  poly_int64 offset;
> > +};
> 
> Likewise, this (alt_base) can be dropped from the header file.
> 
> > +
> > +// Virtual base class for load/store walkers used in alias analysis.
> > +struct alias_walker
> > +{
> > +  virtual bool conflict_p (int &budget) const = 0;
> > +  virtual insn_info *insn () const = 0;
> > +  virtual bool valid () const = 0;
> > +  virtual void advance () = 0;
> > +};
> 
> This is quite minor, but I think since this class is internal to
> pair-fusion.cc and the interface only uses pointers to this type,
> this can just be a forward-declaration (keeping the definition
> in pair-fusion.cc).
> 
> > +
> > +// When querying should_handle_writeback, this enum is used to
> > +// qualify which opportunities we are asking about.
> > +enum class writeback {
> > +  // Only those writeback opportunities that arise from existing
> > +  // auto-increment accesses.
> > +  EXISTING,
> > +
> > +  // All writeback opportunities, including those that involve folding
> > +  // base register updates into a non-writeback pair.
> > +  ALL
> > +};
> > +
> > +// This class can be overriden by targets to give a pass that fuses
> > +// adjacent loads and stores into load/store pair instructions.
> > +//
> > +// The target can override the various virtual functions to customize
> > +// the behaviour of the pass as appropriate for the target.
> > +struct pair_fusion {
> > +  pair_fusion ();
> > +
> > +  // Given:
> > +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> > +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> > +  // - a boolean LOAD_P (true iff the insn is a load), then:
> > +  // return true if the access should be considered an FP/SIMD access.
> > +  // Such accesses are segregated from GPR accesses, since we only want
> > +  // to form pairs for accesses that use the same register file.
> > +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> > +  {
> > +    return false;
> > +  }
> > +
> > +  // Return true if we should consider forming pairs from memory
> > +  // accesses with operand mode MODE at this stage in compilation.
> > +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> > +
> > +  // Return true iff REG_OP is a suitable register operand for a paired
> > +  // memory access, where LOAD_P is true if we're asking about loads and
> > +  // false for stores.  MODE gives the mode of the operand.
> > +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> > +                                 machine_mode mode) = 0;
> > +
> > +  // Return alias check limit.
> > +  // This is needed to avoid unbounded quadratic behaviour when
> > +  // performing alias analysis.
> > +  virtual int pair_mem_alias_check_limit () = 0;
> > +
> > +  // Return true if we should try to handle writeback opportunities.
> > +  // WHICH determines the kinds of writeback opportunities the caller
> > +  // is asking about.
> > +  virtual bool should_handle_writeback (enum writeback which) = 0;
> > +
> > +  // Given BASE_MEM, the mem from the lower candidate access for a pair,
> > +  // and LOAD_P (true if the access is a load), check if we should proceed
> > +  // to form the pair given the target's code generation policy on
> > +  // paired accesses.
> > +  virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0;
> > +
> > +  // Generate the pattern for a paired access.  PATS gives the patterns
> > +  // for the individual memory accesses (which by this point must share a
> > +  // common base register).  If WRITEBACK is non-NULL, then this rtx
> > +  // describes the update to the base register that should be performed by
> > +  // the resulting insn.  LOAD_P is true iff the accesses are loads.
> > +  virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
> > +
> > +  // Return true if INSN is a paired memory access.  If so, set LOAD_P to
> > +  // true iff INSN is a load pair.
> > +  virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0;
> > +
> > +  // Return true if we should track loads.
> > +  virtual bool track_loads_p ()
> > +  {
> > +    return true;
> > +  }
> > +
> > +  // Return true if we should track stores.
> > +  virtual bool track_stores_p ()
> > +  {
> > +    return true;
> > +  }
> > +
> > +  // Return true if OFFSET is in range for a paired memory access.
> > +  virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0;
> > +
> > +  // Given a load/store pair insn in PATTERN, unpack the insn, storing
> > +  // the register operands in REGS, and returning the mem.  LOAD_P is
> > +  // true for loads and false for stores.
> > +  virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0;
> > +
> > +  // Given a pair mem in MEM, register operands in REGS, and an rtx
> > +  // representing the effect of writeback on the base register in 
> > WB_EFFECT,
> > +  // return an insn representing a writeback variant of this pair.
> > +  // LOAD_P is true iff the pair is a load.
> > +  // This is used when promoting existing non-writeback pairs to writeback
> > +  // variants.
> > +  virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem,
> > +                                     rtx regs[2], bool load_p) = 0;
> > +
> > +  void process_block (bb_info *bb);
> > +  inline insn_info *find_trailing_add (insn_info *insns[2],
> > +                                  const insn_range_info &pair_range,
> > +                                  int initial_writeback,
> > +                                  rtx *writeback_effect,
> > +                                  def_info **add_def,
> > +                                  def_info *base_def,
> > +                                  poly_int64 initial_offset,
> > +                                  unsigned access_size);
> > +  inline int get_viable_bases (insn_info *insns[2],
> > +                          vec<base_cand> &base_cands,
> > +                          rtx cand_mems[2],
> > +                          unsigned access_size,
> > +                          bool reversed);
> > +  inline void do_alias_analysis (insn_info *alias_hazards[4],
> > +                            alias_walker *walkers[4],
> > +                            bool load_p);
> > +  inline void try_promote_writeback (insn_info *insn, bool load_p);
> > +  void run ();
> 
> Any reason why this patch drops the inline keyword here?

Nevermind, of course this is the main entry point for the pass which now
needs to be called from target code.  Sorry for the noise.

I guess we could mark process_block inline for consistency with the
other such member functions, though.

Thanks,
Alex

> 
> > +  ~pair_fusion ();
> > +};
> > +
> > +// State used by the pass for a given basic block.
> > +struct pair_fusion_bb_info
> > +{
> 
> As mentioned above, I think this whole class can be dropped from the
> header.
> 
> > +  using def_hash = nofree_ptr_hash<def_info>;
> > +  using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>;
> > +  using def_key_t = pair_hash<def_hash, int_hash<int, -1, -2>>;
> > +
> > +  // Map of <tree base, LFS> -> access_group.
> > +  ordered_hash_map<expr_key_t, access_group> expr_map;
> > +
> > +  // Map of <RTL-SSA def_info *, LFS> -> access_group.
> > +  ordered_hash_map<def_key_t, access_group> def_map;
> > +
> > +  // Given the def_info for an RTL base register, express it as an offset 
> > from
> > +  // some canonical base instead.
> > +  //
> > +  // Canonicalizing bases in this way allows us to identify adjacent 
> > accesses
> > +  // even if they see different base register defs.
> > +  hash_map<def_hash, alt_base> canon_base_map;
> > +
> > +  static const size_t obstack_alignment = sizeof (void *);
> > +
> > +  pair_fusion_bb_info (bb_info *bb, pair_fusion *d)
> > +    : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
> > +  {
> > +    obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
> > +                           obstack_alignment, obstack_chunk_alloc,
> > +                           obstack_chunk_free);
> > +  }
> > +  ~pair_fusion_bb_info ()
> > +  {
> > +    obstack_free (&m_obstack, nullptr);
> > +
> > +    if (m_emitted_tombstone)
> > +      {
> > +   bitmap_release (&m_tombstone_bitmap);
> > +   bitmap_obstack_release (&m_bitmap_obstack);
> > +      }
> > +  }
> > +
> > +  inline void track_access (insn_info *, bool load, rtx mem);
> > +  inline void transform ();
> > +  inline void cleanup_tombstones ();
> > +
> > +private:
> > +  obstack m_obstack;
> > +  bb_info *m_bb;
> > +  pair_fusion *m_pass;
> > +
> > +  // State for keeping track of tombstone insns emitted for this BB.
> > +  bitmap_obstack m_bitmap_obstack;
> > +  bitmap_head m_tombstone_bitmap;
> > +  bool m_emitted_tombstone;
> > +
> > +  inline splay_tree_node<access_record *> *node_alloc (access_record *);
> > +
> > +  template<typename Map>
> > +  inline void traverse_base_map (Map &map);
> > +  inline void transform_for_base (int load_size, access_group &group);
> > +
> > +  inline void merge_pairs (insn_list_t &, insn_list_t &,
> > +                      bool load_p, unsigned access_size);
> > +
> > +  inline bool try_fuse_pair (bool load_p, unsigned access_size,
> > +                        insn_info *i1, insn_info *i2);
> > +
> > +  inline bool fuse_pair (bool load_p, unsigned access_size,
> > +                    int writeback,
> > +                    insn_info *i1, insn_info *i2,
> > +                    base_cand &base,
> > +                    const insn_range_info &move_range);
> > +
> > +  inline void track_tombstone (int uid);
> > +
> > +  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> > +};
> > -- 
> > 2.39.3
> > 
> 
> Thanks,
> Alex

Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

Reply via email to