On 21/05/2024 16:02, Ajit Agarwal wrote: > Hello Alex: > > On 21/05/24 1:16 am, Alex Coplan wrote: > > On 20/05/2024 18:44, Alex Coplan wrote: > >> Hi Ajit, > >> > >> On 20/05/2024 21:50, Ajit Agarwal wrote: > >>> Hello Alex/Richard: > >>> > >>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end > >>> to support multiple targets. > >>> > >>> Common infrastructure of load store pair fusion is divided into > >>> target independent and target dependent code. > >>> > >>> Target independent code is structured in the following files. > >>> gcc/pair-fusion.h > >>> gcc/pair-fusion.cc > >>> > >>> Target independent code is the Generic code with pure virtual > >>> function to interface betwwen target independent and dependent > >>> code. > >>> > >>> Bootstrapped and regtested on aarch64-linux-gnu. > >>> > >>> Thanks & Regards > >>> Ajit > >>> > >>> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end > >>> > >>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end > >>> to support multiple targets. > >>> > >>> Common infrastructure of load store pair fusion is divided into > >>> target independent and target dependent code. > >>> > >>> Target independent code is structured in the following files. > >>> gcc/pair-fusion.h > >>> gcc/pair-fusion.cc > >>> > >>> Target independent code is the Generic code with pure virtual > >>> function to interface betwwen target independent and dependent > >>> code. > >>> > >>> 2024-05-20 Ajit Kumar Agarwal <aagar...@linux.ibm.com> > >>> > >>> gcc/ChangeLog: > >>> > >>> * pair-fusion.h: Generic header code for load store fusion > >> > >> Insert "pair" before fusion? > > Addressed in v1 of the patch. > >> > >>> that can be shared across different architectures. > >>> * pair-fusion.cc: Generic source code implementation for > >>> load store fusion that can be shared across different architectures. > >> > >> Likewise. > Addressed in v1 of the patch. > >> > >>> * Makefile.in: Add new executable pair-fusion.o > >> > >> It's not an executable but an object file. > >> > >>> * config/aarch64/aarch64-ldp-fusion.cc: Target specific > >>> code for load store fusion of aarch64. > >> > >> I guess this should say something like: "Delete generic code and move it > >> to pair-fusion.cc in the middle-end." > >> > >> I've left some comments below on the header file. The rest of the patch > >> looks pretty good to me. I tried diffing the original contents of > >> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected. > >> > > > > <snip> > > > >>> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h > >>> new file mode 100644 > >>> index 00000000000..00f6d3e149a > >>> --- /dev/null > >>> +++ b/gcc/pair-fusion.h > >>> @@ -0,0 +1,340 @@ > >>> +// Pair Mem fusion generic header file. > >>> +// Copyright (C) 2024 Free Software Foundation, Inc. > >>> +// > >>> +// This file is part of GCC. > >>> +// > >>> +// GCC is free software; you can redistribute it and/or modify it > >>> +// under the terms of the GNU General Public License as published by > >>> +// the Free Software Foundation; either version 3, or (at your option) > >>> +// any later version. > >>> +// > >>> +// GCC is distributed in the hope that it will be useful, but > >>> +// WITHOUT ANY WARRANTY; without even the implied warranty of > >>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > >>> +// General Public License for more details. > >>> +// > >>> +// You should have received a copy of the GNU General Public License > >>> +// along with GCC; see the file COPYING3. If not see > >>> +// <http://www.gnu.org/licenses/>. > >>> + > >>> +#define INCLUDE_ALGORITHM > >>> +#define INCLUDE_FUNCTIONAL > >>> +#define INCLUDE_LIST > >>> +#define INCLUDE_TYPE_TRAITS > >>> +#include "config.h" > >>> +#include "system.h" > >>> +#include "coretypes.h" > >>> +#include "backend.h" > >>> +#include "rtl.h" > >>> +#include "df.h" > >>> +#include "rtl-iter.h" > >>> +#include "rtl-ssa.h" > >> > >> I'm not sure how desirable this is, but you might be able to > >> forward-declare RTL-SSA types like this: > >> > >> class def_info; > >> class insn_info; > >> class insn_range_info; > >> > >> thus removing the need to include the header here, since the interface > >> only refers to these types by pointer or reference. > >> > >> Richard: please say if you'd prefer keeping the include. > >> > > Doing forward declaration gives ambigous errors with conflicting > insn_info with rtl_ssa::insn_info and templated initialization > errors. Also with overloaded operator with insn_info is not > defined with forward declaration.
So I tried this locally and it seems to work if you wrap the forward-decls in: namespace rtl_ssa { [...] }; and indeed you'd need to move the definition of base_cand::viable () to pair-fusion.cc so that we don't dereference those pointers in the header (would be good to mark it inline if you do that). Btw, I noticed that the GCC coding conventions (https://gcc.gnu.org/codingconventions.html) say: > Header files should have neither using directives nor namespace-scope > using declarations. so we'll need to drop the using namespace rtl_ssa; from the header and add it to (at least) pair-fusion.cc. I think we'll also need to drop all of the inlcudes (and #define INCLUDE_*) from pair-fusion.h: Richard just reminded me that the GCC policy is that header files shouldn't include other files if they also contain code. Thanks, Alex > > Hence I kept the header. > > Addressed in v1 of the patch. > > >>> +#include "cfgcleanup.h" > >>> +#include "tree-pass.h" > >>> +#include "ordered-hash-map.h" > >>> +#include "tree-dfa.h" > >>> +#include "fold-const.h" > >>> +#include "tree-hash-traits.h" > >>> +#include "print-tree.h" > >>> +#include "insn-attr.h" > >> > >> I expect we don't need all of these includes here. I think we should > >> have the minimum necessary set of includes here and most of the includes > >> should be in the *.cc files. > >> > > Addressed in v1 of the patch. > >>> + > >>> +using namespace rtl_ssa; > >>> + > >>> +// We pack these fields (load_p, fpsimd_p, and size) into an integer > >>> +// (LFS) which we use as part of the key into the main hash tables. > >>> +// > >>> +// The idea is that we group candidates together only if they agree on > >>> +// the fields below. Candidates that disagree on any of these > >>> +// properties shouldn't be merged together. > >>> +struct lfs_fields > >>> +{ > >>> + bool load_p; > >>> + bool fpsimd_p; > >>> + unsigned size; > >>> +}; > >> > >> This struct shouldn't be needed in the header file (it should only be > >> needed in pair-fusion.cc). I can see that it's needed for > >> pair_fusion_bb_info, but that shouldn't be in the header file either IMO. > >> > > Addressed in v1 of the patch. > >>> + > >>> +using insn_list_t = std::list<insn_info *>; > >> > >> Likewise, if you move pair_fusion_bb_info out of this header, these > >> shouldn't be needed either. > >> > >>> +using insn_iter_t = insn_list_t::iterator; > >> > >> Pre-existing, but this looks to be completely dead/unused. Please could > >> you push the obvious patch to delete it from aarch64-ldp-fusion.cc > >> first (assuming that passes testing)? > >> > >>> + > >>> +// Information about the accesses at a given offset from a particular > >>> +// base. Stored in an access_group, see below. > >>> +struct access_record > >>> +{ > >>> + poly_int64 offset; > >>> + std::list<insn_info *> cand_insns; > >>> + std::list<access_record>::iterator place; > >>> + > >>> + access_record (poly_int64 off) : offset (off) {} > >>> +}; > >> > >> If you drop pair_fusion_bb_info from this header then this can go back > >> in pair-fusion.cc. > >> > > Addressed in v1 of the patch. > >>> + > >>> +// A group of accesses where adjacent accesses could be ldp/stp > >>> +// candidates. The splay tree supports efficient insertion, > >>> +// while the list supports efficient iteration. > >>> +struct access_group > >>> +{ > >>> + splay_tree<access_record *> tree; > >>> + std::list<access_record> list; > >>> + > >>> + template<typename Alloc> > >>> + inline void track (Alloc node_alloc, poly_int64 offset, insn_info > >>> *insn); > >>> +}; > >> > >> Likewise. > >> > > Addressed in v1 of the patch. > >>> + > >>> +// Information about a potential base candidate, used in try_fuse_pair. > >>> +// There may be zero, one, or two viable RTL bases for a given pair. > >>> +struct base_cand > >>> +{ > >>> + // DEF is the def of the base register to be used by the pair. > >>> + def_info *def; > >>> + > >>> + // FROM_INSN is -1 if the base candidate is already shared by both > >>> + // candidate insns. Otherwise it holds the index of the insn from > >>> + // which the base originated. > >>> + // > >>> + // In the case that the base is shared, either DEF is already used > >>> + // by both candidate accesses, or both accesses see different versions > >>> + // of the same regno, in which case DEF is the def consumed by the > >>> + // first candidate access. > >>> + int from_insn; > >>> + > >>> + // To form a pair, we do so by moving the first access down and the > >>> second > >>> + // access up. To determine where to form the pair, and whether or not > >>> + // it is safe to form the pair, we track instructions which cannot be > >>> + // re-ordered past due to either dataflow or alias hazards. > >>> + // > >>> + // Since we allow changing the base used by an access, the choice of > >>> + // base can change which instructions act as re-ordering hazards for > >>> + // this pair (due to different dataflow). We store the initial > >>> + // dataflow hazards for this choice of base candidate in HAZARDS. > >>> + // > >>> + // These hazards act as re-ordering barriers to each candidate insn > >>> + // respectively, in program order. > >>> + // > >>> + // Later on, when we take alias analysis into account, we narrow > >>> + // HAZARDS accordingly. > >>> + insn_info *hazards[2]; > >>> + > >>> + base_cand (def_info *def, int insn) > >>> + : def (def), from_insn (insn), hazards {nullptr, nullptr} {} > >>> + > >>> + base_cand (def_info *def) : base_cand (def, -1) {} > >>> + > >>> + // Test if this base candidate is viable according to HAZARDS. > >>> + bool viable () const > >>> + { > >>> + return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]); > >>> + } > >>> +}; > >>> + > >>> +// Information about an alternate base. For a def_info D, it may > >>> +// instead be expressed as D = BASE + OFFSET. > >>> +struct alt_base > >>> +{ > >>> + def_info *base; > >>> + poly_int64 offset; > >>> +}; > >> > >> Likewise, this (alt_base) can be dropped from the header file. > >> > > Addressed in v1 of the patch. > >>> + > >>> +// Virtual base class for load/store walkers used in alias analysis. > >>> +struct alias_walker > >>> +{ > >>> + virtual bool conflict_p (int &budget) const = 0; > >>> + virtual insn_info *insn () const = 0; > >>> + virtual bool valid () const = 0; > >>> + virtual void advance () = 0; > >>> +}; > >> > >> This is quite minor, but I think since this class is internal to > >> pair-fusion.cc and the interface only uses pointers to this type, > >> this can just be a forward-declaration (keeping the definition > >> in pair-fusion.cc). > >> > >>> + > >>> +// When querying should_handle_writeback, this enum is used to > >>> +// qualify which opportunities we are asking about. > >>> +enum class writeback { > >>> + // Only those writeback opportunities that arise from existing > >>> + // auto-increment accesses. > >>> + EXISTING, > >>> + > >>> + // All writeback opportunities, including those that involve folding > >>> + // base register updates into a non-writeback pair. > >>> + ALL > >>> +}; > >>> + > >>> +// This class can be overriden by targets to give a pass that fuses > >>> +// adjacent loads and stores into load/store pair instructions. > >>> +// > >>> +// The target can override the various virtual functions to customize > >>> +// the behaviour of the pass as appropriate for the target. > >>> +struct pair_fusion { > >>> + pair_fusion (); > >>> + > >>> + // Given: > >>> + // - an rtx REG_OP, the non-memory operand in a load/store insn, > >>> + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and > >>> + // - a boolean LOAD_P (true iff the insn is a load), then: > >>> + // return true if the access should be considered an FP/SIMD access. > >>> + // Such accesses are segregated from GPR accesses, since we only want > >>> + // to form pairs for accesses that use the same register file. > >>> + virtual bool fpsimd_op_p (rtx, machine_mode, bool) > >>> + { > >>> + return false; > >>> + } > >>> + > >>> + // Return true if we should consider forming pairs from memory > >>> + // accesses with operand mode MODE at this stage in compilation. > >>> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; > >>> + > >>> + // Return true iff REG_OP is a suitable register operand for a paired > >>> + // memory access, where LOAD_P is true if we're asking about loads and > >>> + // false for stores. MODE gives the mode of the operand. > >>> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, > >>> + machine_mode mode) = 0; > >>> + > >>> + // Return alias check limit. > >>> + // This is needed to avoid unbounded quadratic behaviour when > >>> + // performing alias analysis. > >>> + virtual int pair_mem_alias_check_limit () = 0; > >>> + > >>> + // Return true if we should try to handle writeback opportunities. > >>> + // WHICH determines the kinds of writeback opportunities the caller > >>> + // is asking about. > >>> + virtual bool should_handle_writeback (enum writeback which) = 0; > >>> + > >>> + // Given BASE_MEM, the mem from the lower candidate access for a pair, > >>> + // and LOAD_P (true if the access is a load), check if we should > >>> proceed > >>> + // to form the pair given the target's code generation policy on > >>> + // paired accesses. > >>> + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0; > >>> + > >>> + // Generate the pattern for a paired access. PATS gives the patterns > >>> + // for the individual memory accesses (which by this point must share a > >>> + // common base register). If WRITEBACK is non-NULL, then this rtx > >>> + // describes the update to the base register that should be performed > >>> by > >>> + // the resulting insn. LOAD_P is true iff the accesses are loads. > >>> + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0; > >>> + > >>> + // Return true if INSN is a paired memory access. If so, set LOAD_P to > >>> + // true iff INSN is a load pair. > >>> + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0; > >>> + > >>> + // Return true if we should track loads. > >>> + virtual bool track_loads_p () > >>> + { > >>> + return true; > >>> + } > >>> + > >>> + // Return true if we should track stores. > >>> + virtual bool track_stores_p () > >>> + { > >>> + return true; > >>> + } > >>> + > >>> + // Return true if OFFSET is in range for a paired memory access. > >>> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0; > >>> + > >>> + // Given a load/store pair insn in PATTERN, unpack the insn, storing > >>> + // the register operands in REGS, and returning the mem. LOAD_P is > >>> + // true for loads and false for stores. > >>> + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = > >>> 0; > >>> + > >>> + // Given a pair mem in MEM, register operands in REGS, and an rtx > >>> + // representing the effect of writeback on the base register in > >>> WB_EFFECT, > >>> + // return an insn representing a writeback variant of this pair. > >>> + // LOAD_P is true iff the pair is a load. > >>> + // This is used when promoting existing non-writeback pairs to > >>> writeback > >>> + // variants. > >>> + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem, > >>> + rtx regs[2], bool load_p) = 0; > >>> + > >>> + void process_block (bb_info *bb); > >>> + inline insn_info *find_trailing_add (insn_info *insns[2], > >>> + const insn_range_info &pair_range, > >>> + int initial_writeback, > >>> + rtx *writeback_effect, > >>> + def_info **add_def, > >>> + def_info *base_def, > >>> + poly_int64 initial_offset, > >>> + unsigned access_size); > >>> + inline int get_viable_bases (insn_info *insns[2], > >>> + vec<base_cand> &base_cands, > >>> + rtx cand_mems[2], > >>> + unsigned access_size, > >>> + bool reversed); > >>> + inline void do_alias_analysis (insn_info *alias_hazards[4], > >>> + alias_walker *walkers[4], > >>> + bool load_p); > >>> + inline void try_promote_writeback (insn_info *insn, bool load_p); > >>> + void run (); > >> > >> Any reason why this patch drops the inline keyword here? > > > > Nevermind, of course this is the main entry point for the pass which now > > needs to be called from target code. Sorry for the noise. > > > > I guess we could mark process_block inline for consistency with the > > other such member functions, though. > > > > Addressed in v1 of the patch. > > Thanks, > > Alex > > > > Thanks & Regards > Ajit > >> > >>> + ~pair_fusion (); > >>> +}; > >>> + > >>> +// State used by the pass for a given basic block. > >>> +struct pair_fusion_bb_info > >>> +{ > >> > >> As mentioned above, I think this whole class can be dropped from the > >> header. > >> > >>> + using def_hash = nofree_ptr_hash<def_info>; > >>> + using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>; > >>> + using def_key_t = pair_hash<def_hash, int_hash<int, -1, -2>>; > >>> + > >>> + // Map of <tree base, LFS> -> access_group. > >>> + ordered_hash_map<expr_key_t, access_group> expr_map; > >>> + > >>> + // Map of <RTL-SSA def_info *, LFS> -> access_group. > >>> + ordered_hash_map<def_key_t, access_group> def_map; > >>> + > >>> + // Given the def_info for an RTL base register, express it as an > >>> offset from > >>> + // some canonical base instead. > >>> + // > >>> + // Canonicalizing bases in this way allows us to identify adjacent > >>> accesses > >>> + // even if they see different base register defs. > >>> + hash_map<def_hash, alt_base> canon_base_map; > >>> + > >>> + static const size_t obstack_alignment = sizeof (void *); > >>> + > >>> + pair_fusion_bb_info (bb_info *bb, pair_fusion *d) > >>> + : m_bb (bb), m_pass (d), m_emitted_tombstone (false) > >>> + { > >>> + obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE, > >>> + obstack_alignment, obstack_chunk_alloc, > >>> + obstack_chunk_free); > >>> + } > >>> + ~pair_fusion_bb_info () > >>> + { > >>> + obstack_free (&m_obstack, nullptr); > >>> + > >>> + if (m_emitted_tombstone) > >>> + { > >>> + bitmap_release (&m_tombstone_bitmap); > >>> + bitmap_obstack_release (&m_bitmap_obstack); > >>> + } > >>> + } > >>> + > >>> + inline void track_access (insn_info *, bool load, rtx mem); > >>> + inline void transform (); > >>> + inline void cleanup_tombstones (); > >>> + > >>> +private: > >>> + obstack m_obstack; > >>> + bb_info *m_bb; > >>> + pair_fusion *m_pass; > >>> + > >>> + // State for keeping track of tombstone insns emitted for this BB. > >>> + bitmap_obstack m_bitmap_obstack; > >>> + bitmap_head m_tombstone_bitmap; > >>> + bool m_emitted_tombstone; > >>> + > >>> + inline splay_tree_node<access_record *> *node_alloc (access_record *); > >>> + > >>> + template<typename Map> > >>> + inline void traverse_base_map (Map &map); > >>> + inline void transform_for_base (int load_size, access_group &group); > >>> + > >>> + inline void merge_pairs (insn_list_t &, insn_list_t &, > >>> + bool load_p, unsigned access_size); > >>> + > >>> + inline bool try_fuse_pair (bool load_p, unsigned access_size, > >>> + insn_info *i1, insn_info *i2); > >>> + > >>> + inline bool fuse_pair (bool load_p, unsigned access_size, > >>> + int writeback, > >>> + insn_info *i1, insn_info *i2, > >>> + base_cand &base, > >>> + const insn_range_info &move_range); > >>> + > >>> + inline void track_tombstone (int uid); > >>> + > >>> + inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs); > >>> +}; > >>> -- > >>> 2.39.3 > >>> > >> > >> Thanks, > >> Alex