Hello Alex: On 21/05/24 6:02 pm, Alex Coplan wrote: > On 21/05/2024 16:02, Ajit Agarwal wrote: >> Hello Alex: >> >> On 21/05/24 1:16 am, Alex Coplan wrote: >>> On 20/05/2024 18:44, Alex Coplan wrote: >>>> Hi Ajit, >>>> >>>> On 20/05/2024 21:50, Ajit Agarwal wrote: >>>>> Hello Alex/Richard: >>>>> >>>>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end >>>>> to support multiple targets. >>>>> >>>>> Common infrastructure of load store pair fusion is divided into >>>>> target independent and target dependent code. >>>>> >>>>> Target independent code is structured in the following files. >>>>> gcc/pair-fusion.h >>>>> gcc/pair-fusion.cc >>>>> >>>>> Target independent code is the Generic code with pure virtual >>>>> function to interface betwwen target independent and dependent >>>>> code. >>>>> >>>>> Bootstrapped and regtested on aarch64-linux-gnu. >>>>> >>>>> Thanks & Regards >>>>> Ajit >>>>> >>>>> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end >>>>> >>>>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end >>>>> to support multiple targets. >>>>> >>>>> Common infrastructure of load store pair fusion is divided into >>>>> target independent and target dependent code. >>>>> >>>>> Target independent code is structured in the following files. >>>>> gcc/pair-fusion.h >>>>> gcc/pair-fusion.cc >>>>> >>>>> Target independent code is the Generic code with pure virtual >>>>> function to interface betwwen target independent and dependent >>>>> code. >>>>> >>>>> 2024-05-20 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * pair-fusion.h: Generic header code for load store fusion >>>> >>>> Insert "pair" before fusion? >> >> Addressed in v1 of the patch. >>>> >>>>> that can be shared across different architectures. >>>>> * pair-fusion.cc: Generic source code implementation for >>>>> load store fusion that can be shared across different architectures. >>>> >>>> Likewise. >> Addressed in v1 of the patch. >>>> >>>>> * Makefile.in: Add new executable pair-fusion.o >>>> >>>> It's not an executable but an object file. >>>> >>>>> * config/aarch64/aarch64-ldp-fusion.cc: Target specific >>>>> code for load store fusion of aarch64. >>>> >>>> I guess this should say something like: "Delete generic code and move it >>>> to pair-fusion.cc in the middle-end." >>>> >>>> I've left some comments below on the header file. The rest of the patch >>>> looks pretty good to me. I tried diffing the original contents of >>>> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected. >>>> >>> >>> <snip> >>> >>>>> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h >>>>> new file mode 100644 >>>>> index 00000000000..00f6d3e149a >>>>> --- /dev/null >>>>> +++ b/gcc/pair-fusion.h >>>>> @@ -0,0 +1,340 @@ >>>>> +// Pair Mem fusion generic header file. >>>>> +// Copyright (C) 2024 Free Software Foundation, Inc. >>>>> +// >>>>> +// This file is part of GCC. >>>>> +// >>>>> +// GCC is free software; you can redistribute it and/or modify it >>>>> +// under the terms of the GNU General Public License as published by >>>>> +// the Free Software Foundation; either version 3, or (at your option) >>>>> +// any later version. >>>>> +// >>>>> +// GCC is distributed in the hope that it will be useful, but >>>>> +// WITHOUT ANY WARRANTY; without even the implied warranty of >>>>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>>>> +// General Public License for more details. >>>>> +// >>>>> +// You should have received a copy of the GNU General Public License >>>>> +// along with GCC; see the file COPYING3. If not see >>>>> +// <http://www.gnu.org/licenses/>. >>>>> + >>>>> +#define INCLUDE_ALGORITHM >>>>> +#define INCLUDE_FUNCTIONAL >>>>> +#define INCLUDE_LIST >>>>> +#define INCLUDE_TYPE_TRAITS >>>>> +#include "config.h" >>>>> +#include "system.h" >>>>> +#include "coretypes.h" >>>>> +#include "backend.h" >>>>> +#include "rtl.h" >>>>> +#include "df.h" >>>>> +#include "rtl-iter.h" >>>>> +#include "rtl-ssa.h" >>>> >>>> I'm not sure how desirable this is, but you might be able to >>>> forward-declare RTL-SSA types like this: >>>> >>>> class def_info; >>>> class insn_info; >>>> class insn_range_info; >>>> >>>> thus removing the need to include the header here, since the interface >>>> only refers to these types by pointer or reference. >>>> >>>> Richard: please say if you'd prefer keeping the include. >>>> >> >> Doing forward declaration gives ambigous errors with conflicting >> insn_info with rtl_ssa::insn_info and templated initialization >> errors. Also with overloaded operator with insn_info is not >> defined with forward declaration. > > So I tried this locally and it seems to work if you wrap the > forward-decls in: > > namespace rtl_ssa { > [...] > }; > > and indeed you'd need to move the definition of base_cand::viable () to > pair-fusion.cc so that we don't dereference those pointers in the > header (would be good to mark it inline if you do that). > > Btw, I noticed that the GCC coding conventions > (https://gcc.gnu.org/codingconventions.html) say: > >> Header files should have neither using directives nor namespace-scope >> using declarations. > > so we'll need to drop the using namespace rtl_ssa; from the header and > add it to (at least) pair-fusion.cc. > > I think we'll also need to drop all of the inlcudes (and #define INCLUDE_*) > from pair-fusion.h: Richard just reminded me that the GCC policy is that > header files shouldn't include other files if they also contain code. >
Addressed in v2 of the patch. > Thanks, > Alex Thanks & Regards Ajit > >> >> Hence I kept the header. >> >> Addressed in v1 of the patch. >> >>>>> +#include "cfgcleanup.h" >>>>> +#include "tree-pass.h" >>>>> +#include "ordered-hash-map.h" >>>>> +#include "tree-dfa.h" >>>>> +#include "fold-const.h" >>>>> +#include "tree-hash-traits.h" >>>>> +#include "print-tree.h" >>>>> +#include "insn-attr.h" >>>> >>>> I expect we don't need all of these includes here. I think we should >>>> have the minimum necessary set of includes here and most of the includes >>>> should be in the *.cc files. >>>> >> >> Addressed in v1 of the patch. >>>>> + >>>>> +using namespace rtl_ssa; >>>>> + >>>>> +// We pack these fields (load_p, fpsimd_p, and size) into an integer >>>>> +// (LFS) which we use as part of the key into the main hash tables. >>>>> +// >>>>> +// The idea is that we group candidates together only if they agree on >>>>> +// the fields below. Candidates that disagree on any of these >>>>> +// properties shouldn't be merged together. >>>>> +struct lfs_fields >>>>> +{ >>>>> + bool load_p; >>>>> + bool fpsimd_p; >>>>> + unsigned size; >>>>> +}; >>>> >>>> This struct shouldn't be needed in the header file (it should only be >>>> needed in pair-fusion.cc). I can see that it's needed for >>>> pair_fusion_bb_info, but that shouldn't be in the header file either IMO. >>>> >> >> Addressed in v1 of the patch. >>>>> + >>>>> +using insn_list_t = std::list<insn_info *>; >>>> >>>> Likewise, if you move pair_fusion_bb_info out of this header, these >>>> shouldn't be needed either. >>>> >>>>> +using insn_iter_t = insn_list_t::iterator; >>>> >>>> Pre-existing, but this looks to be completely dead/unused. Please could >>>> you push the obvious patch to delete it from aarch64-ldp-fusion.cc >>>> first (assuming that passes testing)? >>>> >>>>> + >>>>> +// Information about the accesses at a given offset from a particular >>>>> +// base. Stored in an access_group, see below. >>>>> +struct access_record >>>>> +{ >>>>> + poly_int64 offset; >>>>> + std::list<insn_info *> cand_insns; >>>>> + std::list<access_record>::iterator place; >>>>> + >>>>> + access_record (poly_int64 off) : offset (off) {} >>>>> +}; >>>> >>>> If you drop pair_fusion_bb_info from this header then this can go back >>>> in pair-fusion.cc. >>>> >> >> Addressed in v1 of the patch. >>>>> + >>>>> +// A group of accesses where adjacent accesses could be ldp/stp >>>>> +// candidates. The splay tree supports efficient insertion, >>>>> +// while the list supports efficient iteration. >>>>> +struct access_group >>>>> +{ >>>>> + splay_tree<access_record *> tree; >>>>> + std::list<access_record> list; >>>>> + >>>>> + template<typename Alloc> >>>>> + inline void track (Alloc node_alloc, poly_int64 offset, insn_info >>>>> *insn); >>>>> +}; >>>> >>>> Likewise. >>>> >> >> Addressed in v1 of the patch. >>>>> + >>>>> +// Information about a potential base candidate, used in try_fuse_pair. >>>>> +// There may be zero, one, or two viable RTL bases for a given pair. >>>>> +struct base_cand >>>>> +{ >>>>> + // DEF is the def of the base register to be used by the pair. >>>>> + def_info *def; >>>>> + >>>>> + // FROM_INSN is -1 if the base candidate is already shared by both >>>>> + // candidate insns. Otherwise it holds the index of the insn from >>>>> + // which the base originated. >>>>> + // >>>>> + // In the case that the base is shared, either DEF is already used >>>>> + // by both candidate accesses, or both accesses see different versions >>>>> + // of the same regno, in which case DEF is the def consumed by the >>>>> + // first candidate access. >>>>> + int from_insn; >>>>> + >>>>> + // To form a pair, we do so by moving the first access down and the >>>>> second >>>>> + // access up. To determine where to form the pair, and whether or not >>>>> + // it is safe to form the pair, we track instructions which cannot be >>>>> + // re-ordered past due to either dataflow or alias hazards. >>>>> + // >>>>> + // Since we allow changing the base used by an access, the choice of >>>>> + // base can change which instructions act as re-ordering hazards for >>>>> + // this pair (due to different dataflow). We store the initial >>>>> + // dataflow hazards for this choice of base candidate in HAZARDS. >>>>> + // >>>>> + // These hazards act as re-ordering barriers to each candidate insn >>>>> + // respectively, in program order. >>>>> + // >>>>> + // Later on, when we take alias analysis into account, we narrow >>>>> + // HAZARDS accordingly. >>>>> + insn_info *hazards[2]; >>>>> + >>>>> + base_cand (def_info *def, int insn) >>>>> + : def (def), from_insn (insn), hazards {nullptr, nullptr} {} >>>>> + >>>>> + base_cand (def_info *def) : base_cand (def, -1) {} >>>>> + >>>>> + // Test if this base candidate is viable according to HAZARDS. >>>>> + bool viable () const >>>>> + { >>>>> + return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]); >>>>> + } >>>>> +}; >>>>> + >>>>> +// Information about an alternate base. For a def_info D, it may >>>>> +// instead be expressed as D = BASE + OFFSET. >>>>> +struct alt_base >>>>> +{ >>>>> + def_info *base; >>>>> + poly_int64 offset; >>>>> +}; >>>> >>>> Likewise, this (alt_base) can be dropped from the header file. >>>> >> >> Addressed in v1 of the patch. >>>>> + >>>>> +// Virtual base class for load/store walkers used in alias analysis. >>>>> +struct alias_walker >>>>> +{ >>>>> + virtual bool conflict_p (int &budget) const = 0; >>>>> + virtual insn_info *insn () const = 0; >>>>> + virtual bool valid () const = 0; >>>>> + virtual void advance () = 0; >>>>> +}; >>>> >>>> This is quite minor, but I think since this class is internal to >>>> pair-fusion.cc and the interface only uses pointers to this type, >>>> this can just be a forward-declaration (keeping the definition >>>> in pair-fusion.cc). >>>> >>>>> + >>>>> +// When querying should_handle_writeback, this enum is used to >>>>> +// qualify which opportunities we are asking about. >>>>> +enum class writeback { >>>>> + // Only those writeback opportunities that arise from existing >>>>> + // auto-increment accesses. >>>>> + EXISTING, >>>>> + >>>>> + // All writeback opportunities, including those that involve folding >>>>> + // base register updates into a non-writeback pair. >>>>> + ALL >>>>> +}; >>>>> + >>>>> +// This class can be overriden by targets to give a pass that fuses >>>>> +// adjacent loads and stores into load/store pair instructions. >>>>> +// >>>>> +// The target can override the various virtual functions to customize >>>>> +// the behaviour of the pass as appropriate for the target. >>>>> +struct pair_fusion { >>>>> + pair_fusion (); >>>>> + >>>>> + // Given: >>>>> + // - an rtx REG_OP, the non-memory operand in a load/store insn, >>>>> + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and >>>>> + // - a boolean LOAD_P (true iff the insn is a load), then: >>>>> + // return true if the access should be considered an FP/SIMD access. >>>>> + // Such accesses are segregated from GPR accesses, since we only want >>>>> + // to form pairs for accesses that use the same register file. >>>>> + virtual bool fpsimd_op_p (rtx, machine_mode, bool) >>>>> + { >>>>> + return false; >>>>> + } >>>>> + >>>>> + // Return true if we should consider forming pairs from memory >>>>> + // accesses with operand mode MODE at this stage in compilation. >>>>> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; >>>>> + >>>>> + // Return true iff REG_OP is a suitable register operand for a paired >>>>> + // memory access, where LOAD_P is true if we're asking about loads and >>>>> + // false for stores. MODE gives the mode of the operand. >>>>> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, >>>>> + machine_mode mode) = 0; >>>>> + >>>>> + // Return alias check limit. >>>>> + // This is needed to avoid unbounded quadratic behaviour when >>>>> + // performing alias analysis. >>>>> + virtual int pair_mem_alias_check_limit () = 0; >>>>> + >>>>> + // Return true if we should try to handle writeback opportunities. >>>>> + // WHICH determines the kinds of writeback opportunities the caller >>>>> + // is asking about. >>>>> + virtual bool should_handle_writeback (enum writeback which) = 0; >>>>> + >>>>> + // Given BASE_MEM, the mem from the lower candidate access for a pair, >>>>> + // and LOAD_P (true if the access is a load), check if we should >>>>> proceed >>>>> + // to form the pair given the target's code generation policy on >>>>> + // paired accesses. >>>>> + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0; >>>>> + >>>>> + // Generate the pattern for a paired access. PATS gives the patterns >>>>> + // for the individual memory accesses (which by this point must share a >>>>> + // common base register). If WRITEBACK is non-NULL, then this rtx >>>>> + // describes the update to the base register that should be performed >>>>> by >>>>> + // the resulting insn. LOAD_P is true iff the accesses are loads. >>>>> + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0; >>>>> + >>>>> + // Return true if INSN is a paired memory access. If so, set LOAD_P to >>>>> + // true iff INSN is a load pair. >>>>> + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0; >>>>> + >>>>> + // Return true if we should track loads. >>>>> + virtual bool track_loads_p () >>>>> + { >>>>> + return true; >>>>> + } >>>>> + >>>>> + // Return true if we should track stores. >>>>> + virtual bool track_stores_p () >>>>> + { >>>>> + return true; >>>>> + } >>>>> + >>>>> + // Return true if OFFSET is in range for a paired memory access. >>>>> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0; >>>>> + >>>>> + // Given a load/store pair insn in PATTERN, unpack the insn, storing >>>>> + // the register operands in REGS, and returning the mem. LOAD_P is >>>>> + // true for loads and false for stores. >>>>> + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = >>>>> 0; >>>>> + >>>>> + // Given a pair mem in MEM, register operands in REGS, and an rtx >>>>> + // representing the effect of writeback on the base register in >>>>> WB_EFFECT, >>>>> + // return an insn representing a writeback variant of this pair. >>>>> + // LOAD_P is true iff the pair is a load. >>>>> + // This is used when promoting existing non-writeback pairs to >>>>> writeback >>>>> + // variants. >>>>> + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem, >>>>> + rtx regs[2], bool load_p) = 0; >>>>> + >>>>> + void process_block (bb_info *bb); >>>>> + inline insn_info *find_trailing_add (insn_info *insns[2], >>>>> + const insn_range_info &pair_range, >>>>> + int initial_writeback, >>>>> + rtx *writeback_effect, >>>>> + def_info **add_def, >>>>> + def_info *base_def, >>>>> + poly_int64 initial_offset, >>>>> + unsigned access_size); >>>>> + inline int get_viable_bases (insn_info *insns[2], >>>>> + vec<base_cand> &base_cands, >>>>> + rtx cand_mems[2], >>>>> + unsigned access_size, >>>>> + bool reversed); >>>>> + inline void do_alias_analysis (insn_info *alias_hazards[4], >>>>> + alias_walker *walkers[4], >>>>> + bool load_p); >>>>> + inline void try_promote_writeback (insn_info *insn, bool load_p); >>>>> + void run (); >>>> >>>> Any reason why this patch drops the inline keyword here? >>> >>> Nevermind, of course this is the main entry point for the pass which now >>> needs to be called from target code. Sorry for the noise. >>> >>> I guess we could mark process_block inline for consistency with the >>> other such member functions, though. >>> >> >> Addressed in v1 of the patch. >>> Thanks, >>> Alex >>> >> >> Thanks & Regards >> Ajit >>>> >>>>> + ~pair_fusion (); >>>>> +}; >>>>> + >>>>> +// State used by the pass for a given basic block. >>>>> +struct pair_fusion_bb_info >>>>> +{ >>>> >>>> As mentioned above, I think this whole class can be dropped from the >>>> header. >>>> >>>>> + using def_hash = nofree_ptr_hash<def_info>; >>>>> + using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>; >>>>> + using def_key_t = pair_hash<def_hash, int_hash<int, -1, -2>>; >>>>> + >>>>> + // Map of <tree base, LFS> -> access_group. >>>>> + ordered_hash_map<expr_key_t, access_group> expr_map; >>>>> + >>>>> + // Map of <RTL-SSA def_info *, LFS> -> access_group. >>>>> + ordered_hash_map<def_key_t, access_group> def_map; >>>>> + >>>>> + // Given the def_info for an RTL base register, express it as an >>>>> offset from >>>>> + // some canonical base instead. >>>>> + // >>>>> + // Canonicalizing bases in this way allows us to identify adjacent >>>>> accesses >>>>> + // even if they see different base register defs. >>>>> + hash_map<def_hash, alt_base> canon_base_map; >>>>> + >>>>> + static const size_t obstack_alignment = sizeof (void *); >>>>> + >>>>> + pair_fusion_bb_info (bb_info *bb, pair_fusion *d) >>>>> + : m_bb (bb), m_pass (d), m_emitted_tombstone (false) >>>>> + { >>>>> + obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE, >>>>> + obstack_alignment, obstack_chunk_alloc, >>>>> + obstack_chunk_free); >>>>> + } >>>>> + ~pair_fusion_bb_info () >>>>> + { >>>>> + obstack_free (&m_obstack, nullptr); >>>>> + >>>>> + if (m_emitted_tombstone) >>>>> + { >>>>> + bitmap_release (&m_tombstone_bitmap); >>>>> + bitmap_obstack_release (&m_bitmap_obstack); >>>>> + } >>>>> + } >>>>> + >>>>> + inline void track_access (insn_info *, bool load, rtx mem); >>>>> + inline void transform (); >>>>> + inline void cleanup_tombstones (); >>>>> + >>>>> +private: >>>>> + obstack m_obstack; >>>>> + bb_info *m_bb; >>>>> + pair_fusion *m_pass; >>>>> + >>>>> + // State for keeping track of tombstone insns emitted for this BB. >>>>> + bitmap_obstack m_bitmap_obstack; >>>>> + bitmap_head m_tombstone_bitmap; >>>>> + bool m_emitted_tombstone; >>>>> + >>>>> + inline splay_tree_node<access_record *> *node_alloc (access_record *); >>>>> + >>>>> + template<typename Map> >>>>> + inline void traverse_base_map (Map &map); >>>>> + inline void transform_for_base (int load_size, access_group &group); >>>>> + >>>>> + inline void merge_pairs (insn_list_t &, insn_list_t &, >>>>> + bool load_p, unsigned access_size); >>>>> + >>>>> + inline bool try_fuse_pair (bool load_p, unsigned access_size, >>>>> + insn_info *i1, insn_info *i2); >>>>> + >>>>> + inline bool fuse_pair (bool load_p, unsigned access_size, >>>>> + int writeback, >>>>> + insn_info *i1, insn_info *i2, >>>>> + base_cand &base, >>>>> + const insn_range_info &move_range); >>>>> + >>>>> + inline void track_tombstone (int uid); >>>>> + >>>>> + inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs); >>>>> +}; >>>>> -- >>>>> 2.39.3 >>>>> >>>> >>>> Thanks, >>>> Alex >