Hi Ajit, You need to remove the header dependencies that are no longer required for aarch64-ldp-fusion.o in t-aarch64 (not forgetting to update the ChangeLog). A few other minor nits below.
LGTM with those changes, but you'll need Richard S to approve. Thanks a lot for doing this. On 22/05/2024 00:16, Ajit Agarwal wrote: > Hello Alex/Richard: > > All comments are addressed. > > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end > to support multiple targets. > > Common infrastructure of load store pair fusion is divided into > target independent and target dependent code. > > Target independent code is structured in the following files. > gcc/pair-fusion.h > gcc/pair-fusion.cc > > Target independent code is the Generic code with pure virtual > function to interface betwwen target independent and dependent > code. > > Bootstrapped and regtested on aarch64-linux-gnu. > > Thanks & Regards > Ajit > > > > aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end > > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end > to support multiple targets. > > Common infrastructure of load store pair fusion is divided into > target independent and target dependent code. > > Target independent code is structured in the following files. > gcc/pair-fusion.h > gcc/pair-fusion.cc > > Target independent code is the Generic code with pure virtual > function to interface betwwen target independent and dependent > code. > > 2024-05-22 Ajit Kumar Agarwal <aagar...@linux.ibm.com> > > gcc/ChangeLog: > > * pair-fusion.h: Generic header code for load store pair fusion > that can be shared across different architectures. > * pair-fusion.cc: Generic source code implementation for > load store pair fusion that can be shared across different > architectures. > * Makefile.in: Add new object file pair-fusion.o. > * config/aarch64/aarch64-ldp-fusion.cc: Delete generic code and move it > to pair-fusion.cc in the middle-end. > * config/aarch64/t-aarch64: Add header file dependency on pair-fusion.h. > --- > gcc/Makefile.in | 1 + > gcc/config/aarch64/aarch64-ldp-fusion.cc | 3298 +--------------------- > gcc/config/aarch64/t-aarch64 | 2 +- > gcc/pair-fusion.cc | 3013 ++++++++++++++++++++ > gcc/pair-fusion.h | 193 ++ > 5 files changed, 3286 insertions(+), 3221 deletions(-) > create mode 100644 gcc/pair-fusion.cc > create mode 100644 gcc/pair-fusion.h > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index a7f15694c34..643342f623d 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1563,6 +1563,7 @@ OBJS = \ > ipa-strub.o \ > ipa.o \ > ira.o \ > + pair-fusion.o \ > ira-build.o \ > ira-costs.o \ > ira-conflicts.o \ > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc > b/gcc/config/aarch64/aarch64-ldp-fusion.cc > index 085366cdf68..0af927231d3 100644 > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc <snip> > diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64 > index 78713558e7d..bdada08be70 100644 > --- a/gcc/config/aarch64/t-aarch64 > +++ b/gcc/config/aarch64/t-aarch64 > @@ -203,7 +203,7 @@ aarch64-early-ra.o: > $(srcdir)/config/aarch64/aarch64-early-ra.cc \ > aarch64-ldp-fusion.o: $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc \ > $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \ > $(RTL_SSA_H) cfgcleanup.h tree-pass.h ordered-hash-map.h tree-dfa.h \ > - fold-const.h tree-hash-traits.h print-tree.h > + fold-const.h tree-hash-traits.h print-tree.h pair-fusion.h So now you also need to remove the deps on the includes removed in the latest version of the patch. > $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ > $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc > > diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc > new file mode 100644 > index 00000000000..827b88cf2fc > --- /dev/null > +++ b/gcc/pair-fusion.cc > @@ -0,0 +1,3013 @@ > +// Pass to fuse adjacent loads/stores into paired memory accesses. > +// Copyright (C) 2024 Free Software Foundation, Inc. > +// > +// This file is part of GCC. > +// > +// GCC is free software; you can redistribute it and/or modify it > +// under the terms of the GNU General Public License as published by > +// the Free Software Foundation; either version 3, or (at your option) > +// any later version. > +// > +// GCC is distributed in the hope that it will be useful, but > +// WITHOUT ANY WARRANTY; without even the implied warranty of > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +// General Public License for more details. > +// > +// You should have received a copy of the GNU General Public License > +// along with GCC; see the file COPYING3. If not see > +// <http://www.gnu.org/licenses/>. > + > +#define INCLUDE_ALGORITHM > +#define INCLUDE_FUNCTIONAL > +#define INCLUDE_LIST > +#define INCLUDE_TYPE_TRAITS > +#include "config.h" > +#include "system.h" > +#include "coretypes.h" > +#include "backend.h" > +#include "rtl.h" > +#include "df.h" > +#include "rtl-iter.h" > +#include "rtl-ssa.h" > +#include "cfgcleanup.h" > +#include "tree-pass.h" > +#include "ordered-hash-map.h" > +#include "tree-dfa.h" > +#include "fold-const.h" > +#include "tree-hash-traits.h" > +#include "print-tree.h" > + > +using namespace rtl_ssa; > + > +#include "pair-fusion.h" Now that you've fixed up the header file, I think it would be more natural to keep the includes together and have the using namespace come after all the includes. > + > +// We pack these fields (load_p, fpsimd_p, and size) into an integer > +// (LFS) which we use as part of the key into the main hash tables. > +// > +// The idea is that we group candidates together only if they agree on > +// the fields below. Candidates that disagree on any of these > +// properties shouldn't be merged together. > +struct lfs_fields > +{ > + bool load_p; > + bool fpsimd_p; > + unsigned size; > +}; > + <snip> > diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h > new file mode 100644 > index 00000000000..846672959a9 > --- /dev/null > +++ b/gcc/pair-fusion.h > @@ -0,0 +1,193 @@ > +// Pass to fuse adjacent loads/stores into paired memory accesses. > +// > +// This file contains the definition of the virtual base class which is > +// overriden by targets that make use of the pass. Nit: probably want a blank line here. > +// Copyright (C) 2024 Free Software Foundation, Inc. > +// > +// This file is part of GCC. > +// > +// GCC is free software; you can redistribute it and/or modify it > +// under the terms of the GNU General Public License as published by > +// the Free Software Foundation; either version 3, or (at your option) > +// any later version. > +// > +// GCC is distributed in the hope that it will be useful, but > +// WITHOUT ANY WARRANTY; without even the implied warranty of > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +// General Public License for more details. > +// > +// You should have received a copy of the GNU General Public License > +// along with GCC; see the file COPYING3. If not see > +// <http://www.gnu.org/licenses/>. > + > +namespace rtl_ssa { > + class def_info; > + class insn_info; > + class insn_range_info; > + class bb_info; > +} > + > +// Information about a potential base candidate, used in try_fuse_pair. > +// There may be zero, one, or two viable RTL bases for a given pair. > +struct base_cand > +{ > + // DEF is the def of the base register to be used by the pair. > + rtl_ssa::def_info *def; > + > + // FROM_INSN is -1 if the base candidate is already shared by both > + // candidate insns. Otherwise it holds the index of the insn from > + // which the base originated. > + // > + // In the case that the base is shared, either DEF is already used > + // by both candidate accesses, or both accesses see different versions > + // of the same regno, in which case DEF is the def consumed by the > + // first candidate access. > + int from_insn; > + > + // To form a pair, we do so by moving the first access down and the second > + // access up. To determine where to form the pair, and whether or not > + // it is safe to form the pair, we track instructions which cannot be > + // re-ordered past due to either dataflow or alias hazards. > + // > + // Since we allow changing the base used by an access, the choice of > + // base can change which instructions act as re-ordering hazards for > + // this pair (due to different dataflow). We store the initial > + // dataflow hazards for this choice of base candidate in HAZARDS. > + // > + // These hazards act as re-ordering barriers to each candidate insn > + // respectively, in program order. > + // > + // Later on, when we take alias analysis into account, we narrow > + // HAZARDS accordingly. > + rtl_ssa::insn_info *hazards[2]; > + > + base_cand (rtl_ssa::def_info *def, int insn) > + : def (def), from_insn (insn), hazards {nullptr, nullptr} {} > + > + base_cand (rtl_ssa::def_info *def) : base_cand (def, -1) {} > + > + // Test if this base candidate is viable according to HAZARDS. > + inline bool viable () const; > +}; > + > +struct alias_walker; > + > +// When querying should_handle_writeback, this enum is used to > +// qualify which opportunities we are asking about. > +enum class writeback { > + // Only those writeback opportunities that arise from existing > + // auto-increment accesses. > + EXISTING, > + > + // All writeback opportunities, including those that involve folding > + // base register updates into a non-writeback pair. > + ALL > +}; > + > +// This class can be overriden by targets to give a pass that fuses > +// adjacent loads and stores into load/store pair instructions. > +// > +// The target can override the various virtual functions to customize > +// the behaviour of the pass as appropriate for the target. > +struct pair_fusion { > + pair_fusion (); > + > + // Given: > + // - an rtx REG_OP, the non-memory operand in a load/store insn, > + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and > + // - a boolean LOAD_P (true iff the insn is a load), then: > + // return true if the access should be considered an FP/SIMD access. > + // Such accesses are segregated from GPR accesses, since we only want > + // to form pairs for accesses that use the same register file. > + virtual bool fpsimd_op_p (rtx, machine_mode, bool) > + { > + return false; > + } > + > + // Return true if we should consider forming pairs from memory > + // accesses with operand mode MODE at this stage in compilation. > + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; > + > + // Return true iff REG_OP is a suitable register operand for a paired > + // memory access, where LOAD_P is true if we're asking about loads and > + // false for stores. MODE gives the mode of the operand. > + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, > + machine_mode mode) = 0; > + > + // Return alias check limit. > + // This is needed to avoid unbounded quadratic behaviour when > + // performing alias analysis. > + virtual int pair_mem_alias_check_limit () = 0; > + > + // Return true if we should try to handle writeback opportunities. > + // WHICH determines the kinds of writeback opportunities the caller > + // is asking about. > + virtual bool should_handle_writeback (enum writeback which) = 0; > + > + // Given BASE_MEM, the mem from the lower candidate access for a pair, > + // and LOAD_P (true if the access is a load), check if we should proceed > + // to form the pair given the target's code generation policy on > + // paired accesses. > + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0; > + > + // Generate the pattern for a paired access. PATS gives the patterns > + // for the individual memory accesses (which by this point must share a > + // common base register). If WRITEBACK is non-NULL, then this rtx > + // describes the update to the base register that should be performed by > + // the resulting insn. LOAD_P is true iff the accesses are loads. > + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0; > + > + // Return true if INSN is a paired memory access. If so, set LOAD_P to > + // true iff INSN is a load pair. > + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0; > + > + // Return true if we should track loads. > + virtual bool track_loads_p () > + { > + return true; > + } > + > + // Return true if we should track stores. > + virtual bool track_stores_p () > + { > + return true; > + } > + > + // Return true if OFFSET is in range for a paired memory access. > + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0; > + > + // Given a load/store pair insn in PATTERN, unpack the insn, storing > + // the register operands in REGS, and returning the mem. LOAD_P is > + // true for loads and false for stores. > + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0; > + > + // Given a pair mem in MEM, register operands in REGS, and an rtx > + // representing the effect of writeback on the base register in WB_EFFECT, > + // return an insn representing a writeback variant of this pair. > + // LOAD_P is true iff the pair is a load. > + // This is used when promoting existing non-writeback pairs to writeback > + // variants. > + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem, > + rtx regs[2], bool load_p) = 0; > + > + inline void process_block (rtl_ssa::bb_info *bb); > + inline rtl_ssa::insn_info *find_trailing_add (rtl_ssa::insn_info *insns[2], > + const rtl_ssa::insn_range_info > &pair_range, Nit: long line here. You could either add a line break after the return type of the function or immediately before &pair_range. I'd guess the latter is preferred, but I'm not too sure in this case. > + int initial_writeback, > + rtx *writeback_effect, > + rtl_ssa::def_info **add_def, > + rtl_ssa::def_info *base_def, > + poly_int64 initial_offset, > + unsigned access_size); > + inline int get_viable_bases (rtl_ssa::insn_info *insns[2], > + vec<base_cand> &base_cands, > + rtx cand_mems[2], > + unsigned access_size, > + bool reversed); > + inline void do_alias_analysis (rtl_ssa::insn_info *alias_hazards[4], > + alias_walker *walkers[4], > + bool load_p); > + inline void try_promote_writeback (rtl_ssa::insn_info *insn, bool load_p); > + void run (); > + ~pair_fusion (); > +}; > -- > 2.39.3 > Thanks, Alex