Re: [Patch, aarch64] v6: Preparatory patch to place target independent and, dependent changed code in one file

Richard Sandiford Fri, 17 May 2024 10:37:39 -0700

Ajit Agarwal <aagar...@linux.ibm.com> writes:
> Hello Alex/Richard:
>
> All review comments are addressed.
>
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
>
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
>
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Thanks & Regards
> Ajit


Thanks for the patch and thanks to Alex for the reviews.  The patch
looks good to me apart from the minor nits below and the comments that
Alex had.  Please post the updated patch for a final ok though.

> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
>
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
>
> Target independent code is the Generic code with pure virtual function
> to interface betwwen target independent and dependent code.
>
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
>
> 2024-05-15  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-ldp-fusion.cc: Place target
>       independent and dependent changed code.

Not sure this is a complete sentence.  Maybe:

        * config/aarch64/aarch64-ldp-fusion.cc: Factor out a
        target-independent interface and move it to the head of the file.

That technically isn't detailed enough for a changelog entry,
but IMO we should use it anyway.  It's pointless to write the usual
amount of detail when the code is going to move soon.

> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++++++++++++++--------
>  1 file changed, 357 insertions(+), 176 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..429e532ea3b 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,225 @@ struct alt_base
>    poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// When querying handle_writeback_opportunities, this enum is used to
> +// qualify which opportunities we are asking about.
> +enum class writeback {
> +  // Only those writeback opportunities that arise from existing
> +  // auto-increment accesses.
> +  EXISTING,
> +  // All writeback opportunities including those that involve folding

There should be a comma after "opportunities"

> +  // base register updates into a non-writeback pair.
> +  ALL
> +};
> +
> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +    calculate_dominance_info (CDI_DOMINATORS);
> +    df_analyze ();
> +    crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };

Unnecessary trailing ";".  I think it'd be better to define this and
the destructor out-of-line though.  For one thing, it'll reduce the number
of header file dependencies, once the code is moved to its own header file.

> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +    return false;
> +  }
> +
> +  // Return true if we should consider forming pairs from memory
> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +                                   machine_mode mode) = 0;
> +
> +  // Return alias check limit.
> +  // This is needed to avoid unbounded quadratic behaviour when
> +  // performing alias analysis.
> +  virtual int pair_mem_alias_check_limit () = 0;

I think the end result should be to make this a target-independent
--param, but this is ok/good as an intermediate step.

> +
> +  // Returns true if we should try to handle writeback opportunities.

s/Returns/Return/

> +  // WHICH determines the kinds of writeback opportunities the caller
> +  // is asking about.
> +  virtual bool handle_writeback_opportunities (enum writeback which) = 0 ;

Excess space before the final ";".

How about calling the function "should_handle_writeback" instead?  I think
the current name implies that the function does the handling, whereas it
instead checks whether handling should be done.

The comment above enum class writeback would need to change too.

> +
> +  // Given BASE_MEM, the mem from the lower candidate access for a pair,
> +  // and LOAD_P (true if the access is a load), check if we should proceed
> +  // to form the pair given the target's code generation policy on
> +  // paired accesses.
> +  virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0;
> +
> +  // Generate the pattern for a paired access.  PATS gives the patterns
> +  // for the individual memory accesses (which by this point must share a
> +  // common base register).  If WRITEBACK is non-NULL, then this rtx
> +  // describes the update to the base register that should be performed by
> +  // the resulting insn.  LOAD_P is true iff the accesses are loads.
> +  virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
> +
> +  // Return true if INSN is a paired memory access.  If so, set LOAD_P to
> +  // true iff INSN is a load pair.
> +  virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0;
> +
> +  // Return true if we should track loads.
> +  virtual bool track_loads_p ()
> +  {
> +    return true;
> +  }
> +
> +  // Return true if we should track stores.
> +  virtual bool track_stores_p ()
> +  {
> +    return true;
> +  }
> +
> +  // Return true if OFFSET is in range for a paired memory access.
> +  virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0;
> +
> +  // Given a load/store pair insn in PATTERN, unpack the insn, storing
> +  // the register operands in REGS, and returning the mem.  LOAD_P is
> +  // true for loads and false for stores.
> +  virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0;
> +
> +  // Given a pair mem in MEM, register operands in REGS, and an rtx
> +  // representing the effect of writeback on the base register in WB_EFFECT,
> +  // return an insn representing a writeback variant of this pair.
> +  // LOAD_P is true iff the pair is a load.
> +  // This is used when promoting existing non-writeback pairs to writeback
> +  // variants.
> +  virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem,
> +                                       rtx regs[2], bool load_p) = 0;
> +
> +  void ldp_fusion_bb (bb_info *bb);

This seems to be the only place in the interface that preserves the
old "ldp" name.  Given that this is a member function, and that the class
that it's in provides context, it might be better to rename the function
to something bland like:

  void process_block (bb_info *bb);

> [...]
> @@ -159,9 +378,9 @@ struct ldp_bb_info
>    hash_map<def_hash, alt_base> canon_base_map;
>  
>    static const size_t obstack_alignment = sizeof (void *);
> -  bb_info *m_bb;
>  
> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> +  ldp_bb_info (bb_info *bb, pair_fusion *d)
> +            : m_bb (bb), m_pass (d), m_emitted_tombstone (false)

Formatting nit: the usual style is to indent ": ..." by two spaces
relative to the function name (rather than line it up with the "("):

  ldp_bb_info (bb_info *bb, pair_fusion *d)
    : m_bb (bb), m_pass (d), m_emitted_tombstone (false)

> [...]
> @@ -510,11 +760,11 @@ ldp_bb_info::track_access (insn_info *insn, bool 
> load_p, rtx mem)
>    // constraint; this early return means we never get to the code
>    // that calls targetm.legitimize_address_displacement.
>    //
> -  // So for now, it's better to punt when we can't be sure that the
> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
> -  // would be nice to handle known out-of-range opportunities in the
> -  // pass itself (for stack accesses, this would be in the post-RA pass).
> +  // It's better to punt when we can't be sure that the

I'd keep the "So for now,", unless Alex thinks we shouldn't.

> +  // offset is in range for paired access.  On aarch64, Out-of-range cases

s/Out-of-range/out-of-range/

> +  // can then be handled after RA by the out-of-range LDP/STP peepholes.
> +  // Eventually, it would be nice to handle known out-of-range opportunities
> +  // in the pass itself (for stack accesses, this would be in the post-RA 
> pass).
>    if (!reload_completed
>        && (REGNO (base) == FRAME_POINTER_REGNUM
>         || REGNO (base) == ARG_POINTER_REGNUM))
> [...]
> @@ -1199,8 +1449,8 @@ extract_writebacks (bool load_p, rtx pats[2], int 
> changed)
>  // base register.  If there is one, we choose the first such update after
>  // PAIR_DST that is still in the same BB as our pair.  We return the new def 
> in
>  // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
> -static insn_info *
> -find_trailing_add (insn_info *insns[2],
> +insn_info *
> +pair_fusion::find_trailing_add (insn_info *insns[2],
>                  const insn_range_info &pair_range,
>                  int initial_writeback,
>                  rtx *writeback_effect,

The parameters should be reindented to match the longer name.

Thanks again for doing this.

Richard

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and, dependent changed code in one file

Reply via email to