Re: [Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

Ajit Agarwal Sat, 18 May 2024 02:00:49 -0700

Hello Richard:

On 17/05/24 11:07 pm, Richard Sandiford wrote:
> Ajit Agarwal <[email protected]> writes:
>> Hello Alex/Richard:
>>
>> All review comments are addressed.
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> Bootstrapped and regtested on aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
> 
> Thanks for the patch and thanks to Alex for the reviews.  The patch
> looks good to me apart from the minor nits below and the comments that
> Alex had.  Please post the updated patch for a final ok though.
> 
>> aarch64: Preparatory patch to place target independent and
>> dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> 2024-05-15  Ajit Kumar Agarwal  <[email protected]>
>>
>> gcc/ChangeLog:
>>
>>      * config/aarch64/aarch64-ldp-fusion.cc: Place target
>>      independent and dependent changed code.
> 
> Not sure this is a complete sentence.  Maybe:
> 
>       * config/aarch64/aarch64-ldp-fusion.cc: Factor out a
>       target-independent interface and move it to the head of the file.
> 
> That technically isn't detailed enough for a changelog entry,
> but IMO we should use it anyway.  It's pointless to write the usual
> amount of detail when the code is going to move soon.
>


Addressed in v7 of the patch.
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++++++++++++++--------
>>  1 file changed, 357 insertions(+), 176 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 1d9caeab05d..429e532ea3b 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,6 +138,225 @@ struct alt_base
>>    poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int &budget) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const = 0;
>> +  virtual void advance () = 0;
>> +};
>> +
>> +// When querying handle_writeback_opportunities, this enum is used to
>> +// qualify which opportunities we are asking about.
>> +enum class writeback {
>> +  // Only those writeback opportunities that arise from existing
>> +  // auto-increment accesses.
>> +  EXISTING,
>> +  // All writeback opportunities including those that involve folding
> 
> There should be a comma after "opportunities"
> 
>> +  // base register updates into a non-writeback pair.
>> +  ALL
>> +};
>> +
>> +struct pair_fusion {
>> +  pair_fusion ()
>> +  {
>> +    calculate_dominance_info (CDI_DOMINATORS);
>> +    df_analyze ();
>> +    crtl->ssa = new rtl_ssa::function_info (cfun);
>> +  };
> 
> Unnecessary trailing ";".  I think it'd be better to define this and
> the destructor out-of-line though.  For one thing, it'll reduce the number
> of header file dependencies, once the code is moved to its own header file.
> 

Addressed in v7 of the patch.
>> +
>> +  // Given:
>> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
>> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
>> +  // - a boolean LOAD_P (true iff the insn is a load), then:
>> +  // return true if the access should be considered an FP/SIMD access.
>> +  // Such accesses are segregated from GPR accesses, since we only want
>> +  // to form pairs for accesses that use the same register file.
>> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
>> +  {
>> +    return false;
>> +  }
>> +
>> +  // Return true if we should consider forming pairs from memory
>> +  // accesses with operand mode MODE at this stage in compilation.
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +
>> +  // Return true iff REG_OP is a suitable register operand for a paired
>> +  // memory access, where LOAD_P is true if we're asking about loads and
>> +  // false for stores.  MODE gives the mode of the operand.
>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +                                  machine_mode mode) = 0;
>> +
>> +  // Return alias check limit.
>> +  // This is needed to avoid unbounded quadratic behaviour when
>> +  // performing alias analysis.
>> +  virtual int pair_mem_alias_check_limit () = 0;
> 
> I think the end result should be to make this a target-independent
> --param, but this is ok/good as an intermediate step.
> 
>> +
>> +  // Returns true if we should try to handle writeback opportunities.
> 
> s/Returns/Return/

Addressed in v7 of the patch.
> 
>> +  // WHICH determines the kinds of writeback opportunities the caller
>> +  // is asking about.
>> +  virtual bool handle_writeback_opportunities (enum writeback which) = 0 ;
> 
> Excess space before the final ";".
> 

Addressed in v7 of the patch.
> How about calling the function "should_handle_writeback" instead?  I think
> the current name implies that the function does the handling, whereas it
> instead checks whether handling should be done.
> 
> The comment above enum class writeback would need to change too.
> 
>> +
>> +  // Given BASE_MEM, the mem from the lower candidate access for a pair,
>> +  // and LOAD_P (true if the access is a load), check if we should proceed
>> +  // to form the pair given the target's code generation policy on
>> +  // paired accesses.
>> +  virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0;
>> +
>> +  // Generate the pattern for a paired access.  PATS gives the patterns
>> +  // for the individual memory accesses (which by this point must share a
>> +  // common base register).  If WRITEBACK is non-NULL, then this rtx
>> +  // describes the update to the base register that should be performed by
>> +  // the resulting insn.  LOAD_P is true iff the accesses are loads.
>> +  virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
>> +
>> +  // Return true if INSN is a paired memory access.  If so, set LOAD_P to
>> +  // true iff INSN is a load pair.
>> +  virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0;
>> +
>> +  // Return true if we should track loads.
>> +  virtual bool track_loads_p ()
>> +  {
>> +    return true;
>> +  }
>> +
>> +  // Return true if we should track stores.
>> +  virtual bool track_stores_p ()
>> +  {
>> +    return true;
>> +  }
>> +
>> +  // Return true if OFFSET is in range for a paired memory access.
>> +  virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0;
>> +
>> +  // Given a load/store pair insn in PATTERN, unpack the insn, storing
>> +  // the register operands in REGS, and returning the mem.  LOAD_P is
>> +  // true for loads and false for stores.
>> +  virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0;
>> +
>> +  // Given a pair mem in MEM, register operands in REGS, and an rtx
>> +  // representing the effect of writeback on the base register in WB_EFFECT,
>> +  // return an insn representing a writeback variant of this pair.
>> +  // LOAD_P is true iff the pair is a load.
>> +  // This is used when promoting existing non-writeback pairs to writeback
>> +  // variants.
>> +  virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem,
>> +                                      rtx regs[2], bool load_p) = 0;
>> +
>> +  void ldp_fusion_bb (bb_info *bb);
> 
> This seems to be the only place in the interface that preserves the
> old "ldp" name.  Given that this is a member function, and that the class
> that it's in provides context, it might be better to rename the function
> to something bland like:
> 
>   void process_block (bb_info *bb);
> 

Addressed in v7 of the patch.
>> [...]
>> @@ -159,9 +378,9 @@ struct ldp_bb_info
>>    hash_map<def_hash, alt_base> canon_base_map;
>>  
>>    static const size_t obstack_alignment = sizeof (void *);
>> -  bb_info *m_bb;
>>  
>> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> +  ldp_bb_info (bb_info *bb, pair_fusion *d)
>> +           : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
> 
> Formatting nit: the usual style is to indent ": ..." by two spaces
> relative to the function name (rather than line it up with the "("):
> 
>   ldp_bb_info (bb_info *bb, pair_fusion *d)
>     : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
> 

Addressed in v7 of the patch.
>> [...]
>> @@ -510,11 +760,11 @@ ldp_bb_info::track_access (insn_info *insn, bool 
>> load_p, rtx mem)
>>    // constraint; this early return means we never get to the code
>>    // that calls targetm.legitimize_address_displacement.
>>    //
>> -  // So for now, it's better to punt when we can't be sure that the
>> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
>> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
>> -  // would be nice to handle known out-of-range opportunities in the
>> -  // pass itself (for stack accesses, this would be in the post-RA pass).
>> +  // It's better to punt when we can't be sure that the
> 
> I'd keep the "So for now,", unless Alex thinks we shouldn't.
> 
>> +  // offset is in range for paired access.  On aarch64, Out-of-range cases
> 
> s/Out-of-range/out-of-range/

Addressed in v7 of the patch.
> 
>> +  // can then be handled after RA by the out-of-range LDP/STP peepholes.
>> +  // Eventually, it would be nice to handle known out-of-range opportunities
>> +  // in the pass itself (for stack accesses, this would be in the post-RA 
>> pass).
>>    if (!reload_completed
>>        && (REGNO (base) == FRAME_POINTER_REGNUM
>>        || REGNO (base) == ARG_POINTER_REGNUM))
>> [...]
>> @@ -1199,8 +1449,8 @@ extract_writebacks (bool load_p, rtx pats[2], int 
>> changed)
>>  // base register.  If there is one, we choose the first such update after
>>  // PAIR_DST that is still in the same BB as our pair.  We return the new 
>> def in
>>  // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
>> -static insn_info *
>> -find_trailing_add (insn_info *insns[2],
>> +insn_info *
>> +pair_fusion::find_trailing_add (insn_info *insns[2],
>>                 const insn_range_info &pair_range,
>>                 int initial_writeback,
>>                 rtx *writeback_effect,
> 
> The parameters should be reindented to match the longer name.
> 

Addressed in v7 of the patch.
> Thanks again for doing this.
> 
> Richard
Thanks & Regards
Ajit

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

Reply via email to