Hello Richard:
On 17/05/24 11:07 pm, Richard Sandiford wrote:
> Ajit Agarwal <[email protected]> writes:
>> Hello Alex/Richard:
>>
>> All review comments are addressed.
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> Bootstrapped and regtested on aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>
> Thanks for the patch and thanks to Alex for the reviews. The patch
> looks good to me apart from the minor nits below and the comments that
> Alex had. Please post the updated patch for a final ok though.
>
>> aarch64: Preparatory patch to place target independent and
>> dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> 2024-05-15 Ajit Kumar Agarwal <[email protected]>
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-ldp-fusion.cc: Place target
>> independent and dependent changed code.
>
> Not sure this is a complete sentence. Maybe:
>
> * config/aarch64/aarch64-ldp-fusion.cc: Factor out a
> target-independent interface and move it to the head of the file.
>
> That technically isn't detailed enough for a changelog entry,
> but IMO we should use it anyway. It's pointless to write the usual
> amount of detail when the code is going to move soon.
>
Addressed in v7 of the patch.
>> ---
>> gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++++++++++++++--------
>> 1 file changed, 357 insertions(+), 176 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 1d9caeab05d..429e532ea3b 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,6 +138,225 @@ struct alt_base
>> poly_int64 offset;
>> };
>>
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> + virtual bool conflict_p (int &budget) const = 0;
>> + virtual insn_info *insn () const = 0;
>> + virtual bool valid () const = 0;
>> + virtual void advance () = 0;
>> +};
>> +
>> +// When querying handle_writeback_opportunities, this enum is used to
>> +// qualify which opportunities we are asking about.
>> +enum class writeback {
>> + // Only those writeback opportunities that arise from existing
>> + // auto-increment accesses.
>> + EXISTING,
>> + // All writeback opportunities including those that involve folding
>
> There should be a comma after "opportunities"
>
>> + // base register updates into a non-writeback pair.
>> + ALL
>> +};
>> +
>> +struct pair_fusion {
>> + pair_fusion ()
>> + {
>> + calculate_dominance_info (CDI_DOMINATORS);
>> + df_analyze ();
>> + crtl->ssa = new rtl_ssa::function_info (cfun);
>> + };
>
> Unnecessary trailing ";". I think it'd be better to define this and
> the destructor out-of-line though. For one thing, it'll reduce the number
> of header file dependencies, once the code is moved to its own header file.
>
Addressed in v7 of the patch.
>> +
>> + // Given:
>> + // - an rtx REG_OP, the non-memory operand in a load/store insn,
>> + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
>> + // - a boolean LOAD_P (true iff the insn is a load), then:
>> + // return true if the access should be considered an FP/SIMD access.
>> + // Such accesses are segregated from GPR accesses, since we only want
>> + // to form pairs for accesses that use the same register file.
>> + virtual bool fpsimd_op_p (rtx, machine_mode, bool)
>> + {
>> + return false;
>> + }
>> +
>> + // Return true if we should consider forming pairs from memory
>> + // accesses with operand mode MODE at this stage in compilation.
>> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +
>> + // Return true iff REG_OP is a suitable register operand for a paired
>> + // memory access, where LOAD_P is true if we're asking about loads and
>> + // false for stores. MODE gives the mode of the operand.
>> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> + machine_mode mode) = 0;
>> +
>> + // Return alias check limit.
>> + // This is needed to avoid unbounded quadratic behaviour when
>> + // performing alias analysis.
>> + virtual int pair_mem_alias_check_limit () = 0;
>
> I think the end result should be to make this a target-independent
> --param, but this is ok/good as an intermediate step.
>
>> +
>> + // Returns true if we should try to handle writeback opportunities.
>
> s/Returns/Return/
Addressed in v7 of the patch.
>
>> + // WHICH determines the kinds of writeback opportunities the caller
>> + // is asking about.
>> + virtual bool handle_writeback_opportunities (enum writeback which) = 0 ;
>
> Excess space before the final ";".
>
Addressed in v7 of the patch.
> How about calling the function "should_handle_writeback" instead? I think
> the current name implies that the function does the handling, whereas it
> instead checks whether handling should be done.
>
> The comment above enum class writeback would need to change too.
>
>> +
>> + // Given BASE_MEM, the mem from the lower candidate access for a pair,
>> + // and LOAD_P (true if the access is a load), check if we should proceed
>> + // to form the pair given the target's code generation policy on
>> + // paired accesses.
>> + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0;
>> +
>> + // Generate the pattern for a paired access. PATS gives the patterns
>> + // for the individual memory accesses (which by this point must share a
>> + // common base register). If WRITEBACK is non-NULL, then this rtx
>> + // describes the update to the base register that should be performed by
>> + // the resulting insn. LOAD_P is true iff the accesses are loads.
>> + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
>> +
>> + // Return true if INSN is a paired memory access. If so, set LOAD_P to
>> + // true iff INSN is a load pair.
>> + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0;
>> +
>> + // Return true if we should track loads.
>> + virtual bool track_loads_p ()
>> + {
>> + return true;
>> + }
>> +
>> + // Return true if we should track stores.
>> + virtual bool track_stores_p ()
>> + {
>> + return true;
>> + }
>> +
>> + // Return true if OFFSET is in range for a paired memory access.
>> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0;
>> +
>> + // Given a load/store pair insn in PATTERN, unpack the insn, storing
>> + // the register operands in REGS, and returning the mem. LOAD_P is
>> + // true for loads and false for stores.
>> + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0;
>> +
>> + // Given a pair mem in MEM, register operands in REGS, and an rtx
>> + // representing the effect of writeback on the base register in WB_EFFECT,
>> + // return an insn representing a writeback variant of this pair.
>> + // LOAD_P is true iff the pair is a load.
>> + // This is used when promoting existing non-writeback pairs to writeback
>> + // variants.
>> + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem,
>> + rtx regs[2], bool load_p) = 0;
>> +
>> + void ldp_fusion_bb (bb_info *bb);
>
> This seems to be the only place in the interface that preserves the
> old "ldp" name. Given that this is a member function, and that the class
> that it's in provides context, it might be better to rename the function
> to something bland like:
>
> void process_block (bb_info *bb);
>
Addressed in v7 of the patch.
>> [...]
>> @@ -159,9 +378,9 @@ struct ldp_bb_info
>> hash_map<def_hash, alt_base> canon_base_map;
>>
>> static const size_t obstack_alignment = sizeof (void *);
>> - bb_info *m_bb;
>>
>> - ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> + ldp_bb_info (bb_info *bb, pair_fusion *d)
>> + : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
>
> Formatting nit: the usual style is to indent ": ..." by two spaces
> relative to the function name (rather than line it up with the "("):
>
> ldp_bb_info (bb_info *bb, pair_fusion *d)
> : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
>
Addressed in v7 of the patch.
>> [...]
>> @@ -510,11 +760,11 @@ ldp_bb_info::track_access (insn_info *insn, bool
>> load_p, rtx mem)
>> // constraint; this early return means we never get to the code
>> // that calls targetm.legitimize_address_displacement.
>> //
>> - // So for now, it's better to punt when we can't be sure that the
>> - // offset is in range for LDP/STP. Out-of-range cases can then be
>> - // handled after RA by the out-of-range LDP/STP peepholes. Eventually, it
>> - // would be nice to handle known out-of-range opportunities in the
>> - // pass itself (for stack accesses, this would be in the post-RA pass).
>> + // It's better to punt when we can't be sure that the
>
> I'd keep the "So for now,", unless Alex thinks we shouldn't.
>
>> + // offset is in range for paired access. On aarch64, Out-of-range cases
>
> s/Out-of-range/out-of-range/
Addressed in v7 of the patch.
>
>> + // can then be handled after RA by the out-of-range LDP/STP peepholes.
>> + // Eventually, it would be nice to handle known out-of-range opportunities
>> + // in the pass itself (for stack accesses, this would be in the post-RA
>> pass).
>> if (!reload_completed
>> && (REGNO (base) == FRAME_POINTER_REGNUM
>> || REGNO (base) == ARG_POINTER_REGNUM))
>> [...]
>> @@ -1199,8 +1449,8 @@ extract_writebacks (bool load_p, rtx pats[2], int
>> changed)
>> // base register. If there is one, we choose the first such update after
>> // PAIR_DST that is still in the same BB as our pair. We return the new
>> def in
>> // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
>> -static insn_info *
>> -find_trailing_add (insn_info *insns[2],
>> +insn_info *
>> +pair_fusion::find_trailing_add (insn_info *insns[2],
>> const insn_range_info &pair_range,
>> int initial_writeback,
>> rtx *writeback_effect,
>
> The parameters should be reindented to match the longer name.
>
Addressed in v7 of the patch.
> Thanks again for doing this.
>
> Richard
Thanks & Regards
Ajit