On 2023/8/14 22:16, Richard Sandiford wrote:
> "Kewen.Lin" <li...@linux.ibm.com> writes:
>> Hi Richard,
>>
>> on 2023/8/14 20:20, Richard Sandiford wrote:
>>> Thanks for the clean-ups.  But...
>>>
>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>> Hi,
>>>>
>>>> Following Richi's suggestion [1], this patch is to move the
>>>> handlings on VMAT_GATHER_SCATTER in the final loop nest
>>>> of function vectorizable_load to its own loop.  Basically
>>>> it duplicates the final loop nest, clean up some useless
>>>> set up code for the case of VMAT_GATHER_SCATTER, remove some
>>>> unreachable code.  Also remove the corresponding handlings
>>>> in the final loop nest.
>>>>
>>>> Bootstrapped and regtested on x86_64-redhat-linux,
>>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>>>
>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html
>>>>
>>>> Is it ok for trunk?
>>>>
>>>> BR,
>>>> Kewen
>>>> -----
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>    * tree-vect-stmts.cc (vectorizable_load): Move the handlings on
>>>>    VMAT_GATHER_SCATTER in the final loop nest to its own loop,
>>>>    and update the final nest accordingly.
>>>> ---
>>>>  gcc/tree-vect-stmts.cc | 361 +++++++++++++++++++++++++----------------
>>>>  1 file changed, 219 insertions(+), 142 deletions(-)
>>>
>>> ...that seems like quite a lot of +s.  Is there nothing we can do to
>>> avoid the cut-&-paste?
>>
>> Thanks for the comments!  I'm not sure if I get your question, if we
>> want to move out the handlings of VMAT_GATHER_SCATTER, the new +s seem
>> inevitable?  Your concern is mainly about git blame history?
> 
> No, it was more that 219-142=77, so it seems like a lot of lines
> are being duplicated rather than simply being moved.  (Unlike for
> VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
> was a clear improvement.)
> 
> So I was just wondering if there was any obvious factoring-out that
> could be done to reduce the duplication.

Ah, thanks for the clarification!

I think the main duplication is at the beginning and end of the loop body;
let's take a look at them in detail:

+  if (memory_access_type == VMAT_GATHER_SCATTER)
+    {
+      gcc_assert (alignment_support_scheme == dr_aligned
+                 || alignment_support_scheme == dr_unaligned_supported);
+      gcc_assert (!grouped_load && !slp_perm);
+
+      unsigned int inside_cost = 0, prologue_cost = 0;

// These above are newly added.

+      for (j = 0; j < ncopies; j++)
+       {
+         /* 1. Create the vector or array pointer update chain.  */
+         if (j == 0 && !costing_p)
+           {
+             if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+               vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
+                                            slp_node, &gs_info, &dataref_ptr,
+                                            &vec_offsets);
+             else
+               dataref_ptr
+                 = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
+                                             at_loop, offset, &dummy, gsi,
+                                             &ptr_incr, false, bump);
+           }
+         else if (!costing_p)
+           {
+             gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
+             if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+               dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
+                                              gsi, stmt_info, bump);
+           }

// These lines are for dataref_ptr.  In the final loop nest we deal with more
cases for simd_lane_access_p and diff_first_stmt_info, but no longer handle
STMT_VINFO_GATHER_SCATTER_P; very little (one case) can be shared between the
two, so IMHO factoring it out seems like overkill.

+
+         if (mask && !costing_p)
+           vec_mask = vec_masks[j];

// This is merged from the j == 0 and j != 0 arms.

+
+         gimple *new_stmt = NULL;
+         for (i = 0; i < vec_num; i++)
+           {
+             tree final_mask = NULL_TREE;
+             tree final_len = NULL_TREE;
+             tree bias = NULL_TREE;
+             if (!costing_p)
+               {
+                 if (loop_masks)
+                   final_mask
+                     = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
+                                           vec_num * ncopies, vectype,
+                                           vec_num * j + i);
+                 if (vec_mask)
+                   final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
+                                                  final_mask, vec_mask, gsi);
+
+                 if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+                   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
+                                                  gsi, stmt_info, bump);
+               }

// This part is directly copied from the original; the original gets updated by
removing && !STMT_VINFO_GATHER_SCATTER_P.  Due to its size, I didn't consider
factoring it out before; would you prefer that I factor this part out?

+             if (gs_info.ifn != IFN_LAST)
+               {
...
+               }
+             else
+               {
+                 /* Emulated gather-scatter.  */
...

// This part is just moved from the original.

+             vec_dest = vect_create_destination_var (scalar_dest, vectype);
+             /* DATA_REF is null if we've already built the statement.  */
+             if (data_ref)
+               {
+                 vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
+                 new_stmt = gimple_build_assign (vec_dest, data_ref);
+               }
+             new_temp = make_ssa_name (vec_dest, new_stmt);
+             gimple_set_lhs (new_stmt, new_temp);
+             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+
+             /* Store vector loads in the corresponding SLP_NODE.  */
+             if (slp)
+               slp_node->push_vec_def (new_stmt);
+
+         if (!slp && !costing_p)
+           STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+       }
+
+      if (!slp && !costing_p)
+       *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];

// This part contains some subsequent handling; it's duplicated from the
original, but with some more useless code removed.  I guess this part is not
worth factoring out?

+      if (costing_p)
+       {
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_NOTE, vect_location,
+                            "vect_model_load_cost: inside_cost = %u, "
+                            "prologue_cost = %u .\n",
+                            inside_cost, prologue_cost);
+       }
+      return true;
+    }

// This duplicates the dumping; I guess it's unnecessary to factor it out.

Oh, I just noticed that this should be shortened to
"if (costing_p && dump_enabled_p ())" instead, the same as what's adopted for
the VMAT_LOAD_STORE_LANES dumping.

BR,
Kewen
