> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Monday, November 11, 2024 10:13 AM
> To: gcc-patches@gcc.gnu.org
> Cc: RISC-V CI <patchworks...@rivosinc.com>; Tamar Christina
> <tamar.christ...@arm.com>
> Subject: [PATCH][v2] tree-optimization/117502 - VMAT_STRIDED_SLP vs
> VMAT_ELEMENTWISE when considering gather
> 
> The following treats both the same when considering to use gather or
> scatter for single-element interleaving accesses.
> 
> This will cause
> 
> FAIL: gcc.target/aarch64/sve/sve_iters_low_2.c scan-tree-dump-not vect
> "LOOP VECTORIZED"
> 
> where we now vectorize the loop with VNx4QI, I'll leave it to ARM folks
> to investigate whether that's OK and to adjust the testcase or to see
> where to adjust things to make the testcase not vectorized again.  The
> original fix for which the testcase was introduced is still efffective.

No, it shouldn't vectorize. It looks like this is a vectorizer bug,  with this
change we no longer see load costing for vect_body.
As In, the vectorizer never asks for backend costing of the loads inside the
vector body.

There are some curious things, like the costing done by vect_get_load_cost 
doesn't seem
to ever get slp_node so the cost structure used by vect_slp_analyze_operations
always has cost->node NULL.  The same is true for almost all the 
record_stmt_cost
calls in vectorizable_load.  So it's unclear to me if this behavior is expected 
or a bug.

Updating vect_get_load_cost to save the SLP node still doesn't get it though.

Also looking at the costing code, are these not wrong?:

                  if (costing_p)
                    {
                      unsigned int cnunits = vect_nunits_for_cost (vectype);
                      inside_cost
                        = record_stmt_cost (cost_vec, cnunits, scalar_load,
                                            stmt_info, 0, vect_body);
                      continue;
                    }

As if the VMAT is on the slp_node the costing needs to get it:

                  if (costing_p)
                    {
                      unsigned int cnunits = vect_nunits_for_cost (vectype);
                      inside_cost
                        = record_stmt_cost (cost_vec, cnunits, scalar_load,
                                            stmt_info, slp_node, 0, vect_body);
                      continue;
                    }

And a few others?

But I haven't yet figured out why the vectorizer no longer asks to costs the 
loads
and stores in the vector loop.

If it's a bug I'll keep debugging, if it's not, how should the backend cost 
these?

Thanks,
Tamar
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
>       PR tree-optimization/117502
>       * tree-vect-stmts.cc (get_group_load_store_type): Also consider
>       VMAT_STRIDED_SLP when checking to use gather/scatter for
>       single-element interleaving access.
>       * tree-vect-loop.cc (update_epilogue_loop_vinfo):
> STMT_VINFO_STRIDED_P
>       can be classified as VMAT_GATHER_SCATTER, so update DR_REF for
>       those as well.
> ---
>  gcc/tree-vect-loop.cc  | 1 +
>  gcc/tree-vect-stmts.cc | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6cfce5aa7e1..f50ee2e958e 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12295,6 +12295,7 @@ update_epilogue_loop_vinfo (class loop *epilogue,
> tree advance)
>        refs that get_load_store_type classified as VMAT_GATHER_SCATTER.  */
>        auto vstmt_vinfo = vect_stmt_to_vectorize (stmt_vinfo);
>        if (STMT_VINFO_MEMORY_ACCESS_TYPE (vstmt_vinfo) ==
> VMAT_GATHER_SCATTER
> +       || STMT_VINFO_STRIDED_P (vstmt_vinfo)
>         || STMT_VINFO_GATHER_SCATTER_P (vstmt_vinfo))
>       {
>         /* ???  As we copy epilogues from the main loop incremental
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index fa44e19f163..b09c016949f 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2274,7 +2274,8 @@ get_group_load_store_type (vec_info *vinfo,
> stmt_vec_info stmt_info,
>       on nearby locations.  Or, even if it's a win over scalar code,
>       it might not be a win over vectorizing at a lower VF, if that
>       allows us to use contiguous accesses.  */
> -  if (*memory_access_type == VMAT_ELEMENTWISE
> +  if ((*memory_access_type == VMAT_ELEMENTWISE
> +       || *memory_access_type == VMAT_STRIDED_SLP)
>        && single_element_p
>        && loop_vinfo
>        && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
> --
> 2.43.0

Reply via email to