> -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Monday, November 11, 2024 10:13 AM > To: gcc-patches@gcc.gnu.org > Cc: RISC-V CI <patchworks...@rivosinc.com>; Tamar Christina > <tamar.christ...@arm.com> > Subject: [PATCH][v2] tree-optimization/117502 - VMAT_STRIDED_SLP vs > VMAT_ELEMENTWISE when considering gather > > The following treats both the same when considering to use gather or > scatter for single-element interleaving accesses. > > This will cause > > FAIL: gcc.target/aarch64/sve/sve_iters_low_2.c scan-tree-dump-not vect > "LOOP VECTORIZED" > > where we now vectorize the loop with VNx4QI, I'll leave it to ARM folks > to investigate whether that's OK and to adjust the testcase or to see > where to adjust things to make the testcase not vectorized again. The > original fix for which the testcase was introduced is still efffective.
No, it shouldn't vectorize. It looks like this is a vectorizer bug, with this change we no longer see load costing for vect_body. As In, the vectorizer never asks for backend costing of the loads inside the vector body. There are some curious things, like the costing done by vect_get_load_cost doesn't seem to ever get slp_node so the cost structure used by vect_slp_analyze_operations always has cost->node NULL. The same is true for almost all the record_stmt_cost calls in vectorizable_load. So it's unclear to me if this behavior is expected or a bug. Updating vect_get_load_cost to save the SLP node still doesn't get it though. Also looking at the costing code, are these not wrong?: if (costing_p) { unsigned int cnunits = vect_nunits_for_cost (vectype); inside_cost = record_stmt_cost (cost_vec, cnunits, scalar_load, stmt_info, 0, vect_body); continue; } As if the VMAT is on the slp_node the costing needs to get it: if (costing_p) { unsigned int cnunits = vect_nunits_for_cost (vectype); inside_cost = record_stmt_cost (cost_vec, cnunits, scalar_load, stmt_info, slp_node, 0, vect_body); continue; } And a few others? But I haven't yet figured out why the vectorizer no longer asks to costs the loads and stores in the vector loop. If it's a bug I'll keep debugging, if it's not, how should the backend cost these? Thanks, Tamar > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > PR tree-optimization/117502 > * tree-vect-stmts.cc (get_group_load_store_type): Also consider > VMAT_STRIDED_SLP when checking to use gather/scatter for > single-element interleaving access. > * tree-vect-loop.cc (update_epilogue_loop_vinfo): > STMT_VINFO_STRIDED_P > can be classified as VMAT_GATHER_SCATTER, so update DR_REF for > those as well. > --- > gcc/tree-vect-loop.cc | 1 + > gcc/tree-vect-stmts.cc | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 6cfce5aa7e1..f50ee2e958e 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -12295,6 +12295,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, > tree advance) > refs that get_load_store_type classified as VMAT_GATHER_SCATTER. */ > auto vstmt_vinfo = vect_stmt_to_vectorize (stmt_vinfo); > if (STMT_VINFO_MEMORY_ACCESS_TYPE (vstmt_vinfo) == > VMAT_GATHER_SCATTER > + || STMT_VINFO_STRIDED_P (vstmt_vinfo) > || STMT_VINFO_GATHER_SCATTER_P (vstmt_vinfo)) > { > /* ??? As we copy epilogues from the main loop incremental > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index fa44e19f163..b09c016949f 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -2274,7 +2274,8 @@ get_group_load_store_type (vec_info *vinfo, > stmt_vec_info stmt_info, > on nearby locations. Or, even if it's a win over scalar code, > it might not be a win over vectorizing at a lower VF, if that > allows us to use contiguous accesses. */ > - if (*memory_access_type == VMAT_ELEMENTWISE > + if ((*memory_access_type == VMAT_ELEMENTWISE > + || *memory_access_type == VMAT_STRIDED_SLP) > && single_element_p > && loop_vinfo > && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo, > -- > 2.43.0