From: Julian Brown <jul...@codesourcery.com>

This patch was originally written by Julian in 2021 for the OG10 branch,
but does not appear to have been proposed for upstream at that time, or
since.  I've now forward ported it and retested it.  Thomas reported
test regressions with this patch on the OG14 branch, but I think it was
exposing some bugs in the backend; I can't reproduce those failures on
mainline.

I'm not sure what the original motivating test case was, but I see that
the gfortran.dg/vect/fast-math-pr37021.f90 testcase is reduced from ~24k
lines of assembler down to <7k, on amdgcn.

OK for mainline?

Andrew

------------

For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity, whereby later
analyses in the vectorizer try to use a very wide array type that is
not available on this target, causing vectorization to bail out.
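For illustration (this function is hypothetical, not from the affected
testcase): a strided access pattern of the kind at issue, which
get_group_load_store_type may lower to elementwise accesses on most
targets, but which on GCN is better served by the native gather
instruction, since every vector load there is a per-lane-addressed
gather anyway.

```c
/* Hypothetical strided-load loop.  Each iteration reads in[i * stride],
   a non-contiguous pattern; with prefer_gather_scatter set, the
   vectorizer can keep this as a gather rather than falling back to a
   sequence of scalar/elementwise loads.  */
void
scale_strided (float *restrict out, const float *restrict in,
	       int n, int stride)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i * stride] * 2.0f;
}
```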

This patch adds a target hook to override the "single_element_p"
heuristic in that function, and enables the hook for GCN.  This allows
much better code to be generated for affected loops.

Co-authored-by: Julian Brown <jul...@codesourcery.com>

gcc/
        * doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
        hook.
        * doc/tm.texi: Regenerate.
        * target.def (prefer_gather_scatter): New target hook.
        * tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
        gather/scatter instructions to scalar/elementwise fallback.
        * config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
        hook.
---
 gcc/config/gcn/gcn.cc  | 2 ++
 gcc/doc/tm.texi        | 5 +++++
 gcc/doc/tm.texi.in     | 2 ++
 gcc/target.def         | 8 ++++++++
 gcc/tree-vect-stmts.cc | 2 +-
 5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 3b26d5c6a58..d451bf43355 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -7998,6 +7998,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef  TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
+#undef  TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 #undef TARGET_DOCUMENTATION_NAME
 #define TARGET_DOCUMENTATION_NAME "AMD GCN"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5e305643b3a..29177d81466 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6511,6 +6511,11 @@ The default is @code{NULL_TREE} which means to not 
vectorize scatter
 stores.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+This hook is set to TRUE if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypevr
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN 
(struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, 
@var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, 
@var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and 
also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index eccc4d88493..b03ad4c97c6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4311,6 +4311,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index 38903eb567a..dd57b7072af 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2056,6 +2056,14 @@ all zeros.  GCC can then try to branch around the 
instruction instead.",
  (unsigned ifn),
  default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
+we cannot use a contiguous access.  */
+DEFHOOKPOD
+(prefer_gather_scatter,
+ "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool, false)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 2e9b3d2e686..8ca33f5951a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2349,7 +2349,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
      allows us to use contiguous accesses.  */
   if ((*memory_access_type == VMAT_ELEMENTWISE
        || *memory_access_type == VMAT_STRIDED_SLP)
-      && single_element_p
+      && (targetm.vectorize.prefer_gather_scatter || single_element_p)
       && SLP_TREE_LANES (slp_node) == 1
       && loop_vinfo
       && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
-- 
2.50.0
