From: Julian Brown <jul...@codesourcery.com>

This patch was originally written by Julian in 2021 for the OG10 branch,
but does not appear to have been proposed for upstream at that time, or
since.  I've now forward-ported it and retested it.

Thomas reported test regressions with this patch on the OG14 branch, but
I think it was exposing some bugs in the backend; I can't reproduce those
failures on mainline.
I'm not sure what the original motivating test case was, but I see that
the gfortran.dg/vect/fast-math-pr37021.f90 testcase is reduced from ~24k
lines of assembler down to <7k, on amdgcn.

OK for mainline?

Andrew

------------

For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic in get_group_load_store_type
that avoids gather/scatter operations with too many elements is
counterproductive.  Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity: later analyses
in the vectorizer try to use a very wide array type which is not
available on this target, and so it bails out.

This patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates it for GCN.  This allows much
better code to be generated for affected loops.

Co-authored-by: Julian Brown <jul...@codesourcery.com>

gcc/
	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
	documentation for hook.
	* doc/tm.texi: Regenerate.
	* target.def (prefer_gather_scatter): Add target hook under
	vectorizer.
	* tree-vect-stmts.cc (get_group_load_store_type): Optionally
	prefer gather/scatter instructions to scalar/elementwise
	fallback.
	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER):
	Define hook.
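To illustrate the kind of loop affected, here is a hypothetical
strided-access kernel (a sketch only; `scale_strided` is an invented
name, not the original motivating test case).  A read like
`b[i * stride]` cannot use a contiguous vector load, so without this
patch the vectorizer may fall back to elementwise accesses, while on
amdgcn every vector load is already a gather and taking the gather path
directly is cheaper:

```c
/* Strided reads such as b[i * stride] have a separate address per
   vector lane.  On most targets an elementwise fallback is reasonable;
   on amdgcn, where vector loads are always gathers, preferring the
   gather path avoids the elementwise expansion.  */
void
scale_strided (float *restrict a, const float *restrict b,
	       int stride, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i * stride] * 2.0f;
}
```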
---
 gcc/config/gcn/gcn.cc  | 2 ++
 gcc/doc/tm.texi        | 5 +++++
 gcc/doc/tm.texi.in     | 2 ++
 gcc/target.def         | 8 ++++++++
 gcc/tree-vect-stmts.cc | 2 +-
 5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 3b26d5c6a58..d451bf43355 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -7998,6 +7998,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
+#undef TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 #undef TARGET_DOCUMENTATION_NAME
 #define TARGET_DOCUMENTATION_NAME "AMD GCN"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5e305643b3a..29177d81466 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6511,6 +6511,11 @@
 The default is @code{NULL_TREE} which means to not vectorize scatter stores.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+This hook is set to TRUE if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypevr
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index eccc4d88493..b03ad4c97c6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4311,6 +4311,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index 38903eb567a..dd57b7072af 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2056,6 +2056,14 @@
 all zeros.  GCC can then try to branch around the instruction instead.",
 (unsigned ifn), default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
+we cannot use a contiguous access.  */
+DEFHOOKPOD
+(prefer_gather_scatter,
+ "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool, false)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 2e9b3d2e686..8ca33f5951a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2349,7 +2349,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
      allows us to use contiguous accesses.  */
   if ((*memory_access_type == VMAT_ELEMENTWISE
        || *memory_access_type == VMAT_STRIDED_SLP)
-      && single_element_p
+      && (targetm.vectorize.prefer_gather_scatter || single_element_p)
       && SLP_TREE_LANES (slp_node) == 1
       && loop_vinfo
       && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
-- 
2.50.0