https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181
--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> The issue is we detect this as a single interleaving group:
>
> t.c:12:1: note: Detected interleaving load of size 264
> t.c:12:1: note: _1 = *a_26(D);
> t.c:12:1: note: _5 = MEM[(double *)a_26(D) + 8B];
> t.c:12:1: note: _7 = MEM[(double *)a_26(D) + 16B];
> t.c:12:1: note: _11 = MEM[(double *)a_26(D) + 24B];
> t.c:12:1: note: _14 = MEM[(double *)a_26(D) + 32B];
> t.c:12:1: note: _17 = MEM[(double *)a_26(D) + 40B];
> t.c:12:1: note: _19 = MEM[(double *)a_26(D) + 48B];
> t.c:12:1: note: _22 = MEM[(double *)a_26(D) + 56B];
> t.c:12:1: note: <gap of 248 elements>
> t.c:12:1: note: _2 = MEM[(double *)a_26(D) + 2048B];
> t.c:12:1: note: _4 = MEM[(double *)a_26(D) + 2056B];
> t.c:12:1: note: _8 = MEM[(double *)a_26(D) + 2064B];
> t.c:12:1: note: _10 = MEM[(double *)a_26(D) + 2072B];
> t.c:12:1: note: _13 = MEM[(double *)a_26(D) + 2080B];
> t.c:12:1: note: _16 = MEM[(double *)a_26(D) + 2088B];
> t.c:12:1: note: _20 = MEM[(double *)a_26(D) + 2096B];
> t.c:12:1: note: _23 = MEM[(double *)a_26(D) + 2104B];
>
> so the heuristic to swap operands to get a single group in leafs doesn't
> work. Instead you get offsetting costs to avoid runaway with very large
> gaps:
Thanks for pointing this out.
>
> *a_26(D) 132 times unaligned_load (misalign -1) costs 1584 in body
>
> and that makes it unprofitable.
>
> There is indeed some better heuristic needed where to split groups - gaps
> bigger than the biggest vector size might be a good candidate. Note
> when two different interleaving groups are used in the same SLP leaf
> we fail as we don't support that yet.
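The numbers in the dump can be reproduced from the byte offsets alone, and the suggestion of splitting at gaps larger than the biggest vector size is easy to model. The sketch below is a simplified standalone model, not GCC code. The helper names (`dump_offsets`, `group_size_in_elements`, `largest_gap_in_elements`, `vector_loads`, `split_on_large_gaps`) are hypothetical, and elements are assumed to be 8-byte doubles. Reading the 132 in the cost line as 264 / 2 (one load per two-lane vector over the whole span, gap included) is an inference; the dump does not state it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Byte offsets of the sixteen loads in the dump: two runs of eight
// consecutive doubles, starting at a+0 and at a+2048.
std::vector<long>
dump_offsets ()
{
  std::vector<long> offs;
  for (long base : {0L, 2048L})
    for (long i = 0; i < 8; ++i)
      offs.push_back (base + 8 * i);
  return offs;
}

// Elements spanned by one group covering all accesses, gaps included:
// (last - first) / elem_size + 1.  Offsets must be sorted.
long
group_size_in_elements (const std::vector<long> &offs, long elem_size)
{
  assert (!offs.empty ());
  return (offs.back () - offs.front ()) / elem_size + 1;
}

// Largest number of unaccessed elements between two consecutive accesses.
long
largest_gap_in_elements (const std::vector<long> &offs, long elem_size)
{
  long gap = 0;
  for (std::size_t i = 1; i < offs.size (); ++i)
    gap = std::max (gap, (offs[i] - offs[i - 1] - elem_size) / elem_size);
  return gap;
}

// Vector loads needed to cover ELEMS elements with LANES-wide vectors.
long
vector_loads (long elems, long lanes)
{
  return (elems + lanes - 1) / lanes;
}

// Hypothetical heuristic: start a new group whenever the gap in bytes
// between the end of one access and the start of the next exceeds
// MAX_VECTOR_BYTES.  Offsets must be sorted and distinct.
std::vector<std::vector<long>>
split_on_large_gaps (const std::vector<long> &offs, long elem_size,
                     long max_vector_bytes)
{
  std::vector<std::vector<long>> groups;
  for (std::size_t i = 0; i < offs.size (); ++i)
    {
      long gap = 0;
      if (i > 0)
        {
          assert (offs[i] > offs[i - 1]);
          gap = offs[i] - (offs[i - 1] + elem_size);
        }
      if (groups.empty () || gap > max_vector_bytes)
        groups.emplace_back ();
      groups.back ().push_back (offs[i]);
    }
  return groups;
}
```

With these offsets the single group spans 2104 / 8 + 1 = 264 elements and the gap is (2048 - 64) / 8 = 248 elements, matching the dump. With a 64-byte limit (assuming, for example, that AVX-512 provides the widest vector), the 1984-byte gap splits the accesses into the two runs of eight doubles. Contiguous runs are never split because their gaps are zero.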
A simple hack like the one below works, but I suspect we need a better heuristic.
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c9395e33fcd..d9d55ff4a3e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3567,6 +3567,12 @@ vect_analyze_data_ref_accesses (vec_info *vinfo,
&& init_a <= init_prev
&& init_prev <= init_b);
+ tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (DR_REF (dra)));
+ unsigned HOST_WIDE_INT vf;
+ if (vectype
+ && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&vf)
+ && (unsigned HOST_WIDE_INT)(init_b - init_a) > vf * tree_to_uhwi (sza))
+ break;
/* Do not place the same access in the interleaving chain twice. */
if (init_b == init_prev)
{
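To see how the new condition partitions these particular accesses, here is a standalone model of it. It is a sketch under simplifying assumptions: only the added `break` is modeled (the other checks in vect_analyze_data_ref_accesses are ignored), accesses are assumed sorted by offset, `nunits` stands for the value the patch stores in `vf` (TYPE_VECTOR_SUBPARTS of the chosen vector type), and `split_like_patch` and `dump_load_offsets` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Byte offsets of the sixteen loads in the dump (8-byte doubles).
std::vector<long>
dump_load_offsets ()
{
  std::vector<long> offs;
  for (long base : {0L, 2048L})
    for (long i = 0; i < 8; ++i)
      offs.push_back (base + 8 * i);
  return offs;
}

// Simplified model of the check added by the patch: a group led by the
// access at INIT_A ends before the first access INIT_B for which
// init_b - init_a > nunits * elem_size.  Offsets must be sorted.
std::vector<std::vector<long>>
split_like_patch (const std::vector<long> &offs, unsigned long elem_size,
                  unsigned long nunits)
{
  std::vector<std::vector<long>> groups;
  long init_a = 0;
  for (long init_b : offs)
    {
      if (!groups.empty ())
        assert (init_b >= init_a);
      if (groups.empty ()
          || (unsigned long) (init_b - init_a) > nunits * elem_size)
        {
          groups.emplace_back ();
          init_a = init_b;  // init_b leads the new group
        }
      groups.back ().push_back (init_b);
    }
  return groups;
}
```

In this model, nunits = 8 yields the two runs of eight as separate groups, but nunits = 4 splits each contiguous run of eight into groups of five and three, and nunits = 2 gives six groups. That happens because the check measures the distance from the group leader (init_b - init_a) rather than the gap to the previous access (init_b - init_prev), and the strict `>` lets a group reach one element beyond a full vector.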