https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793

--- Comment #11 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Created attachment 63045
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63045&action=edit
candidate patch

(In reply to Tamar Christina from comment #10)
> (In reply to Jakub Jelinek from comment #7)
> > Seems it is the pack_p case,
> > Commenting out
> >       /* Check whether the input has twice as many lanes per vector.  */
> >       else if (children.length () == 1
> >                && known_eq (SLP_TREE_LANES (child) * nunits,
> >                             SLP_TREE_LANES (node) * op_nunits * 2))
> >         pack_p = true;
> > makes the #c5 testcase pass, while commenting out
> >       /* Check whether the output has N times as many lanes per vector.  */
> >       else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
> >                                     SLP_TREE_LANES (child) * nunits,
> >                                     &this_unpack_factor)
> >                && (i == 0 || unpack_factor == this_unpack_factor))
> >         unpack_factor = this_unpack_factor;
> > instead doesn't fix it.
> 
> Yes, the change in r15-4114-g8157f3f2d211bf has a bug in that repeated_p is
> initialized to true, but after this change it's only set to false when
> !pack_p && !widen.
> 
> so repeated_p stays true even when the vector isn't repeating, but
> repeated_p takes precedence over pack_p.
> 
> As a result it thinks that this pack operation replicates as a sequence from
> the wider vector:
> 
> (rr) p debug (node)
> perm.c:18:1: note: node 0x580a950 (max_nunits=1, refcnt=1) vector(16)
> unsigned char
> perm.c:18:1: note: op: VEC_PERM_EXPR
> perm.c:18:1: note:      stmt 0 _8 = MEM[(unsigned char *)s_25 + 5B];
> perm.c:18:1: note:      stmt 1 _9 = MEM[(unsigned char *)s_25 + 6B];
> perm.c:18:1: note:      stmt 2 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note:      stmt 3 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note:      stmt 4 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note:      stmt 5 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note:      stmt 6 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note:      stmt 7 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note:      lane permutation { 0[7] 0[8] 0[9] 0[9] 0[9] 0[9]
> 0[9] 0[9] }
> perm.c:18:1: note:      children 0x580a7a0
> $1 = void
> (rr) p debug (child)
> perm.c:18:1: note: node 0x580a7a0 (max_nunits=16, refcnt=4) vector(16)
> unsigned char
> perm.c:18:1: note: op template: _6 = MEM[(unsigned char *)s_25 + -2B];
> perm.c:18:1: note:      stmt 0 _6 = MEM[(unsigned char *)s_25 + -2B];
> perm.c:18:1: note:      stmt 1 ---
> perm.c:18:1: note:      stmt 2 ---
> perm.c:18:1: note:      stmt 3 ---
> perm.c:18:1: note:      stmt 4 ---
> perm.c:18:1: note:      stmt 5 ---
> perm.c:18:1: note:      stmt 6 _12 = MEM[(unsigned char *)s_25 + 4B];
> perm.c:18:1: note:      stmt 7 _8 = MEM[(unsigned char *)s_25 + 5B];
> perm.c:18:1: note:      stmt 8 _9 = MEM[(unsigned char *)s_25 + 6B];
> perm.c:18:1: note:      stmt 9 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note:      stmt 10 _21 = MEM[(unsigned char *)s_25 + 8B];
> perm.c:18:1: note:      stmt 11 _29 = MEM[(unsigned char *)s_25 + 9B];
> perm.c:18:1: note:      stmt 12 ---
> perm.c:18:1: note:      stmt 13 ---
> perm.c:18:1: note:      stmt 14 ---
> perm.c:18:1: note:      stmt 15 ---
> 
> The code and the comments indicate to me that it was intended to support
> repeated unpacks, but not repeated packs.
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index a5cd596fd28..7104835eb5a 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -10242,7 +10242,10 @@ vectorizable_slp_permutation_1 (vec_info *vinfo,
> gimple_stmt_iterator *gsi,
>        if (children.length () == 1
>           && known_eq (SLP_TREE_LANES (child) * nunits,
>                        SLP_TREE_LANES (node) * op_nunits * 2))
> -       pack_p = true;
> +       {
> +         pack_p = true;
> +         repeating_p = false;
> +       }
>        /* Check whether the output has N times as many lanes per vector.  */
>        else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
>                                     SLP_TREE_LANES (child) * nunits,
> 
> fixes it, since if we're packing, we're not repeating the original vector.
> Testing the above change.

That doesn't look right.  !repeating_p is the general non-VLA case, so it
doesn't need to handle packs differently from other types of permutation.  That
is, setting repeating_p to false whenever pack_p is true should be equivalent
to setting repeating_p to false and removing the pack_p code entirely.  Both
would have the effect of removing the VLA support for the pack case.

The problem seems to be with a packing permutation:

    op0[4] op0[5] op0[6] op0[7]

and with the identity_offset parameter to vect_add_slp_permutation (added in
g:25f831eab368d1bbec4dc67bf058cb7cf6b721ee).

Both the repeating_p and !repeating_p paths correctly realise that this
permutation reduces to an identity.  But the !repeating_p path ends up with
first_node and second_node both set to the second VEC_PERM_EXPR operand (since
that path works elementwise, and since no elements are taken from the first
input).  Therefore, the call:

              vect_add_slp_permutation (vinfo, gsi, node, first_def,
                                        second_def, mask_vec, mask[0]);

works regardless of whether vect_add_slp_permutation picks first_def or
second_def.  In that sense, the parameters to vect_add_slp_permutation are
already “canonical”.

The repeating_p path instead passes vector 2N as first_def and vector 2N+1 as
second_def, with mask[0] indicating the position of the identity within the
concatenation of first_def and second_def.  However, vect_add_slp_permutation
doesn't expect this and instead ignores the identity_offset parameter.

One possible -b fix is:

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 658ad6dc257..26cf224de3b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -11365,7 +11365,10 @@ vect_transform_slp_perm_load (vec_info *vinfo,

    otherwise add:

-      <new SSA name> = FIRST_DEF.  */
+      <new SSA name> = VEC_PERM_EXPR <FIRST_DEF, SECOND_DEF,
+                                     { N, N+1, N+2, ... }>
+
+   where N == IDENTITY_OFFSET.  */

 static void
 vect_add_slp_permutation (vec_info *vinfo, gimple_stmt_iterator *gsi,
@@ -11405,26 +11408,36 @@ vect_add_slp_permutation (vec_info *vinfo,
gimple_stmt_iterator *gsi,
                                       first_def, second_def,
                                       mask_vec);
     }
-  else if (!types_compatible_p (TREE_TYPE (first_def), vectype))
+  else
+    {
+      auto nunits = TYPE_VECTOR_SUBPARTS (vectype);
+      unsigned HOST_WIDE_INT vecno;
+      poly_uint64 eltno;
+      if (!can_div_trunc_p (poly_uint64 (identity_offset), nunits,
+                           &vecno, &eltno))
+       gcc_unreachable ();
+      tree def = vecno & 1 ? second_def : first_def;
+      if (!types_compatible_p (TREE_TYPE (def), vectype))
        {
          /* For identity permutes we still need to handle the case
             of offsetted extracts or concats.  */
          unsigned HOST_WIDE_INT c;
-      auto first_def_nunits
-       = TYPE_VECTOR_SUBPARTS (TREE_TYPE (first_def));
-      if (known_le (TYPE_VECTOR_SUBPARTS (vectype), first_def_nunits))
+         auto def_nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (def));
+         if (known_le (nunits, def_nunits))
            {
              unsigned HOST_WIDE_INT elsz
-           = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (first_def))));
-         tree lowpart = build3 (BIT_FIELD_REF, vectype, first_def,
+               = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (def))));
+             tree lowpart = build3 (BIT_FIELD_REF, vectype, def,
                                     TYPE_SIZE (vectype),
-                                bitsize_int (identity_offset * elsz));
+                                    bitsize_int (eltno * elsz));
              perm_stmt = gimple_build_assign (perm_dest, lowpart);
            }
          else if (constant_multiple_p (TYPE_VECTOR_SUBPARTS (vectype),
-                                   first_def_nunits, &c) && c == 2)
+                                       def_nunits, &c) && c == 2)
            {
-         tree ctor = build_constructor_va (vectype, 2, NULL_TREE, first_def,
+             gcc_assert (known_eq (identity_offset, 0U));
+             tree ctor = build_constructor_va (vectype, 2,
+                                               NULL_TREE, first_def,
                                                NULL_TREE, second_def);
              perm_stmt = gimple_build_assign (perm_dest, ctor);
            }
@@ -11434,7 +11447,9 @@ vect_add_slp_permutation (vec_info *vinfo,
gimple_stmt_iterator *gsi,
       else
        {
          /* We need a copy here in case the def was external.  */
-      perm_stmt = gimple_build_assign (perm_dest, first_def);
+         gcc_assert (known_eq (eltno, 0U));
+         perm_stmt = gimple_build_assign (perm_dest, def);
+       }
     }
   vect_finish_stmt_generation (vinfo, NULL, perm_stmt, gsi);
   /* Store the vector statement in NODE.  */

which seems to be enough for the testcase.  I've attached an untested patch for
that (no testcase).

I'm not really set up to test GCC these days, so someone please take it from
here if the patch looks right.

Reply via email to