This enables the use of a predicate mask or length limit when
vectorizing basic blocks, in cases where previously only the
equivalent rolled (i.e. loop) form of the same source code would
have been vectorized.  Predication is used only for groups whose
size is not a multiple of a vector length that the target supports
directly.
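
As an illustration (a hypothetical example, not one taken from the
testsuite), straight-line code like the following forms a store
group of three lanes.  Three does not divide evenly into 4-lane
V4SI vectors, so with this change the basic-block vectorizer can
cover the group with a 3-active-lane predicate mask or a length
limit instead of falling back to scalar code:

  void
  f (int *__restrict a, int *__restrict b)
  {
    a[0] = b[0] + 1;
    a[1] = b[1] + 1;
    a[2] = b[2] + 1;
  }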

The whole change is enabled by wiring vect_can_use_partial_vectors_p
up to SLP_TREE_CAN_USE_PARTIAL_VECTORS_P, so that when this function
is used for BB SLP it reads from the SLP node, rather than from the
loop vectorizer's state, whether we still have the option of
vectorizing with lengths or masks to prevent the use of inactive
lanes.
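
Concretely, the wiring is the final tree-vectorizer.h hunk of the
patch; when VINFO is not a loop_vec_info the helper now answers
from the node itself:

  /* Excerpt from vect_can_use_partial_vectors_p.  */
  return SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node);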

vect_record_nunits is updated so that it no longer returns failure
for BB SLP when the group size is not an integral multiple of the
number of lanes in the vector type; such cases are now allowed
when the vector is known to be long enough to hold the whole group.
At the same time, vect_get_num_copies is updated to return 1 early
when a single vector is known to be long enough for the specified
SLP tree node.  This avoids an ICE in vect_get_num_vectors, which
cannot cope with SVE vector types in this situation.
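
For reference, the early-out added to vect_get_num_copies (shown in
context in the tree-vectorizer.h hunk) is:

  /* After computing vf *= SLP_TREE_LANES (node):  */
  if (known_ge (TYPE_VECTOR_SUBPARTS (vectype), vf))
    return 1;

For example, a node of three lanes with an SVE type such as VNx8HI,
whose number of subparts is the poly_int 8+8x, satisfies known_ge,
so we return one copy and never ask vect_get_num_vectors to divide
3 by a variable vector length.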

Instead of giving up when vect_get_vector_types_for_stmt
fails for the specified group size, vect_build_slp_tree_1
now calls vect_get_vector_types_for_stmt again without
a group size (which defaults to 0) as a fallback.
If this succeeds, the initial failure is treated as a
'soft' failure that results in the group being split.
Consequently, the assertions that "For BB vectorization, we
should always have a group size once we've constructed the
SLP tree" were deleted from get_vectype_for_scalar_type and
vect_get_vector_types_for_stmt.

vect_create_vectorized_promotion_stmts no longer pushes more stmts
than vect_get_num_copies implies, since that could overrun the
number of slots allocated for an SLP node (based on its number of
lanes and vector type).  For example, four defs used to be pushed
for a promotion from V8HI to V2DI (8/2 = 4) even if only two lanes
of the V8HI were active.  That later caused an ICE in
vectorizable_operation for a parent node, because binary ops
require both operands to be the same length.
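
A hypothetical kernel (again, not one of the testcases) that
exercises this: the two unsigned short inputs occupy two lanes of a
V8HI, and the two unsigned long long results need only a single
V2DI def, so pushing all 8/2 = 4 possible defs would overrun the
node:

  void
  g (unsigned long long *__restrict d, unsigned short *__restrict s)
  {
    d[0] = s[0];
    d[1] = s[1];
  }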

Since promotion no longer produces redundant definitions,
vectorizable_conversion also had to be modified so that demotion no
longer relies on an even number of defs being produced. If
necessary, it now pushes a single constant zero def.
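
The padding itself is a small addition in the
vectorizable_conversion hunk:

  if (vec_oprnds0.length () % 2 != 0)
    {
      tree vectype = TREE_TYPE (vec_oprnds0[0]);
      vec_oprnds0.safe_push (build_zero_cst (vectype));
    }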

Update expectations for gcc.target/aarch64/popcnt-sve.c

Update expectations for gcc.dg/vect/vect-over-widen-*.c
---
 .../gcc.dg/vect/vect-over-widen-10.c          |   2 +-
 .../gcc.dg/vect/vect-over-widen-13.c          |   2 +-
 .../gcc.dg/vect/vect-over-widen-14.c          |   2 +-
 .../gcc.dg/vect/vect-over-widen-17.c          |   2 +-
 .../gcc.dg/vect/vect-over-widen-18.c          |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c |   2 +-
 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c |  10 +-
 gcc/tree-vect-slp.cc                          |  90 +++++++--
 gcc/tree-vect-stmts.cc                        | 191 +++++++++++++-----
 gcc/tree-vectorizer.h                         |  11 +-
 14 files changed, 226 insertions(+), 96 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
index f0140e4ef6d..6efcf739db9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
@@ -16,5 +16,5 @@
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
index 08a65ea5518..720353716cf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
@@ -48,5 +48,5 @@ main (void)
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
index dfa09f5d2ca..f1d5f95c543 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
@@ -15,5 +15,5 @@
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
index 53fcfd0c06c..ac1a0f86727 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
@@ -46,5 +46,5 @@ main (void)
    adopts realign_load scheme.  It requires rs6000_builtin_mask_for_load to
    generate mask whose return type is vector char.  */
 /* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" { target vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
index aa58cd1c957..3ebfaa78270 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
@@ -47,5 +47,5 @@ main (void)
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* |} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
 /* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
index c2ab11a9d32..1d89789a86d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
@@ -49,5 +49,5 @@ main (void)
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
index bda92c965e0..62d5a52587e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
@@ -13,5 +13,5 @@
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
index 1d55e13fb1f..6e09631009a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
@@ -51,5 +51,5 @@ main (void)
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
index 553c0712a79..b6d650beab4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
@@ -16,5 +16,5 @@
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
index 36bfc68e053..e82f8a571da 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
@@ -56,5 +56,5 @@ main (void)
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
index c3b4c69b4b4..8e349efe390 100644
--- a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
@@ -4,7 +4,7 @@
 
 /*
 ** f_v4hi:
-**     ptrue   (p[0-7]).b, vl8
+**     ptrue   (p[0-7]).b, all
 **     ldr     d([0-9]+), \[x0\]
 **     cnt     z\2.h, \1/m, z\2.h
 **     str     d\2, \[x1\]
@@ -21,7 +21,7 @@ f_v4hi (unsigned short *__restrict b, unsigned short *__restrict d)
 
 /*
 ** f_v8hi:
-**     ptrue   (p[0-7]).b, vl16
+**     ptrue   (p[0-7]).b, all
 **     ldr     q([0-9]+), \[x0\]
 **     cnt     z\2.h, \1/m, z\2.h
 **     str     q\2, \[x1\]
@@ -42,7 +42,7 @@ f_v8hi (unsigned short *__restrict b, unsigned short *__restrict d)
 
 /*
 ** f_v2si:
-**     ptrue   (p[0-7]).b, vl8
+**     ptrue   (p[0-7]).b, all
 **     ldr     d([0-9]+), \[x0\]
 **     cnt     z\2.s, \1/m, z\2.s
 **     str     d\2, \[x1\]
@@ -57,7 +57,7 @@ f_v2si (unsigned int *__restrict b, unsigned int *__restrict d)
 
 /*
 ** f_v4si:
-**     ptrue   (p[0-7]).b, vl16
+**     ptrue   (p[0-7]).b, all
 **     ldr     q([0-9]+), \[x0\]
 **     cnt     z\2.s, \1/m, z\2.s
 **     str     q\2, \[x1\]
@@ -74,7 +74,7 @@ f_v4si (unsigned int *__restrict b, unsigned int *__restrict d)
 
 /*
 ** f_v2di:
-**     ptrue   (p[0-7]).b, vl16
+**     ptrue   (p[0-7]).b, all
 **     ldr     q([0-9]+), \[x0\]
 **     cnt     z\2.d, \1/m, z\2.d
 **     str     q\2, \[x1\]
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4e5f6bc8083..f5cf437014e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1074,8 +1074,12 @@ vect_record_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
     }
 
   /* If populating the vector type requires unrolling then fail
-     before adjusting *nunits for basic-block vectorization.  */
+     before adjusting *nunits for basic-block vectorization.
+     Allow group sizes that are indivisible by the vector length only if they
+     are known not to exceed the vector length.  We may be able to support such
+     cases by generating constant masks.  */
   if (is_a <bb_vec_info> (vinfo)
+      && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype))
       && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
     {
       if (dump_enabled_p ())
@@ -1127,12 +1131,29 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   tree soft_fail_nunits_vectype = NULL_TREE;
 
   tree vectype, nunits_vectype;
+  bool unsupported_datatype = false;
   if (!vect_get_vector_types_for_stmt (vinfo, first_stmt_info, &vectype,
-                                      &nunits_vectype, group_size))
+                                      &nunits_vectype, &unsupported_datatype,
+                                      group_size))
     {
-      /* Fatal mismatch.  */
-      matches[0] = false;
-      return false;
+      /* Try to get fallback vector types and continue analysis, producing
+        matches[] as if vectype was not an issue.  This allows splitting of
+        groups to happen.  */
+      if (unsupported_datatype
+         && vect_get_vector_types_for_stmt (vinfo, first_stmt_info, &vectype,
+                                            &nunits_vectype,
+                                            &unsupported_datatype))
+       {
+         gcc_assert (is_a<bb_vec_info> (vinfo));
+         maybe_soft_fail = true;
+         soft_fail_nunits_vectype = nunits_vectype;
+       }
+      else
+       {
+         /* Fatal mismatch.  */
+         matches[0] = false;
+         return false;
+       }
     }
   if (is_a <bb_vec_info> (vinfo)
       && known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
@@ -1656,16 +1677,22 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
   if (maybe_soft_fail)
     {
-      unsigned HOST_WIDE_INT const_nunits;
-      if (!TYPE_VECTOR_SUBPARTS
-           (soft_fail_nunits_vectype).is_constant (&const_nunits)
-         || const_nunits > group_size)
+      /* Use the known minimum number of subparts for VLA because we still need
+        to choose a splitting point although the choice is more arbitrary.  */
+      unsigned HOST_WIDE_INT const_nunits = constant_lower_bound (
+         TYPE_VECTOR_SUBPARTS (soft_fail_nunits_vectype));
+
+      if (const_nunits > group_size)
        matches[0] = false;
       else
        {
          /* With constant vector elements simulate a mismatch at the
             point we need to split.  */
+         gcc_assert ((const_nunits % 2) == 0);
          unsigned tail = group_size & (const_nunits - 1);
+         if (tail == 0)
+           tail = const_nunits;
+         gcc_assert (group_size >= tail);
          memset (&matches[group_size - tail], 0, sizeof (bool) * tail);
        }
       return false;
@@ -2392,13 +2419,21 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
                  /* Check whether we can build the invariant.  If we can't
                     we never will be able to.  */
                  tree type = TREE_TYPE (chains[0][n].op);
-                 if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
-                     && (TREE_CODE (type) == BOOLEAN_TYPE
-                         || !can_duplicate_and_interleave_p (vinfo, group_size,
-                                                             type)))
+                 if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ())
                    {
-                     matches[0] = false;
-                     goto out;
+                     if (TREE_CODE (type) == BOOLEAN_TYPE)
+                       {
+                         matches[0] = false;
+                         goto out;
+                       }
+
+                     if (!is_a<bb_vec_info> (vinfo)
+                         && !can_duplicate_and_interleave_p (vinfo, group_size,
+                                                             type))
+                       {
+                         matches[0] = false;
+                         goto out;
+                       }
                    }
                }
              else if (dt != vect_internal_def)
@@ -2827,7 +2862,7 @@ out:
                    uniform_val = NULL_TREE;
                    break;
                  }
-             if (!uniform_val
+             if (!uniform_val && !is_a<bb_vec_info> (vinfo)
                  && !can_duplicate_and_interleave_p (vinfo,
                                                      oprnd_info->ops.length (),
                                                      TREE_TYPE (op0)))
@@ -4875,9 +4910,10 @@ vect_analyze_slp_instance (vec_info *vinfo,
            = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
          tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
                                                      1 << floor_log2 (i));
-         unsigned HOST_WIDE_INT const_nunits;
-         if (vectype
-             && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits))
+         unsigned HOST_WIDE_INT const_nunits
+           = vectype ? constant_lower_bound (TYPE_VECTOR_SUBPARTS (vectype))
+                     : 0;
+         if (const_nunits > 1 && (i % const_nunits) == 0)
            {
              /* Split into two groups at the first vector boundary.  */
              gcc_assert ((const_nunits & (const_nunits - 1)) == 0);
@@ -11540,7 +11576,21 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
       unpack_factor = 1;
     }
   unsigned olanes = unpack_factor * ncopies * SLP_TREE_LANES (node);
-  gcc_assert (repeating_p || multiple_p (olanes, nunits));
+
+  /* With fully-predicated BB-SLP, an external node's number of lanes can be
+     incompatible with the chosen vector width (e.g., lane packs of 3 with a
+     natural 2-lane vector type).  */
+  if (!repeating_p && !multiple_p (olanes, nunits))
+    {
+      if (dump_p)
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                        "unsupported permutation %p: vector type %T,"
+                        " nunits=" HOST_WIDE_INT_PRINT_UNSIGNED
+                        " ncopies=%" PRIu64 ", lanes=%u and unpack=%u\n",
+                        (void *) node, vectype, estimated_poly_value (nunits),
+                        ncopies, SLP_TREE_LANES (node), unpack_factor);
+      return -1;
+    }
 
   /* Compute the { { SLP operand, vector index}, lane } permutation sequence
      from the { SLP operand, scalar lane } permutation as recorded in the
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 10cfe0dc06d..ddf2f90e8d3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1687,23 +1687,27 @@ check_load_store_for_partial_vectors (vec_info *vinfo, tree vectype,
     unsigned int nvectors;
     if (can_div_away_from_zero_p (size, nunits, &nvectors))
       return nvectors;
-    gcc_unreachable ();
+
+    gcc_assert (known_le (size, nunits));
+    return 1u;
   };
 
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 size = loop_vinfo
+                      ? group_size * LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+                      : SLP_TREE_LANES (slp_node);
   unsigned factor;
   vect_partial_vector_style partial_vector_style
     = vect_get_partial_vector_style (vectype, is_load, &factor, elsvals);
 
   if (partial_vector_style == vect_partial_vectors_len)
     {
-      nvectors = group_memory_nvectors (group_size * vf, nunits);
+      nvectors = group_memory_nvectors (size, nunits);
       vect_record_len (vinfo, slp_node, nvectors, vectype, factor);
     }
   else if (partial_vector_style == vect_partial_vectors_while_ult)
     {
-      nvectors = group_memory_nvectors (group_size * vf, nunits);
+      nvectors = group_memory_nvectors (size, nunits);
       vect_record_mask (vinfo, slp_node, nvectors, vectype, scalar_mask);
     }
   else
@@ -3305,12 +3309,11 @@ vect_get_strided_load_store_ops (stmt_vec_info stmt_info, slp_tree node,
 
 static tree
 vect_get_loop_variant_data_ptr_increment (
-  vec_info *vinfo, tree aggr_type, gimple_stmt_iterator *gsi,
+  loop_vec_info loop_vinfo, tree aggr_type, gimple_stmt_iterator *gsi,
   vec_loop_lens *loop_lens, dr_vec_info *dr_info,
   vect_memory_access_type memory_access_type)
 {
-  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
-  tree step = vect_dr_behavior (vinfo, dr_info)->step;
+  tree step = vect_dr_behavior (loop_vinfo, dr_info)->step;
 
   /* gather/scatter never reach here.  */
   gcc_assert (!mat_gather_scatter_p (memory_access_type));
@@ -3354,7 +3357,7 @@ vect_get_data_ptr_increment (vec_info *vinfo, gimple_stmt_iterator *gsi,
 
   loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
   if (loop_vinfo && LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
-    return vect_get_loop_variant_data_ptr_increment (vinfo, aggr_type, gsi,
+    return vect_get_loop_variant_data_ptr_increment (loop_vinfo, aggr_type, gsi,
                                                     loop_lens, dr_info,
                                                     memory_access_type);
 
@@ -5167,7 +5170,7 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, vec<tree> *vec_oprnds,
    call the function recursively.  */
 
 static void
-vect_create_vectorized_promotion_stmts (vec_info *vinfo,
+vect_create_vectorized_promotion_stmts (vec_info *vinfo, slp_tree slp_node,
                                        vec<tree> *vec_oprnds0,
                                        vec<tree> *vec_oprnds1,
                                        stmt_vec_info stmt_info, tree vec_dest,
@@ -5180,37 +5183,39 @@ vect_create_vectorized_promotion_stmts (vec_info *vinfo,
   gimple *new_stmt1, *new_stmt2;
   vec<tree> vec_tmp = vNULL;
 
-  vec_tmp.create (vec_oprnds0->length () * 2);
+  const unsigned ncopies = vect_get_num_copies (vinfo, slp_node);
+  vec_tmp.create (ncopies);
+  gcc_assert (vec_oprnds0->length () <= ncopies);
   FOR_EACH_VEC_ELT (*vec_oprnds0, i, vop0)
     {
+      if (vec_tmp.length () >= ncopies)
+       break;
+
       if (op_type == binary_op)
        vop1 = (*vec_oprnds1)[i];
       else
        vop1 = NULL_TREE;
 
       /* Generate the two halves of promotion operation.  */
-      new_stmt1 = vect_gen_widened_results_half (vinfo, ch1, vop0, vop1,
-                                                op_type, vec_dest, gsi,
-                                                stmt_info);
-      new_stmt2 = vect_gen_widened_results_half (vinfo, ch2, vop0, vop1,
-                                                op_type, vec_dest, gsi,
-                                                stmt_info);
-      if (is_gimple_call (new_stmt1))
-       {
-         new_tmp1 = gimple_call_lhs (new_stmt1);
-         new_tmp2 = gimple_call_lhs (new_stmt2);
-       }
-      else
+      new_stmt1
+       = vect_gen_widened_results_half (vinfo, ch1, vop0, vop1, op_type,
+                                        vec_dest, gsi, stmt_info);
+      new_tmp1 = is_gimple_call (new_stmt1) ? gimple_call_lhs (new_stmt1)
+                                           : gimple_assign_lhs (new_stmt1);
+      vec_tmp.quick_push (new_tmp1);
+
+      if (vec_tmp.length () < ncopies)
        {
-         new_tmp1 = gimple_assign_lhs (new_stmt1);
-         new_tmp2 = gimple_assign_lhs (new_stmt2);
+         new_stmt2
+           = vect_gen_widened_results_half (vinfo, ch2, vop0, vop1, op_type,
+                                            vec_dest, gsi, stmt_info);
+         new_tmp2 = is_gimple_call (new_stmt2) ? gimple_call_lhs (new_stmt2)
+                                               : gimple_assign_lhs (new_stmt2);
+         vec_tmp.quick_push (new_tmp2);
        }
-
-      /* Store the results for the next step.  */
-      vec_tmp.quick_push (new_tmp1);
-      vec_tmp.quick_push (new_tmp2);
     }
 
+  gcc_assert (vec_tmp.length () <= ncopies);
   vec_oprnds0->release ();
   *vec_oprnds0 = vec_tmp;
 }
@@ -5425,6 +5430,7 @@ vectorizable_conversion (vec_info *vinfo,
      from the scalar type.  */
   if (!vectype_in)
     vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
+
   if (!cost_vec)
     gcc_assert (vectype_in);
   if (!vectype_in)
@@ -5812,12 +5818,15 @@ vectorizable_conversion (vec_info *vinfo,
                                             stmt_info, this_dest, gsi, c1,
                                             op_type);
          else
-           vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
-                                                   &vec_oprnds1, stmt_info,
-                                                   this_dest, gsi,
+           vect_create_vectorized_promotion_stmts (vinfo, slp_node,
+                                                   &vec_oprnds0, &vec_oprnds1,
+                                                   stmt_info, this_dest, gsi,
                                                    c1, c2, op_type);
        }
 
+      gcc_assert (vec_oprnds0.length ()
+                 == vect_get_num_copies (vinfo, slp_node));
+
       FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
        {
          gimple *new_stmt;
@@ -5841,6 +5850,16 @@ vectorizable_conversion (vec_info *vinfo,
         generate more than one vector stmt - i.e - we need to "unroll"
         the vector stmt by a factor VF/nunits.  */
       vect_get_vec_defs (vinfo, slp_node, op0, &vec_oprnds0);
+
+      /* Promotion no longer produces redundant defs (since support was
+       added for length/mask-predicated BB SLP of awkward-sized groups),
+       therefore demotion now has to handle that case too.  */
+      if (vec_oprnds0.length () % 2 != 0)
+       {
+         tree vectype = TREE_TYPE (vec_oprnds0[0]);
+         vec_oprnds0.safe_push (build_zero_cst (vectype));
+       }
+
       /* Arguments are ready.  Create the new vector stmts.  */
       if (cvt_type && modifier == NARROW_DST)
        FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
@@ -5859,6 +5878,8 @@ vectorizable_conversion (vec_info *vinfo,
       /* After demoting op0 to cvt_type, convert it to dest.  */
       if (cvt_type && code == FLOAT_EXPR)
        {
+         SLP_TREE_VEC_DEFS (slp_node).reserve (vec_oprnds0.length ());
+
          for (unsigned int i = 0; i != vec_oprnds0.length() / 2;  i++)
            {
              /* Arguments are ready, create the new vector stmt.  */
@@ -10493,7 +10514,7 @@ vectorizable_load (vec_info *vinfo,
 
       aggr_type = build_array_type_nelts (elem_type, group_size * nunits);
       if (!costing_p)
-       bump = vect_get_data_ptr_increment (vinfo, gsi, dr_info, aggr_type,
+       bump = vect_get_data_ptr_increment (loop_vinfo, gsi, dr_info, aggr_type,
                                            memory_access_type, loop_lens);
 
       unsigned int inside_cost = 0, prologue_cost = 0;
@@ -12997,6 +13018,21 @@ vect_analyze_stmt (vec_info *vinfo,
                                   " live stmt not supported: %G",
                                   stmt_info->stmt);
 
+  if (bb_vinfo)
+    {
+      unsigned int group_size = SLP_TREE_LANES (node);
+      tree vectype = SLP_TREE_VECTYPE (node);
+      poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+      bool needs_partial = known_lt (group_size, nunits);
+      if (needs_partial && !SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (node))
+       return opt_result::failure_at (stmt_info->stmt,
+                                      "not vectorized: SLP node needs but "
+                                      "cannot use partial vectors: %G",
+                                      stmt_info->stmt);
+      if (maybe_gt (group_size, nunits))
+       gcc_assert (multiple_p (group_size, nunits));
+    }
+
   return opt_result::success ();
 }
 
@@ -13299,13 +13335,7 @@ tree
 get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
                             unsigned int group_size)
 {
-  /* For BB vectorization, we should always have a group size once we've
-     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
-     are tentative requests during things like early data reference
-     analysis and pattern recognition.  */
-  if (is_a <bb_vec_info> (vinfo))
-    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
-  else
+  if (!is_a <bb_vec_info> (vinfo))
     group_size = 0;
 
   tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
@@ -13319,10 +13349,18 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
     vinfo->used_vector_modes.add (TYPE_MODE (vectype));
 
   /* If the natural choice of vector type doesn't satisfy GROUP_SIZE,
-     try again with an explicit number of elements.  */
-  if (vectype
-      && group_size
-      && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), group_size))
+     try again with an explicit number of elements.  A vector type satisfies
+     GROUP_SIZE if it is definitely not too long to store the whole group,
+     or we are able to generate masks to handle the unknown number of excess
+     lanes that might exist. Otherwise, we must substitute a vector type that
+     can be used to carve up the group.
+   */
+  if (vectype && group_size
+      && maybe_gt (TYPE_VECTOR_SUBPARTS (vectype), group_size)
+      && (vect_get_partial_vector_style (vectype, true)
+           == vect_partial_vectors_none
+         || vect_get_partial_vector_style (vectype, false)
+              == vect_partial_vectors_none))
     {
       /* Start with the biggest number of units that fits within
         GROUP_SIZE and halve it until we find a valid vector type.
@@ -13613,6 +13651,13 @@ vect_is_simple_use (vec_info *vinfo, slp_tree slp_node,
     {
       if (def_stmt_info_out)
        *def_stmt_info_out = NULL;
+      if (SLP_TREE_SCALAR_OPS (child).is_empty ())
+       {
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                            "Child has no scalar operands.\n");
+         return false;
+       }
       *op = SLP_TREE_SCALAR_OPS (child)[0];
       *dt = SLP_TREE_DEF_TYPE (child);
       return true;
@@ -13638,7 +13683,36 @@ vect_maybe_update_slp_op_vectype (vec_info *vinfo, slp_tree op, tree vectype)
       && SLP_TREE_DEF_TYPE (op) == vect_external_def
       && SLP_TREE_LANES (op) > 1)
     return false;
-  (void) vinfo; // FORNOW
+
+  /* When the vectorizer falls back to building vector operands from scalars,
+     it can create SLP trees with external defs that have a number of lanes not
+     divisible by the number of subparts in a vector type naively inferred from
+     the scalar type.  Reject such types to avoid ICE when later computing the
+     prologue cost for invariant operands.  */
+  if (SLP_TREE_DEF_TYPE (op) == vect_external_def)
+    {
+      poly_uint64 vf = 1;
+
+      if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+       vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+      vf *= SLP_TREE_LANES (op);
+
+      if (maybe_lt (TYPE_VECTOR_SUBPARTS (vectype), vf)
+         && !multiple_p (vf, TYPE_VECTOR_SUBPARTS (vectype)))
+       {
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                            "lanes=" HOST_WIDE_INT_PRINT_UNSIGNED
+                            " is not divisible by "
+                            "subparts=" HOST_WIDE_INT_PRINT_UNSIGNED ".\n",
+                            estimated_poly_value (vf),
+                            estimated_poly_value (
+                              TYPE_VECTOR_SUBPARTS (vectype)));
+         return false;
+       }
+    }
+
   SLP_TREE_VECTYPE (op) = vectype;
   return true;
 }
@@ -14360,23 +14434,25 @@ vect_gen_while_not (gimple_seq *seq, tree mask_type, tree start_index,
 
    - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum
      number of units needed to vectorize STMT_INFO, or NULL_TREE if the
-     statement does not help to determine the overall number of units.  */
+     statement does not help to determine the overall number of units.
+
+   On failure:
+
+   - Set *UNSUPPORTED_DATATYPE to true if the statement can't be vectorized
+     because it uses a data type that the target doesn't support in vector form
+     for a group of the given GROUP_SIZE.
+ */
 
 opt_result
 vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
                                tree *stmt_vectype_out,
                                tree *nunits_vectype_out,
+                               bool *unsupported_datatype,
                                unsigned int group_size)
 {
   gimple *stmt = stmt_info->stmt;
 
-  /* For BB vectorization, we should always have a group size once we've
-     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
-     are tentative requests during things like early data reference
-     analysis and pattern recognition.  */
-  if (is_a <bb_vec_info> (vinfo))
-    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
-  else
+  if (!is_a<bb_vec_info> (vinfo))
     group_size = 0;
 
   *stmt_vectype_out = NULL_TREE;
@@ -14453,10 +14529,13 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
        }
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
       if (!vectype)
-       return opt_result::failure_at (stmt,
-                                      "not vectorized:"
-                                      " unsupported data-type %T\n",
-                                      scalar_type);
+       {
+         *unsupported_datatype = true;
+         return opt_result::failure_at (stmt,
+                                        "not vectorized:"
+                                        " unsupported data-type %T\n",
+                                        scalar_type);
+       }
 
       if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ab9ed09d62b..726b86de8ad 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2325,6 +2325,8 @@ vect_get_num_copies (vec_info *vinfo, slp_tree node)
 
   vf *= SLP_TREE_LANES (node);
   tree vectype = SLP_TREE_VECTYPE (node);
+  if (known_ge (TYPE_VECTOR_SUBPARTS (vectype), vf))
+    return 1;
 
   return vect_get_num_vectors (vf, vectype);
 }
@@ -2623,9 +2625,9 @@ extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
                            const char * = nullptr);
 extern void vect_gen_while_ssa_name (gimple_seq *, tree, tree, tree, tree);
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
-extern opt_result vect_get_vector_types_for_stmt (vec_info *,
-                                                 stmt_vec_info, tree *,
-                                                 tree *, unsigned int = 0);
+extern opt_result vect_get_vector_types_for_stmt (vec_info *, stmt_vec_info,
+                                                 tree *, tree *,
+                                                 bool *, unsigned int = 0);
 extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
 
 /* In tree-if-conv.cc.  */
@@ -2958,8 +2960,7 @@ vect_can_use_partial_vectors_p (vec_info *vinfo, slp_tree slp_node)
   if (loop_vinfo)
     return LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo);
 
-  (void) slp_node; // FORNOW
-  return false;
+  return SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node);
 }
 
 /* If VINFO is vectorizer state for loop vectorization then record that we no
-- 
2.43.0
