This enables the use of a predicate mask or length limit when
vectorizing basic blocks, in cases where previously only the
equivalent rolled (i.e. loop) form of the source code would have
been vectorized. Predication is used for groups whose size is not
a multiple of any vector length that the target supports directly.
The vect_record_nunits function used while building an SLP tree is
updated so that it no longer fails BB SLP when the group size is
not an integral multiple of the number of lanes in the vector type;
such cases are now allowed provided the group size is known not to
exceed the number of lanes.
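In outline, the rejection test in vect_record_nunits becomes
(condensed from the tree-vect-slp.cc hunk below):

  if (is_a <bb_vec_info> (vinfo)
      && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype))
      && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
    /* fail; otherwise any excess lanes can be masked off.  */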
Instead of giving up when vect_get_vector_types_for_stmt fails for
the specified group size, vect_build_slp_tree_1 now retries
vect_get_vector_types_for_stmt without a group size (which then
defaults to 0). If the retry succeeds, the initial failure is
treated as a 'soft' failure that results in the group being split.
Consequently, the assertions that "For BB vectorization, we should
always have a group size once we've constructed the SLP tree" have
been removed from get_vectype_for_scalar_type and
vect_get_vector_types_for_stmt.
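The retry, condensed from the vect_build_slp_tree_1 hunk below:

  bool unsupported_datatype = false;
  if (!vect_get_vector_types_for_stmt (vinfo, first_stmt_info,
                                       &vectype, &nunits_vectype,
                                       &unsupported_datatype, group_size))
    {
      /* Retry without a group size; on success, record a soft failure
         so that the group is split rather than rejected outright.  */
      if (unsupported_datatype
          && vect_get_vector_types_for_stmt (vinfo, first_stmt_info,
                                             &vectype, &nunits_vectype,
                                             &unsupported_datatype))
        maybe_soft_fail = true;
      else
        return false;  /* Fatal mismatch.  */
    }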
For BB SLP, vect_analyze_slp_instance previously gave up after
building an SLP tree if it could not prove that the group size was
at least the maximum lane count across all of the vector types in
the SLP tree (which is unprovable for scalable vector types), or
attempted to split the group if it could prove that the group size
was greater than this maximum but not exactly divisible by it
(which is also unprovable for scalable vector types).
This function now provisionally creates a new SLP instance if the
group size definitely does not exceed the minimum number of lanes,
even if the group size otherwise satisfies conditions that would
require a loop to be unrolled (e.g., a group of size 3 that uses a
mixture of V4SI and V8HI types). If the group size lies between the
minimum and maximum number of lanes then vectorization is still
abandoned (e.g., a group of size 3 that uses a mixture of
V2DI and V4SI types).
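The relaxed bail-out condition, condensed from the
vect_analyze_slp_instance hunk below:

  if (maybe_ne (unrolling_factor, 1U)
      && is_a<bb_vec_info> (vinfo)
      && !known_ge (nunits.min, group_size))
    /* As before: split the group if possible, otherwise give up.  */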
With BB SLP, there is no need for agreement between different SLP
nodes about whether to use masks or lengths to support partial
vectors. Instead, that decision is made early, per individual SLP
node, by vect_analyze_stmt. If a partial vector is required (i.e. if
the number of subparts in the vector type may exceed the number of
active lanes for the node) then vect_analyze_stmt now requires
SLP_TREE_CAN_USE_PARTIAL_VECTORS_P to be true; otherwise it clears
any SLP_TREE_PARTIAL_VECTORS_STYLE that might have been set.
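Condensed from the vect_analyze_stmt hunk below:

  unsigned int group_size = SLP_TREE_LANES (node);
  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (node));
  if (maybe_lt (group_size, nunits))
    {
      /* Partial vectors are required, so the target must support them.  */
      if (!SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (node))
        return opt_result::failure_at (...);
    }
  else
    /* No partial vectors are needed, so make sure no lengths or masks
       are generated for this node.  */
    SLP_TREE_PARTIAL_VECTORS_STYLE (node) = vect_partial_vectors_none;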
The vect_get_num_copies function used during statement analysis is
updated to return 1 early if the vector type is known to be long
enough for the specified SLP tree node. This avoids an ICE in
vect_get_num_vectors, which cannot cope with SVE vector types.
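The early return, from the tree-vectorizer.h hunk below:

  vf *= SLP_TREE_LANES (node);
  tree vectype = SLP_TREE_VECTYPE (node);
  if (known_ge (TYPE_VECTOR_SUBPARTS (vectype), vf))
    return 1;
  return vect_get_num_vectors (vf, vectype);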
vect_create_vectorized_promotion_stmts no longer pushes more stmts
than implied by vect_get_num_copies, because doing so could overrun
the number of slots allocated for an SLP node (based on its number
of lanes and type). For example, four defs were pushed for a
promotion from V8HI to V2DI (8/2 = 4) even if only two lanes of the
V8HI were active. Allowing the extra defs previously caused an ICE
later in vectorizable_operation for a parent node, because binary
ops require both operands to be the same length.
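The cap, condensed from the vect_create_vectorized_promotion_stmts
hunk below:

  const unsigned ncopies = vect_get_num_copies (vinfo, slp_node);
  vec_tmp.create (ncopies);
  FOR_EACH_VEC_ELT (*vec_oprnds0, i, vop0)
    {
      if (vec_tmp.length () >= ncopies)
        break;
      /* Generate the first half of the promotion, and the second
         half only if another def is still needed.  */
      ...
    }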
Since promotion no longer produces redundant definitions,
vectorizable_conversion also had to be modified so that demotion no
longer relies on an even number of defs being produced. If
necessary, it now pushes a single constant zero def.
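From the vectorizable_conversion hunk below:

  if (vec_oprnds0.length () % 2 != 0)
    {
      tree vectype = TREE_TYPE (vec_oprnds0[0]);
      vec_oprnds0.safe_push (build_zero_cst (vectype));
    }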
The whole change is enabled by wiring the wrapper function
vect_can_use_partial_vectors_p to SLP_TREE_CAN_USE_PARTIAL_VECTORS_P
when invoked for BB SLP vectorization.
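The wrapper, from the tree-vectorizer.h hunk below:

  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
  if (loop_vinfo)
    return LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo);
  else
    return SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node);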
Test expectations for gcc.dg/vect/vect-over-widen-*.c are updated to
avoid spurious matches of their scan-tree-dump-not patterns.
gcc/ChangeLog:
* tree-vect-slp.cc (vect_record_nunits): Allow group sizes that
are indivisible by the vector length, provided they are known
not to exceed it.
(vect_build_slp_tree_1): In case of failure of
vect_get_vector_types_for_stmt, try to get fallback vector
types and continue analysis to allow splitting of groups.
(vect_build_slp_tree_2): Don't call
can_duplicate_and_interleave_p when doing basic block SLP
vectorization.
(vect_analyze_slp_instance): For BB SLP vectorization, create
a new SLP instance if the group size definitely does not exceed
the minimum number of lanes, even if the group size otherwise
satisfies conditions that would require a loop to be unrolled.
(vectorizable_slp_permutation_1): Instead of asserting that an
SLP tree node's number of lanes is compatible with the chosen
vector width, return a failure indication if incompatible.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
When calculating the number of vectors, get the group size from
SLP_TREE_LANES instead of a parameter (e.g., DR_GROUP_SIZE) if
doing BB SLP vectorization. Don't assume it can be divided by
the number of subparts in the vector type to get a compile-time
constant.
(vect_get_loop_variant_data_ptr_increment): Require a parameter
of type loop_vec_info instead of vec_info *.
(vect_get_data_ptr_increment): Pass loop_vinfo instead of vinfo
to vect_get_loop_variant_data_ptr_increment.
(vect_create_vectorized_promotion_stmts): Require an SLP tree
node to be passed by the caller, for use by
vect_get_num_copies.
Stop pushing more stmts than implied by vect_get_num_copies.
(vectorizable_conversion): Pass SLP tree node to
vect_create_vectorized_promotion_stmts.
Demotion no longer relies on an even number of definitions
being produced by promotion. If necessary, push a single constant
zero definition.
(vectorizable_load): Pass loop_vec_info instead of vec_info *
when calling vect_get_data_ptr_increment.
(vect_analyze_stmt): For BB SLP vectorization, check whether
the group needs partial vectors. If it does then return a
failure indication if SLP_TREE_CAN_USE_PARTIAL_VECTORS_P was
cleared by a callee of this function; if it doesn't need
partial vectors then clear any partial vectors style that might
have been chosen by callees of this function.
(get_vectype_for_scalar_type): For BB SLP vectorization, allow
invocation of this function with a group size of zero even if
one or more SLP instances have been created.
If the number of subparts in the natural choice of vector type
could be greater than the group size then pick a shorter vector
type only if the target does not support partial vectors.
(vect_maybe_update_slp_op_vectype): Reject external definitions
that have a number of lanes not divisible by the number of
subparts in a vector type naively inferred from the scalar
type.
(vect_get_vector_types_for_stmt): Add a new output parameter of
Boolean type. Set it to true if the statement can't be
vectorized because it uses a data type that the target doesn't
support in vector form for a group of the given size, otherwise
false.
* tree-vectorizer.h (vect_get_num_copies): Return 1 early if the
vector type is known to be long enough for the specified SLP tree
node, to avoid an ICE in vect_get_num_vectors.
(vect_get_vector_types_for_stmt): Update function declaration.
(vect_can_use_partial_vectors_p): Handle the BB SLP use-case by
returning the result of SLP_TREE_CAN_USE_PARTIAL_VECTORS_P.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-over-widen-10.c: Update test expectations to
avoid spurious matching of scan-tree-dump-not pattern.
* gcc.dg/vect/vect-over-widen-13.c: As above.
* gcc.dg/vect/vect-over-widen-14.c: As above.
* gcc.dg/vect/vect-over-widen-17.c: As above.
* gcc.dg/vect/vect-over-widen-18.c: As above.
* gcc.dg/vect/vect-over-widen-5.c: As above.
* gcc.dg/vect/vect-over-widen-6.c: As above.
* gcc.dg/vect/vect-over-widen-7.c: As above.
* gcc.dg/vect/vect-over-widen-8.c: As above.
* gcc.dg/vect/vect-over-widen-9.c: As above.
---
.../gcc.dg/vect/vect-over-widen-10.c | 2 +-
.../gcc.dg/vect/vect-over-widen-13.c | 2 +-
.../gcc.dg/vect/vect-over-widen-14.c | 2 +-
.../gcc.dg/vect/vect-over-widen-17.c | 2 +-
.../gcc.dg/vect/vect-over-widen-18.c | 2 +-
gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c | 2 +-
gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c | 2 +-
gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c | 2 +-
gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c | 2 +-
gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c | 2 +-
gcc/tree-vect-slp.cc | 93 ++++++--
gcc/tree-vect-stmts.cc | 201 +++++++++++++-----
gcc/tree-vectorizer.h | 13 +-
13 files changed, 234 insertions(+), 93 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
index f0140e4ef6d..6efcf739db9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
@@ -16,5 +16,5 @@
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
index 08a65ea5518..720353716cf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
@@ -48,5 +48,5 @@ main (void)
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
index dfa09f5d2ca..f1d5f95c543 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
@@ -15,5 +15,5 @@
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
index 53fcfd0c06c..ac1a0f86727 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
@@ -46,5 +46,5 @@ main (void)
adopts realign_load scheme. It requires rs6000_builtin_mask_for_load to
generate mask whose return type is vector char. */
/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" { target vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
index aa58cd1c957..3ebfaa78270 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
@@ -47,5 +47,5 @@ main (void)
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* |} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
index c2ab11a9d32..1d89789a86d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
@@ -49,5 +49,5 @@ main (void)
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
index bda92c965e0..62d5a52587e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
@@ -13,5 +13,5 @@
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
index 1d55e13fb1f..6e09631009a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
@@ -51,5 +51,5 @@ main (void)
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
index 553c0712a79..b6d650beab4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
@@ -16,5 +16,5 @@
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
index 36bfc68e053..e82f8a571da 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
@@ -56,5 +56,5 @@ main (void)
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^ ]* int vect__} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4bb731a6658..270af4dfab6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1073,8 +1073,12 @@ vect_record_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
}
/* If populating the vector type requires unrolling then fail
- before adjusting *nunits for basic-block vectorization. */
+ before adjusting *nunits for basic-block vectorization.
+ Allow group sizes that are indivisible by the vector length only if they
+ are known not to exceed the vector length. We may be able to support such
+ cases by generating constant masks. */
if (is_a <bb_vec_info> (vinfo)
+ && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype))
&& !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
{
if (dump_enabled_p ())
@@ -1126,12 +1126,29 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
tree soft_fail_nunits_vectype = NULL_TREE;
tree vectype, nunits_vectype;
+ bool unsupported_datatype = false;
if (!vect_get_vector_types_for_stmt (vinfo, first_stmt_info, &vectype,
- &nunits_vectype, group_size))
+ &nunits_vectype, &unsupported_datatype,
+ group_size))
{
- /* Fatal mismatch. */
- matches[0] = false;
- return false;
+ /* Try to get fallback vector types and continue analysis, producing
+ matches[] as if vectype was not an issue. This allows splitting of
+ groups to happen. */
+ if (unsupported_datatype
+ && vect_get_vector_types_for_stmt (vinfo, first_stmt_info, &vectype,
+ &nunits_vectype,
+ &unsupported_datatype))
+ {
+ gcc_assert (is_a<bb_vec_info> (vinfo));
+ maybe_soft_fail = true;
+ soft_fail_nunits_vectype = nunits_vectype;
+ }
+ else
+ {
+ /* Fatal mismatch. */
+ matches[0] = false;
+ return false;
+ }
}
if (is_a <bb_vec_info> (vinfo)
&& known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
@@ -1659,16 +1680,22 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
if (maybe_soft_fail)
{
- unsigned HOST_WIDE_INT const_nunits;
- if (!TYPE_VECTOR_SUBPARTS
- (soft_fail_nunits_vectype).is_constant (&const_nunits)
- || const_nunits > group_size)
+ /* Use the known minimum number of subparts for VLA because we still need
+ to choose a splitting point although the choice is more arbitrary. */
+ unsigned HOST_WIDE_INT const_nunits = constant_lower_bound (
+ TYPE_VECTOR_SUBPARTS (soft_fail_nunits_vectype));
+
+ if (const_nunits > group_size)
matches[0] = false;
else
{
/* With constant vector elements simulate a mismatch at the
point we need to split. */
+ gcc_assert ((const_nunits % 2) == 0);
unsigned tail = group_size & (const_nunits - 1);
+ if (tail == 0)
+ tail = const_nunits;
+ gcc_assert (group_size >= tail);
memset (&matches[group_size - tail], 0, sizeof (bool) * tail);
}
return false;
@@ -2398,13 +2425,21 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
/* Check whether we can build the invariant. If we can't
we never will be able to. */
tree type = TREE_TYPE (chains[0][n].op);
- if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
- && (TREE_CODE (type) == BOOLEAN_TYPE
- || !can_duplicate_and_interleave_p (vinfo, group_size,
- type)))
+ if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ())
{
- matches[0] = false;
- goto out;
+ if (TREE_CODE (type) == BOOLEAN_TYPE)
+ {
+ matches[0] = false;
+ goto out;
+ }
+
+ if (!is_a<bb_vec_info> (vinfo)
+ && !can_duplicate_and_interleave_p (vinfo, group_size,
+ type))
+ {
+ matches[0] = false;
+ goto out;
+ }
}
}
else if (dt != vect_internal_def)
@@ -2833,7 +2868,7 @@ out:
uniform_val = NULL_TREE;
break;
}
- if (!uniform_val
+ if (!uniform_val && !is_a<bb_vec_info> (vinfo)
&& !can_duplicate_and_interleave_p (vinfo,
oprnd_info->ops.length (),
TREE_TYPE (op0)))
@@ -4976,7 +5011,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
= calculate_unrolling_factor (nunits, group_size);
if (maybe_ne (unrolling_factor, 1U)
- && is_a<bb_vec_info> (vinfo))
+ && is_a<bb_vec_info> (vinfo)
+ && !known_ge (nunits.min, group_size))
{
unsigned HOST_WIDE_INT const_max_nunits;
if (!nunits.max.is_constant (&const_max_nunits)
@@ -5063,9 +5099,10 @@ vect_analyze_slp_instance (vec_info *vinfo,
= TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
1 << floor_log2 (i));
- unsigned HOST_WIDE_INT const_nunits;
- if (vectype
- && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits))
+ unsigned HOST_WIDE_INT const_nunits
+ = vectype ? constant_lower_bound (TYPE_VECTOR_SUBPARTS (vectype))
+ : 0;
+ if (const_nunits > 1 && (i % const_nunits) == 0)
{
/* Split into two groups at the first vector boundary. */
gcc_assert ((const_nunits & (const_nunits - 1)) == 0);
@@ -11688,7 +11725,21 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
unpack_factor = 1;
}
unsigned olanes = unpack_factor * ncopies * SLP_TREE_LANES (node);
- gcc_assert (repeating_p || multiple_p (olanes, nunits));
+
+ /* With fully-predicated BB-SLP, an external node's number of lanes can be
+ incompatible with the chosen vector width (e.g., lane packs of 3 with a
+ natural 2-lane vector type). */
+ if (!repeating_p && !multiple_p (olanes, nunits))
+ {
+ if (dump_p)
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "unsupported permutation %p: vector type %T,"
+ " nunits=" HOST_WIDE_INT_PRINT_UNSIGNED
+ " ncopies=%" PRIu64 ", lanes=%u and unpack=%u\n",
+ (void *) node, vectype, estimated_poly_value (nunits),
+ ncopies, SLP_TREE_LANES (node), unpack_factor);
+ return -1;
+ }
/* Compute the { { SLP operand, vector index}, lane } permutation sequence
from the { SLP operand, scalar lane } permutation as recorded in the
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 0b4a081a211..4be1d641897 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1703,23 +1703,27 @@ check_load_store_for_partial_vectors (vec_info *vinfo, tree vectype,
unsigned int nvectors;
if (can_div_away_from_zero_p (size, nunits, &nvectors))
return nvectors;
- gcc_unreachable ();
+
+ gcc_assert (known_le (size, nunits));
+ return 1u;
};
poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
- poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+ poly_uint64 size = loop_vinfo
+ ? group_size * LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+ : SLP_TREE_LANES (slp_node);
unsigned factor;
vect_partial_vector_style partial_vector_style
= vect_get_partial_vector_style (vectype, is_load, &factor, elsvals);
if (partial_vector_style == vect_partial_vectors_len)
{
- nvectors = group_memory_nvectors (group_size * vf, nunits);
+ nvectors = group_memory_nvectors (size, nunits);
vect_record_len (vinfo, slp_node, nvectors, vectype, factor);
}
else if (partial_vector_style == vect_partial_vectors_while_ult)
{
- nvectors = group_memory_nvectors (group_size * vf, nunits);
+ nvectors = group_memory_nvectors (size, nunits);
vect_record_mask (vinfo, slp_node, nvectors, vectype, scalar_mask);
}
else
@@ -3382,12 +3386,11 @@ vect_get_strided_load_store_ops (stmt_vec_info stmt_info, slp_tree node,
static tree
vect_get_loop_variant_data_ptr_increment (
- vec_info *vinfo, tree aggr_type, gimple_stmt_iterator *gsi,
+ loop_vec_info loop_vinfo, tree aggr_type, gimple_stmt_iterator *gsi,
vec_loop_lens *loop_lens, dr_vec_info *dr_info,
vect_memory_access_type memory_access_type)
{
- loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
- tree step = vect_dr_behavior (vinfo, dr_info)->step;
+ tree step = vect_dr_behavior (loop_vinfo, dr_info)->step;
/* gather/scatter never reach here. */
gcc_assert (!mat_gather_scatter_p (memory_access_type));
@@ -3431,7 +3434,7 @@ vect_get_data_ptr_increment (vec_info *vinfo, gimple_stmt_iterator *gsi,
loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
if (loop_vinfo && LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
- return vect_get_loop_variant_data_ptr_increment (vinfo, aggr_type, gsi,
+ return vect_get_loop_variant_data_ptr_increment (loop_vinfo, aggr_type, gsi,
loop_lens, dr_info,
memory_access_type);
@@ -5265,7 +5268,7 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, vec<tree> *vec_oprnds,
call the function recursively. */
static void
-vect_create_vectorized_promotion_stmts (vec_info *vinfo,
+vect_create_vectorized_promotion_stmts (vec_info *vinfo, slp_tree slp_node,
vec<tree> *vec_oprnds0,
vec<tree> *vec_oprnds1,
stmt_vec_info stmt_info, tree vec_dest,
@@ -5278,37 +5281,39 @@ vect_create_vectorized_promotion_stmts (vec_info *vinfo,
gimple *new_stmt1, *new_stmt2;
vec<tree> vec_tmp = vNULL;
- vec_tmp.create (vec_oprnds0->length () * 2);
+ const unsigned ncopies = vect_get_num_copies (vinfo, slp_node);
+ vec_tmp.create (ncopies);
+ gcc_assert (vec_oprnds0->length () <= ncopies);
FOR_EACH_VEC_ELT (*vec_oprnds0, i, vop0)
{
+ if (vec_tmp.length () >= ncopies)
+ break;
+
if (op_type == binary_op)
vop1 = (*vec_oprnds1)[i];
else
vop1 = NULL_TREE;
/* Generate the two halves of promotion operation. */
- new_stmt1 = vect_gen_widened_results_half (vinfo, ch1, vop0, vop1,
- op_type, vec_dest, gsi,
- stmt_info);
- new_stmt2 = vect_gen_widened_results_half (vinfo, ch2, vop0, vop1,
- op_type, vec_dest, gsi,
- stmt_info);
- if (is_gimple_call (new_stmt1))
- {
- new_tmp1 = gimple_call_lhs (new_stmt1);
- new_tmp2 = gimple_call_lhs (new_stmt2);
- }
- else
+ new_stmt1
+ = vect_gen_widened_results_half (vinfo, ch1, vop0, vop1, op_type,
+ vec_dest, gsi, stmt_info);
+ new_tmp1 = is_gimple_call (new_stmt1) ? gimple_call_lhs (new_stmt1)
+ : gimple_assign_lhs (new_stmt1);
+ vec_tmp.quick_push (new_tmp1);
+
+ if (vec_tmp.length () < ncopies)
{
- new_tmp1 = gimple_assign_lhs (new_stmt1);
- new_tmp2 = gimple_assign_lhs (new_stmt2);
+ new_stmt2
+ = vect_gen_widened_results_half (vinfo, ch2, vop0, vop1, op_type,
+ vec_dest, gsi, stmt_info);
+ new_tmp2 = is_gimple_call (new_stmt2) ? gimple_call_lhs (new_stmt2)
+ : gimple_assign_lhs (new_stmt2);
+ vec_tmp.quick_push (new_tmp2);
}
-
- /* Store the results for the next step. */
- vec_tmp.quick_push (new_tmp1);
- vec_tmp.quick_push (new_tmp2);
}
+ gcc_assert (vec_tmp.length () <= ncopies);
vec_oprnds0->release ();
*vec_oprnds0 = vec_tmp;
}
@@ -5520,6 +5525,7 @@ vectorizable_conversion (vec_info *vinfo,
from the scalar type. */
if (!vectype_in)
vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
+
if (!cost_vec)
gcc_assert (vectype_in);
if (!vectype_in)
@@ -5910,12 +5916,15 @@ vectorizable_conversion (vec_info *vinfo,
stmt_info, this_dest, gsi, c1,
op_type);
else
- vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
- &vec_oprnds1, stmt_info,
- this_dest, gsi,
+ vect_create_vectorized_promotion_stmts (vinfo, slp_node,
+ &vec_oprnds0, &vec_oprnds1,
+ stmt_info, this_dest, gsi,
c1, c2, op_type);
}
+ gcc_assert (vec_oprnds0.length ()
+ == vect_get_num_copies (vinfo, slp_node));
+
FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
{
gimple *new_stmt;
@@ -5939,6 +5948,16 @@ vectorizable_conversion (vec_info *vinfo,
generate more than one vector stmt - i.e - we need to "unroll"
the vector stmt by a factor VF/nunits. */
vect_get_vec_defs (vinfo, slp_node, op0, &vec_oprnds0);
+
+ /* Promotion no longer produces redundant defs (since support was
+ added for length/mask-predicated BB SLP of awkward-sized groups),
+ therefore demotion now has to handle that case too. */
+ if (vec_oprnds0.length () % 2 != 0)
+ {
+ tree vectype = TREE_TYPE (vec_oprnds0[0]);
+ vec_oprnds0.safe_push (build_zero_cst (vectype));
+ }
+
/* Arguments are ready. Create the new vector stmts. */
if (cvt_type && modifier == NARROW_DST)
FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
@@ -10688,7 +10707,7 @@ vectorizable_load (vec_info *vinfo,
aggr_type = build_array_type_nelts (elem_type, group_size * nunits);
if (!costing_p)
- bump = vect_get_data_ptr_increment (vinfo, gsi, dr_info, aggr_type,
+ bump = vect_get_data_ptr_increment (loop_vinfo, gsi, dr_info, aggr_type,
memory_access_type, loop_lens);
unsigned int inside_cost = 0, prologue_cost = 0;
@@ -13227,6 +13246,37 @@ vect_analyze_stmt (vec_info *vinfo,
" live stmt not supported: %G",
stmt_info->stmt);
+ if (bb_vinfo)
+ {
+ unsigned int group_size = SLP_TREE_LANES (node);
+ tree vectype = SLP_TREE_VECTYPE (node);
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ bool needs_partial = maybe_lt (group_size, nunits);
+ if (needs_partial)
+ {
+ /* If partial vectors are required then they must be supported by the
+ target; however, don't assume that a partial vectors style has
+ been set because a mask or length may not be required for the
+ statement. */
+ if (!SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (node))
+ return opt_result::failure_at (stmt_info->stmt,
+ "not vectorized: SLP node needs but "
+ "cannot use partial vectors: %G",
+ stmt_info->stmt);
+ }
+ else
+ {
+ /* If we don't need partial vectors then we don't care about whether
+ they are supported or not; however, we need to clear any partial
+ vectors style that might have been chosen because it will be used
+ to control generation of lengths or masks. */
+ SLP_TREE_PARTIAL_VECTORS_STYLE (node) = vect_partial_vectors_none;
+ }
+
+ if (maybe_gt (group_size, nunits))
+ gcc_assert (multiple_p (group_size, nunits));
+ }
+
return opt_result::success ();
}
@@ -13529,13 +13579,7 @@ tree
get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
unsigned int group_size)
{
- /* For BB vectorization, we should always have a group size once we've
- constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
- are tentative requests during things like early data reference
- analysis and pattern recognition. */
- if (is_a <bb_vec_info> (vinfo))
- gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
- else
+ if (!is_a <bb_vec_info> (vinfo))
group_size = 0;
tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
@@ -13549,10 +13593,18 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
vinfo->used_vector_modes.add (TYPE_MODE (vectype));
/* If the natural choice of vector type doesn't satisfy GROUP_SIZE,
- try again with an explicit number of elements. */
- if (vectype
- && group_size
- && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), group_size))
+ try again with an explicit number of elements. A vector type satisfies
+ GROUP_SIZE if it is definitely not too long to store the whole group,
+ or we are able to generate masks to handle the unknown number of excess
+ lanes that might exist. Otherwise, we must substitute a vector type that
+ can be used to carve up the group.
+ */
+ if (vectype && group_size
+ && maybe_gt (TYPE_VECTOR_SUBPARTS (vectype), group_size)
+ && (vect_get_partial_vector_style (vectype, true)
+ == vect_partial_vectors_none
+ || vect_get_partial_vector_style (vectype, false)
+ == vect_partial_vectors_none))
{
/* Start with the biggest number of units that fits within
GROUP_SIZE and halve it until we find a valid vector type.
@@ -13868,7 +13920,36 @@ vect_maybe_update_slp_op_vectype (vec_info *vinfo, slp_tree op, tree vectype)
&& SLP_TREE_DEF_TYPE (op) == vect_external_def
&& SLP_TREE_LANES (op) > 1)
return false;
- (void) vinfo; /* FORNOW */
+
+ /* When the vectorizer falls back to building vector operands from scalars,
+ it can create SLP trees with external defs that have a number of lanes not
+ divisible by the number of subparts in a vector type naively inferred from
+ the scalar type. Reject such types to avoid ICE when later computing the
+ prologue cost for invariant operands. */
+ if (SLP_TREE_DEF_TYPE (op) == vect_external_def)
+ {
+ poly_uint64 vf = 1;
+
+ if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+ vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+ vf *= SLP_TREE_LANES (op);
+
+ if (maybe_lt (TYPE_VECTOR_SUBPARTS (vectype), vf)
+ && !multiple_p (vf, TYPE_VECTOR_SUBPARTS (vectype)))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "lanes=" HOST_WIDE_INT_PRINT_UNSIGNED
+ " is not divisible by "
+ "subparts=" HOST_WIDE_INT_PRINT_UNSIGNED ".\n",
+ estimated_poly_value (vf),
+ estimated_poly_value (
+ TYPE_VECTOR_SUBPARTS (vectype)));
+ return false;
+ }
+ }
+
SLP_TREE_VECTYPE (op) = vectype;
return true;
}
@@ -14590,27 +14671,32 @@ vect_gen_while_not (gimple_seq *seq, tree mask_type, tree start_index,
- Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum
number of units needed to vectorize STMT_INFO, or NULL_TREE if the
- statement does not help to determine the overall number of units. */
+ statement does not help to determine the overall number of units.
+
+ - Set *UNSUPPORTED_DATATYPE to false.
+
+ On failure:
+
+ - Set *UNSUPPORTED_DATATYPE to true if the statement can't be vectorized
+ because it uses a data type that the target doesn't support in vector form
+ for a group of the given GROUP_SIZE.
+ */
opt_result
vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
tree *stmt_vectype_out,
tree *nunits_vectype_out,
+ bool *unsupported_datatype,
unsigned int group_size)
{
gimple *stmt = stmt_info->stmt;
- /* For BB vectorization, we should always have a group size once we've
- constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
- are tentative requests during things like early data reference
- analysis and pattern recognition. */
- if (is_a <bb_vec_info> (vinfo))
- gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
- else
+ if (!is_a<bb_vec_info> (vinfo))
group_size = 0;
*stmt_vectype_out = NULL_TREE;
*nunits_vectype_out = NULL_TREE;
+ *unsupported_datatype = false;
if (gimple_get_lhs (stmt) == NULL_TREE
/* Allow vector conditionals through here. */
@@ -14683,10 +14769,13 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
}
vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
if (!vectype)
- return opt_result::failure_at (stmt,
- "not vectorized:"
- " unsupported data-type %T\n",
- scalar_type);
+ {
+ *unsupported_datatype = true;
+ return opt_result::failure_at (stmt,
+ "not vectorized:"
+ " unsupported data-type %T\n",
+ scalar_type);
+ }
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9f0354093ff..9ca0c79fc49 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2326,6 +2326,8 @@ vect_get_num_copies (vec_info *vinfo, slp_tree node)
vf *= SLP_TREE_LANES (node);
tree vectype = SLP_TREE_VECTYPE (node);
+ if (known_ge (TYPE_VECTOR_SUBPARTS (vectype), vf))
+ return 1;
return vect_get_num_vectors (vf, vectype);
}
@@ -2624,9 +2626,9 @@ extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
const char * = nullptr);
extern void vect_gen_while_ssa_name (gimple_seq *, tree, tree, tree, tree);
extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
-extern opt_result vect_get_vector_types_for_stmt (vec_info *,
- stmt_vec_info, tree *,
- tree *, unsigned int = 0);
+extern opt_result vect_get_vector_types_for_stmt (vec_info *, stmt_vec_info,
+ tree *, tree *,
+ bool *, unsigned int = 0);
extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
/* In tree-if-conv.cc. */
@@ -2959,9 +2961,8 @@ vect_can_use_partial_vectors_p (vec_info *vinfo, slp_tree slp_node)
loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
if (loop_vinfo)
return LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo);
-
- (void) slp_node; /* FORNOW */
- return false;
+ else
+ return SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node);
}
/* If VINFO is vectorizer state for loop vectorization then record that we no
--
2.43.0