Richard Biener <[email protected]> writes:
> The following fixes a regression in AVX2 vectorization: on trunk we
> now correctly determine that we can shorten a shift operation, but we
> never bothered to check whether we can implement the resulting
> operation. With the patch we now check this. For shifts and rotates
> we have the choice between vector-vector and vector-scalar
> operations, which in the end depends on whether we perform SLP or not
> and how the shift operand matches up. The patch heuristically
> assumes that constant or external shifts can be handled by
> vector-scalar operations.
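
FWIW, to make the vector-vector vs. vector-scalar distinction
concrete (an untested sketch, not taken from the patch):

  void
  invariant_shift (short *a, int s)
  {
    /* All lanes shift by the same amount S, so a vector-scalar
       form (PSLLW-style on x86) is enough.  */
    for (int i = 0; i < 16; ++i)
      a[i] <<= s;
  }

  void
  per_lane_shift (short *a, short *b)
  {
    /* Per-lane shift amounts need a vector-vector form
       (VPSLLV-style), which AVX2 only provides for 32-bit and
       64-bit elements, so demoting this shift to shorts makes
       it unimplementable there.  */
    for (int i = 0; i < 16; ++i)
      a[i] <<= b[i];
  }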
>
> ISTR that not checking for target support was to allow recursively
> matching other patterns, but as can be seen here this is problematic.
My memory of this wasn't great, but yeah, it seems so:
/* We specifically don't check here whether the target supports the
new operation, since it might be something that a later pattern
wants to rewrite anyway. If targets have a minimum element size
for some optabs, we should pattern-match smaller ops to larger ops
where beneficial. */
I suppose that comment should be removed if the patch goes in.
> Bootstrapping/testing on x86_64-unknown-linux-gnu and aarch64-linux
> reveals
>
> +FAIL: gcc.dg/vect/vect-over-widen-13.c -flto -ffat-lto-objects scan-tree-dump-not vect "vector[^ ]* int"
>
> on both and
>
> +FAIL: gcc.dg/vect/vect-div-bitmask-1.c -flto -ffat-lto-objects scan-tree-dump-not vect "vect_recog_divmod_pattern: detected"
>
> on aarch64 (might be an old baseline).
>
> So how was this supposed to play out? Do we need to handle
> pattern composition in other ways?
Based on the above, it sounds like the idea was that we would have
a late pattern match that canonicalises narrow operations to wider
ones, if the target requires that. That would include operations that
started out narrow, not just ones that were narrowed by this code.
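
Concretely, I'd expect such a pattern to rewrite (hypothetical
gimple, just to illustrate the shape):

  short _1, _2, _3;
  _3 = _1 << _2;

into

  int _4, _5, _6;
  _4 = (int) _1;
  _5 = (int) _2;
  _6 = _4 << _5;
  _3 = (short) _6;

whenever target_has_vecop_for_code fails for the narrow type but
succeeds for the wider one.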
Thanks,
Richard
>
> Thanks,
> Richard.
>
> PR tree-optimization/124068
> * tree-vect-patterns.cc (target_has_vecop_for_code): Move
> earlier, add defaulted optab_subtype parameter.
> (vect_recog_over_widening_pattern): Check that the target
> supports the narrowed operation before committing to the
> pattern.
>
> * gcc.target/i386/vect-shift-1.c: New testcase.
> ---
> gcc/testsuite/gcc.target/i386/vect-shift-1.c | 10 ++++++
> gcc/tree-vect-patterns.cc | 36 ++++++++++++++------
> 2 files changed, 35 insertions(+), 11 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-shift-1.c b/gcc/testsuite/gcc.target/i386/vect-shift-1.c
> new file mode 100644
> index 00000000000..eb31ef48ba0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-shift-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target avx2 } } */
> +/* { dg-options "-O2 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +void f (short* acc)
> +{
> + for (unsigned char row = 0; row < 16; ++row)
> + acc[row] = acc[row] << row;
> +}
> +
> +/* { dg-final { scan-tree-dump "optimized: loop vectorized" "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 11004043aca..b8a0af323ba 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -1036,6 +1036,17 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> return true;
> }
>
> +/* Return true iff the target has a vector optab implementing the operation
> + CODE on type VECTYPE with SUBTYPE. */
> +
> +static bool
> +target_has_vecop_for_code (tree_code code, tree vectype,
> + enum optab_subtype subtype = optab_vector)
> +{
> + optab voptab = optab_for_tree_code (code, vectype, subtype);
> + return voptab && can_implement_p (voptab, TYPE_MODE (vectype));
> +}
> +
> /* match.pd function to match
> (cond (cmp@3 a b) (convert@1 c) (convert@2 d))
> with conditions:
> @@ -3170,6 +3181,20 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
> if (!new_vectype || !op_vectype)
> return NULL;
>
> + /* Verify we can handle the new operation. For shifts and rotates
> + apply heuristic of whether we are likely facing vector-vector or
> + vector-scalar operation. */
> + if (code == RSHIFT_EXPR || code == LSHIFT_EXPR || code == RROTATE_EXPR)
> + {
> + if (!target_has_vecop_for_code (code, op_vectype, optab_vector)
> + && ((unprom[1].dt != vect_external_def
> + && unprom[1].dt != vect_constant_def)
> + || !target_has_vecop_for_code (code, op_vectype, optab_scalar)))
> + return NULL;
> + }
> + else if (!target_has_vecop_for_code (code, op_vectype, optab_vector))
> + return NULL;
> +
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n",
> type, new_type);
> @@ -4151,17 +4176,6 @@ vect_recog_vector_vector_shift_pattern (vec_info *vinfo,
> return pattern_stmt;
> }
>
> -/* Return true iff the target has a vector optab implementing the operation
> - CODE on type VECTYPE. */
> -
> -static bool
> -target_has_vecop_for_code (tree_code code, tree vectype)
> -{
> - optab voptab = optab_for_tree_code (code, vectype, optab_vector);
> - return voptab
> - && can_implement_p (voptab, TYPE_MODE (vectype));
> -}
> -
> /* Verify that the target has optabs of VECTYPE to perform all the steps
> needed by the multiplication-by-immediate synthesis algorithm described by
> ALG and VAR. If SYNTH_SHIFT_P is true ensure that vector addition is