Richard Biener <[email protected]> writes:
> The following fixes a regression in AVX2 vectorization: on trunk we
> now correctly determine that we can shorten a shift operation, but we
> never bothered to check whether we can implement the resulting
> operation.  With the patch we now check this.  For shifts and rotates
> we have the choice between vector-vector and vector-scalar operations,
> which in the end depends on whether we perform SLP and on how the
> shift operand matches up.  The patch heuristically assumes that
> constant or external shift operands can be handled by vector-scalar
> operations.
>
> As the reason for not checking target support was to allow other
> patterns to match recursively, the following still errs on the side of
> applying the pattern in case the original operation was not supported
> by the target either.  This avoids regressions in
> gcc.dg/vect/vect-over-widen-13.c and gcc.dg/vect/vect-div-bitmask-1.c,
> where the operation in question is integer division.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu and aarch64-linux.
>
> v2 re-enables the pattern when the original operation wasn't
> supported; this avoids the divmod-related regressions observed in v1.
> I've also merged the old comment with the new one.
I still think the approach described in the existing comment makes sense:
use patterns to canonicalise, then use other patterns to fit to the target.
The second part would cope with narrow shifts that start out as unsupported
and need to be widened.
But I realise that's somewhat at odds with other patterns, which are
geared towards what the target supports. And when it comes to the
long-term direction, it would be better to do this stuff on the SLP
graph, which means that adding an extra pattern to the existing code
would be a dead end. So I don't mind the patch.
One comment below.
> OK for trunk?
>
> Thanks,
> Richard.
>
> PR tree-optimization/124068
> * tree-vect-patterns.cc (target_has_vecop_for_code): Move
> earlier, add defaulted optab_subtype parameter.
> (vect_recog_over_widening_pattern): Check that the target
> supports the narrowed operation before committing to the
> pattern.
>
> * gcc.target/i386/vect-shift-1.c: New testcase.
> ---
> gcc/testsuite/gcc.target/i386/vect-shift-1.c | 10 ++++
> gcc/tree-vect-patterns.cc | 50 +++++++++++++-------
> 2 files changed, 44 insertions(+), 16 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-shift-1.c b/gcc/testsuite/gcc.target/i386/vect-shift-1.c
> new file mode 100644
> index 00000000000..eb31ef48ba0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-shift-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target avx2 } } */
> +/* { dg-options "-O2 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +void f (short* acc)
> +{
> + for (unsigned char row = 0; row < 16; ++row)
> + acc[row] = acc[row] << row;
> +}
> +
> +/* { dg-final { scan-tree-dump "optimized: loop vectorized" "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 97130206a21..30e1ab76727 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -1036,6 +1036,17 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> return true;
> }
>
> +/* Return true iff the target has a vector optab implementing the operation
> + CODE on type VECTYPE with SUBTYPE. */
> +
> +static bool
> +target_has_vecop_for_code (tree_code code, tree vectype,
> + enum optab_subtype subtype = optab_vector)
> +{
> + optab voptab = optab_for_tree_code (code, vectype, subtype);
> + return voptab && can_implement_p (voptab, TYPE_MODE (vectype));
> +}
> +
> /* match.pd function to match
> (cond (cmp@3 a b) (convert@1 c) (convert@2 d))
> with conditions:
> @@ -3160,16 +3171,34 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
> && (code == PLUS_EXPR || code == MINUS_EXPR || code == MULT_EXPR))
> op_type = build_nonstandard_integer_type (new_precision, true);
>
> - /* We specifically don't check here whether the target supports the
> - new operation, since it might be something that a later pattern
> - wants to rewrite anyway. If targets have a minimum element size
> - for some optabs, we should pattern-match smaller ops to larger ops
> - where beneficial. */
> tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> tree op_vectype = get_vectype_for_scalar_type (vinfo, op_type);
> if (!new_vectype || !op_vectype)
> return NULL;
>
> +  /* Verify that we can handle the new operation.  For shifts and
> +     rotates, apply a heuristic for whether we are likely facing a
> +     vector-vector or a vector-scalar operation.  Since a later
> +     pattern might want to rewrite an unsupported case into a
> +     supported one, err on the side of applying the pattern when the
> +     original operation was not supported either.  */
> + if (code == RSHIFT_EXPR || code == LSHIFT_EXPR || code == RROTATE_EXPR)
> + {
> + if (!target_has_vecop_for_code (code, op_vectype, optab_vector)
> + && ((unprom[1].dt != vect_external_def
> + && unprom[1].dt != vect_constant_def)
> + || !target_has_vecop_for_code (code, op_vectype, optab_scalar))
> + && !(!target_has_vecop_for_code (code, *type_out, optab_vector)
> + && ((unprom[1].dt != vect_external_def
> + && unprom[1].dt != vect_constant_def)
> + || !target_has_vecop_for_code (code, *type_out,
> + optab_scalar))))
> + return NULL;
Maybe the repeated per-type test is complex enough that it'd be worth
splitting out into a function or lambda. All the double negatives
in the second part give me a headache :)
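
Something like this untested sketch, say (shift_supported_p is an
invented name; the logic is meant to be equivalent to the condition
above with the external/constant test mirroring the first half):

```cpp
auto shift_supported_p = [&] (tree vectype)
  {
    return (target_has_vecop_for_code (code, vectype, optab_vector)
	    || ((unprom[1].dt == vect_external_def
		 || unprom[1].dt == vect_constant_def)
		&& target_has_vecop_for_code (code, vectype, optab_scalar)));
  };
if (!shift_supported_p (op_vectype) && shift_supported_p (*type_out))
  return NULL;
```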
LGTM with or without that change.
Thanks,
Richard
> + }
> + else if (!target_has_vecop_for_code (code, op_vectype, optab_vector)
> + && target_has_vecop_for_code (code, *type_out, optab_vector))
> + return NULL;
> +
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n",
> type, new_type);
> @@ -4151,17 +4180,6 @@ vect_recog_vector_vector_shift_pattern (vec_info *vinfo,
> return pattern_stmt;
> }
>
> -/* Return true iff the target has a vector optab implementing the operation
> - CODE on type VECTYPE. */
> -
> -static bool
> -target_has_vecop_for_code (tree_code code, tree vectype)
> -{
> - optab voptab = optab_for_tree_code (code, vectype, optab_vector);
> - return voptab
> - && can_implement_p (voptab, TYPE_MODE (vectype));
> -}
> -
> /* Verify that the target has optabs of VECTYPE to perform all the steps
> needed by the multiplication-by-immediate synthesis algorithm described by
> ALG and VAR. If SYNTH_SHIFT_P is true ensure that vector addition is