I don't understand how synth-mult works, but it does introduce multiple uses of a reduction variable, which will ultimately fail vectorization (or ICE with a pending change). So avoid applying the pattern. I've tried to do so selectively, possibly preserving pattern-matching x * 4 as x << 2.
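
To illustrate the distinction, here is a scalar sketch. It is not what synth-mult actually emits; the function names are made up and the x * 9 decomposition is only an assumed example of a shift-and-add chain. Unsigned arithmetic is used so the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* x * 4 as a single shift: 'x' is used exactly once.  */
static uint32_t
mul4_shift (uint32_t x)
{
  return x << 2;
}

/* x * 2 as an addition, for a target without vector shifts:
   'x' is used twice.  */
static uint32_t
mul2_add (uint32_t x)
{
  return x + x;
}

/* x * 9 as shift-and-add (assumed example): 'x' is used twice,
   once in the shift and once in the add.  */
static uint32_t
mul9_shift_add (uint32_t x)
{
  return (x << 3) + x;
}

/* Hypothetical self-check: each form equals the plain multiplication.  */
static void
check_synth_forms (uint32_t x)
{
  assert (mul4_shift (x) == x * 4u);
  assert (mul2_add (x) == x * 2u);
  assert (mul9_shift_add (x) == x * 9u);
}
```

Only the first form keeps a single use of 'x'; the other two are the multiple-use shape that is a problem when 'x' is the reduction operand.
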
So basically a single replacement stmt should be OK, likewise something like tem = -x; tem = tem << 3; res = -tem; so that only a single use of 'x' remains. Even using x + x for x * 2 when x << 1 isn't possible does not work. So if only pow2 mults work I can probably make the condition check for that. Will we ever synthesize a more complex chain that is still appropriate?

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

	* tree-vect-patterns.cc (vect_synth_mult_by_constant): Avoid in
	cases that introduce multiple uses of reduction operands.
---
 gcc/tree-vect-patterns.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 3fffcac4b3a..fae4b393dff 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4303,6 +4303,10 @@ vect_synth_mult_by_constant (vec_info *vinfo, tree op, tree val,
   /* Targets that don't support vector shifts but support vector additions
      can synthesize shifts that way.  */
   bool synth_shift_p = !vect_supportable_shift (vinfo, LSHIFT_EXPR, multtype);
+  if (synth_shift_p
+      /* Any multiple use of the reduction operand will break it.  */
+      && vect_is_reduction (stmt_vinfo))
+    return NULL;
 
   HOST_WIDE_INT hwval = tree_to_shwi (val);
   /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
@@ -4333,7 +4337,12 @@ vect_synth_mult_by_constant (vec_info *vinfo, tree op, tree val,
   if (alg.op[0] == alg_zero)
     accumulator = build_int_cst (multtype, 0);
   else
-    accumulator = op;
+    {
+      /* Any multiple use of the reduction operand will break it.  */
+      if (vect_is_reduction (stmt_vinfo))
+        return NULL;
+      accumulator = op;
+    }
 
   bool needs_fixup = (variant == negate_variant)
       || (variant == add_variant);
-- 
2.43.0