On Wed, 27 Aug 2025, Jakub Jelinek wrote: > On Wed, Aug 27, 2025 at 02:48:33PM +0200, Richard Biener wrote: > > I don't understand how synth-mult works, but it does introduce > > multiple uses of a reduction variable which will ultimatively > > fail vectorization (or ICE with a pending change). So avoid > > applying the pattern. I've tried to do so selectively, possibly > > preserving pattern-matching x * 4 as x << 2. > > > > So basically a single replacement stmt should be OK, likewise sth like > > > > tem = -x; > > tem = tem << 3; > > res = -tem; > > > > so a single use of 'x' remains. Even using x + x for x * 2 when > > x << 1 isn't possible does not work. So if only pow2 mults will > > work I can probably make a condition on that. Will we synthesize > > any more complex appropriate chain? > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > Any comments? > > > > Thanks, > > Richard. > > > > * tree-vect-patterns.cc (vect_synth_mult_by_constant): Avoid > > in cases that introduce multiple uses of reduction operands. > > --- > > gcc/tree-vect-patterns.cc | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > index 3fffcac4b3a..fae4b393dff 100644 > > --- a/gcc/tree-vect-patterns.cc > > +++ b/gcc/tree-vect-patterns.cc > > @@ -4303,6 +4303,10 @@ vect_synth_mult_by_constant (vec_info *vinfo, tree > > op, tree val, > > /* Targets that don't support vector shifts but support vector additions > > can synthesize shifts that way. */ > > bool synth_shift_p = !vect_supportable_shift (vinfo, LSHIFT_EXPR, > > multtype); > > + if (synth_shift_p > > + /* Any multiple use of the reduction operand will break it. */ > > + && vect_is_reduction (stmt_vinfo)) > > + return NULL; > > > > HOST_WIDE_INT hwval = tree_to_shwi (val); > > /* Use MAX_COST here as we don't want to limit the sequence on rtx costs. > > @@ -4333,7 +4337,12 @@ vect_synth_mult_by_constant (vec_info *vinfo, tree > > op, tree val, > > if (alg.op[0] == alg_zero) > > accumulator = build_int_cst (multtype, 0); > > else > > - accumulator = op; > > + { > > + /* Any multiple use of the reduction operand will break it. */ > > + if (vect_is_reduction (stmt_vinfo)) > > + return NULL; > > + accumulator = op; > > + } > > > > bool needs_fixup = (variant == negate_variant) > > || (variant == add_variant); > > That seems to be too conservative to me. > Wouldn't it be better instead of both chunks do something like: > if (vect_is_reduction (stmt_vinfo)) > { > int op_uses = alg.op[0] != alg_zero; > for (int i = 1; i < alg.ops; i++) > switch (alg.op[i]) > { > case alg_add_t_m2: > case alg_sub_t_m2: > if (synth_shift_p && alg.log[i]) > return NULL; > else > op_uses++; > break; > case alg_add_t2_m: > case alg_sub_t2_m: > op_uses++; > break; > default: > break; > } > if (variant == add_variant) > op_uses++; > if (op_uses > 1) > return NULL; > } > This basically counts how many times the later code will use > the op operand (with synth_shift_p just returning NULL immediately > because it is for alg.log[i] more than 1).
Ah, thanks for the write-up how to compute the number of uses. I'll test the following. Richard. >From 12f7ce0b2a7b12eeadffee6049d733d46ef484dd Mon Sep 17 00:00:00 2001 From: Richard Biener <rguent...@suse.de> Date: Wed, 27 Aug 2025 14:40:37 +0200 Subject: [PATCH] Avoid mult pattern if that will break reduction constraints To: gcc-patches@gcc.gnu.org synth-mult introduces multiple uses of a reduction variable in some cases which will ultimatively fail vectorization (or ICE with a pending change). So avoid applying the pattern in such case. * tree-vect-patterns.cc (vect_synth_mult_by_constant): Avoid in cases that introduce multiple uses of reduction operands. Co-authored-by: Jakub Jelinek <ja...@redhat.com> --- gcc/tree-vect-patterns.cc | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 3fffcac4b3a..d5c81be6e9d 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -4303,6 +4303,8 @@ vect_synth_mult_by_constant (vec_info *vinfo, tree op, tree val, /* Targets that don't support vector shifts but support vector additions can synthesize shifts that way. */ bool synth_shift_p = !vect_supportable_shift (vinfo, LSHIFT_EXPR, multtype); + if (synth_shift_p) + return NULL; HOST_WIDE_INT hwval = tree_to_shwi (val); /* Use MAX_COST here as we don't want to limit the sequence on rtx costs. @@ -4314,6 +4316,35 @@ vect_synth_mult_by_constant (vec_info *vinfo, tree op, tree val, if (!possible) return NULL; + if (vect_is_reduction (stmt_vinfo)) + { + int op_uses = alg.op[0] != alg_zero; + for (int i = 1; i < alg.ops; i++) + switch (alg.op[i]) + { + case alg_add_t_m2: + case alg_sub_t_m2: + if (synth_shift_p && alg.log[i]) + return NULL; + else + op_uses++; + break; + case alg_add_t2_m: + case alg_sub_t2_m: + op_uses++; + break; + default: + break; + } + if (variant == add_variant) + op_uses++; + /* When we'll synthesize more than a single use of the reduction + operand the reduction constraints are violated. Avoid this + situation. */ + if (op_uses > 1) + return NULL; + } + if (!target_supports_mult_synth_alg (&alg, variant, vectype, synth_shift_p)) return NULL; -- 2.43.0