On Tue, 5 Jul 2016, Kyrill Tkachov wrote:

> On 01/07/16 13:02, Richard Biener wrote:
> > On Thu, 30 Jun 2016, Kyrill Tkachov wrote:
> >
> > > On 28/06/16 08:54, Richard Biener wrote:
> > > > On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
> > > >
> > > > > On 15/06/16 22:53, Marc Glisse wrote:
> > > > > > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > > > > >
> > > > > > > This is a respin of
> > > > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html
> > > > > > > following feedback.
> > > > > > > I've changed the code to cast the operand to an unsigned type
> > > > > > > before applying the multiplication algorithm and cast it back
> > > > > > > to the signed type at the end.
> > > > > > > Whether to perform the cast is now determined by the function
> > > > > > > cast_mult_synth_to_unsigned, in which I've implemented
> > > > > > > the cases that Marc mentioned in [1]. Please do let me know
> > > > > > > if there are any other cases that need to be handled.
> > > > > >
> > > > > > Ah, I never meant those cases as an exhaustive list, I was just
> > > > > > looking for examples showing that the transformation was unsafe,
> > > > > > and those 2 came to mind:
> > > > > >
> > > > > > - x*15 -> x*16-x: the second one can overflow even when the
> > > > > >   first one doesn't.
> > > > > >
> > > > > > - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe
> > > > > >   that's redundant with the negate_variant check?)
> > > > > >
> > > > > > On the other hand, as long as we remain in the 'positive'
> > > > > > operations, turning x*3 into x<<1+x seems perfectly safe. And
> > > > > > even x*30 to (x*16-x)*2 cannot cause spurious overflows. But I
> > > > > > didn't look at the algorithm closely enough to characterize the
> > > > > > safe cases. Now if you have done it, that's good :-) Otherwise,
> > > > > > we might want to err on the side of caution.
> > > > > >
> > > > > I'll be honest, I didn't give it much thought beyond convincing
> > > > > myself that the two cases you listed are legitimate.
> > > > > Looking at expand_mult_const in expmed.c can be helpful (where it
> > > > > updates val_so_far for checking purposes) to see the different
> > > > > algorithm cases. I think the only steps that could cause overflow
> > > > > are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor, or when the
> > > > > final step is negate_variant, which are what you listed (and
> > > > > covered in this patch).
> > > > >
> > > > > richi is away on PTO for the time being though, so we have some
> > > > > time to convince ourselves :)
> > > >
> > > > ;) I think the easiest way would be to always use unsigned
> > > > arithmetic.
> > > >
> > > > While VRP doesn't do anything on vector operations we still have
> > > > some match.pd patterns that rely on correct overflow behavior and
> > > > those may be enabled for vector types as well.
> > >
> > > That's fine with me.
> > > Here's the patch that performs the casts to unsigned and back when
> > > the input type doesn't wrap on overflow.
> > >
> > > Bootstrapped and tested on arm, aarch64, x86_64.
> > >
> > > How's this?
> >
> > +static bool
> > +target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
> > +                                tree scaltype)
> > +{
> > ...
> > +  tree vectype = get_vectype_for_scalar_type (scaltype);
> > +
> > +  if (!vectype)
> > +    return false;
> > +
> > +  /* All synthesis algorithms require shifts, so bail out early if
> > +     target cannot vectorize them.
> > +     TODO: If the target doesn't support vector shifts but supports
> > +     vector addition we could synthesize shifts that way.
> > +     vect_synth_mult_by_constant would need to be updated to do that.
> > +     */
> > +  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
> > +    return false;
> >
> > I think these tests should be done in the caller before calling
> > synth_mult (and vectype be passed down accordingly). Also I wonder
> > if synth_mult for a * 2 produces a << 1 or a + a - the case of
> > a * 2 -> a + a was what Marc's patch handled. Guarding off LSHIFT_EXPR
> > support that early will make that fail on targets w/o vector shift.
>
> I believe it depends on the relative rtx costs (which of course are
> not relevant at vector gimple level). Anyway, I've moved the check
> outside.
>
> I've also added code to synthesise the shifts by additions when vector
> shift is not available (the new function is synth_lshift_by_additions).
>
> > +  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
> > +  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
> > +
> > +  if (var == negate_variant
> > +      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
> > +    return false;
> >
> > I think we don't have any targets that support PLUS but not MINUS
> > or targets that do not support NEGATE. OTOH double-checking doesn't
> > matter.
> >
> > +apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
> > +                             stmt_vec_info stmt_vinfo)
> > +{
> > +  if (TREE_INT_CST_LOW (op2) == 0
> >
> > integer_zerop (op2)
>
> Ok
>
> > +      && (code == LSHIFT_EXPR
> > +          || code == PLUS_EXPR))
> >
> > +  tree itype = TREE_TYPE (op2);
> >
> > I think it's dangerous to use the type of op2 given you synthesize
> > shifts as well. Why not use the type of op1?
>
> Yeah, we should be taking the type of op1. Fixed.
>
> > +      /* TODO: Targets that don't support vector shifts could
> > +         synthesize them using vector additions.
> > +         target_supports_mult_synth_alg would need to be updated to
> > +         allow this.  */
> > +      switch (alg.op[i])
> > +        {
> >
> > so I suppose one could at least special-case << 1 and always use
> > PLUS for that.
>
> As said above, I added code to synthesize all constant shifts using
> additions if the target doesn't support vector shifts. Note that on
> targets that do support vector shifts we will still generate a shift.
> I think that's fair enough, but if you really want to special-case
> this to always generate an addition I suppose I can do that.
>
> > Otherwise looks ok to me. There is a PR with a testcase for
> > the x * 2 issue so please add that if it is fixed or amend the fix
> > if not ;)
>
> Thanks for the review. I've added the * 2 and * 4 case.
> As for testing, I've bootstrapped and tested the patch on aarch64 and
> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
> always true to exercise the paths that synthesize the shift by
> additions. Marc, could you test this on the sparc targets you care
> about?
Looks good.

Thanks,
Richard.

> Thanks,
> Kyrill
>
> 2016-07-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     PR target/65951
>     * tree-vect-patterns.c: Include mult-synthesis.h.
>     (target_supports_mult_synth_alg): New function.
>     (synth_lshift_by_additions): Likewise.
>     (apply_binop_and_append_stmt): Likewise.
>     (vect_synth_mult_by_constant): Likewise.
>     (target_has_vecop_for_code): Likewise.
>     (vect_recog_mult_pattern): Use above functions to synthesize vector
>     multiplication by integer constants.
>
> 2016-07-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     PR target/65951
>     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
>     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
>     * gcc.dg/vect/pr65951.c: Likewise.

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)