On Fri, 10 Nov 2023, Andrew Stubbs wrote:

> On 23/10/2023 11:43, Richard Biener wrote:
> > On Fri, 20 Oct 2023, Andrew Stubbs wrote:
> > 
> >> This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in
> >> the
> >> mask enable extra lanes that were supposed to be unused and are therefore
> >> undefined.
> >>
> >> Richi suggested an alternative approach involving narrower types and then a
> >> zero-extend to the actual mask type.  This solved the problem for the
> >> specific
> >> test case that I had, but I'm not sure if it would work with V2 and V4
> >> modes
> >> (not that I've observed bad behaviour from them anyway, but still).  There
> >> were some other caveats involving "two-lane division" that I don't fully
> >> understand, so I went with the simpler implementation.
> >>
> >> This patch does have the disadvantage of an additional "and" instruction in
> >> the non-constant case even for machines that don't need it. I'm not sure
> >> how
> >> to fix that without an additional target hook. (If GCC could use the
> >> 64-lane
> >> vectors more effectively without the assistance of artificially reduced
> >> sizes
> >> then this problem wouldn't exist.)
> >>
> >> OK to commit?
> > 
> > -           convert_move (target, op0, 0);
> > +           rtx tmp = gen_reg_rtx (mode);
> > +           convert_move (tmp, op0, 0);
> > +
> > +           if (known_ne (TYPE_VECTOR_SUBPARTS (type),
> > +                         GET_MODE_PRECISION (mode)))
> > 
> > Usually this would be maybe_ne, but then ...
> > 
> > +             {
> > +               /* Ensure no excess bits are set.
> > +                  GCN needs this, AVX does not.  */
> > +               expand_binop (mode, and_optab, tmp,
> > +                             GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
> > +                                             .to_constant())) - 1),
> > +                             target, true, OPTAB_DIRECT);
> > 
> > here you have .to_constant ().  I think with having an integer mode
> > we know subparts is constant so I'd prefer
> > 
> >              auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >              if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
> > ...
> > 
> > +             }
> > +           else
> > +             emit_move_insn (target, tmp);
> > 
> > note you need the emit_move_insn also for the expand_binop
> > path since it's not guaranteed that 'target' is used there.  Thus
> > 
> >    tmp = expand_binop (...)
> >    if (tmp != target)
> >      emit_move_insn (...)
> > 
> > Otherwise looks good to me.  The and is needed on x86 for
> > two and four bit masks, it would be more efficient to use
> > smaller modes for the sign-extension I guess.
> 
> I think this patch addresses these issues. I've confirmed it does the right
> thing on amdgcn.
> 
> OK?

OK.

thanks,
Richard.

> Andrew
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to