On Fri, 10 Nov 2023, Andrew Stubbs wrote: > On 23/10/2023 11:43, Richard Biener wrote: > > On Fri, 20 Oct 2023, Andrew Stubbs wrote: > > > >> This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in > >> the > >> mask enable extra lanes that were supposed to be unused and are therefore > >> undefined. > >> > >> Richi suggested an alternative approach involving narrower types and then a > >> zero-extend to the actual mask type. This solved the problem for the > >> specific > >> test case that I had, but I'm not sure if it would work with V2 and V4 > >> modes > >> (not that I've observed bad behaviour from them anyway, but still). There > >> were some other caveats involving "two-lane division" that I don't fully > >> understand, so I went with the simpler implementation. > >> > >> This patch does have the disadvantage of an additional "and" instruction in > >> the non-constant case even for machines that don't need it. I'm not sure > >> how > >> to fix that without an additional target hook. (If GCC could use the > >> 64-lane > >> vectors more effectively without the assistance of artificially reduced > >> sizes > >> then this problem wouldn't exist.) > >> > >> OK to commit? > > > > - convert_move (target, op0, 0); > > + rtx tmp = gen_reg_rtx (mode); > > + convert_move (tmp, op0, 0); > > + > > + if (known_ne (TYPE_VECTOR_SUBPARTS (type), > > + GET_MODE_PRECISION (mode))) > > > > Usually this would be maybe_ne, but then ... > > > > + { > > + /* Ensure no excess bits are set. > > + GCN needs this, AVX does not. */ > > + expand_binop (mode, and_optab, tmp, > > + GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type) > > + .to_constant())) - 1), > > + target, true, OPTAB_DIRECT); > > > > here you have .to_constant (). I think with having an integer mode > > we know subparts is constant so I'd prefer > > > > auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant (); > > if (maybe_ne (GET_MODE_PRECISION (mode), nunits) > > ... > > > > + } > > + else > > + emit_move_insn (target, tmp); > > > > note you need the emit_move_insn also for the expand_binop > > path since it's not guaranteed that 'target' is used there. Thus > > > > tmp = expand_binop (...) > > if (tmp != target) > > emit_move_insn (...) > > > > Otherwise looks good to me. The and is needed on x86 for > > two and four bit masks, it would be more efficient to use > > smaller modes for the sign-extension I guess. > > I think this patch addresses these issues. I've confirmed it does the right > thing on amdgcn. > > OK?
OK. thanks, Richard. > Andrew > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)