On Wed, 21 Aug 2024, Robin Dapp wrote:

> > > > >   _Bool iftmp.0_113;
> > > > >   _Bool iftmp.0_114;
> > > > >   iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D));
> > > > >   iftmp.0_114 = _47 | iftmp.0_113;
> 
> > >   _BoolD.2746 _47;
> > >   iftmp.0_114 = _47 ? 1 : iftmp.0_113;
> > > which is folded into
> > >   iftmp.0_114 = _47 | iftmp.0_113;
> 
> >
> > _47 was the .MASK_LOAD def, right?
> 
> _47 is the inverted load mask, iftmp.0_113 is the MASK_LOAD result.
> Its mask is _169 where _169 = ~_47;
> 
> > It's not exactly obvious what goes wrong - the transform above
> > is correct - it's only "unexpected" for the lanes that were
> > masked.  So the actual bug must be downstream from iftmp.0_144.
> >
> > I think one can try to reason on the ifcvt (scalar) code by
> > assuming the .MASK_LOAD def would be undefined.  Then we'd
> > have _47(D) ? 1 : iftmp.0_133 -> _47(D) | iftmp.0_133, I think
> > that's at most phishy as the COND_EXPR has a well-defined
> > value while the IOR might spill "undefined" elsewhere causing
> > divergence.  Is that what is actually happening?
> 
> After vectorization we recognize the mask (_47) as degenerate,
> i.e. all ones and, conversely, the masked load mask (_169) is all zeros.
> So we shouldn't really load anything.
> 
> Optimized we have
> 
>   vect_patt_384.36_436 = .MASK_LEN_GATHER_LOAD (_435, vect__47.35_432, 1, { 
> 0, ... }, { 0, ... }, _471, 0);
>   vect_iftmp.37_439 = vect_patt_384.36_436 | { 1, ... };
> 
> We then re-use a non-zero vector register as masked load result.  Its
> stale values cause the wrong result (which should be 1 everywhere).

And we fail to fold vect_patt_384.36_436 | { 1, ... } to { 1, ... }?
Or is the issue that vector masks contain padding and with
non-zero masking we'd have garbage in the padding and that leaks
here?  That is, _47 ? 1 : iftmp.0_113 -> _47 | iftmp.0_113 assumes
there's exactly one bit in a bool, specifically it has assumptions
on padding - I'd guess that

 *(char *)p = 17;
 _Bool _47 = *(_Bool *)p;
 ... = _47 ? 1 : b;

would have similar issues but eventually loading 17 into a _Bool
invokes undefined behavior.  So maybe the COND_EXPRs are only
required for .MASK_LOADs of _Bool (or any other type with padding)?

Richard.

Reply via email to