On Wed, 21 Aug 2024, Robin Dapp wrote: > > > > > _Bool iftmp.0_113; > > > > > _Bool iftmp.0_114; > > > > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D)); > > > > > iftmp.0_114 = _47 | iftmp.0_113; > > > > _BoolD.2746 _47; > > > iftmp.0_114 = _47 ? 1 : iftmp.0_113; > > > which is folded into > > > iftmp.0_114 = _47 | iftmp.0_113; > > > > > _47 was the .MASK_LOAD def, right? > > _47 is the inverted load mask, iftmp.0_113 is the MASK_LOAD result. > Its mask is _169 where _169 = ~_47; > > > It's not exactly obvious what goes wrong - the transform above > > is correct - it's only "unexpected" for the lanes that were > > masked. So the actual bug must be downstream from iftmp.0_144. > > > > I think one can try to reason on the ifcvt (scalar) code by > > assuming the .MASK_LOAD def would be undefined. Then we'd > > have _47(D) ? 1 : iftmp.0_133 -> _47(D) | iftmp.0_133, I think > > that's at most phishy as the COND_EXPR has a well-defined > > value while the IOR might spill "undefined" elsewhere causing > > divergence. Is that what is actually happening? > > After vectorization we recognize the mask (_47) as degenerate, > i.e. all ones and, conversely, the masked load mask (_169) is all zeros. > So we shouldn't really load anything. > > Optimized we have > > vect_patt_384.36_436 = .MASK_LEN_GATHER_LOAD (_435, vect__47.35_432, 1, { > 0, ... }, { 0, ... }, _471, 0); > vect_iftmp.37_439 = vect_patt_384.36_436 | { 1, ... }; > > We then re-use a non-zero vector register as masked load result. Its > stale values cause the wrong result (which should be 1 everywhere).
And we fail to fold vect_patt_384.36_436 | { 1, ... } to { 1, ... }? Or is the issue that vector masks contain padding and with non-zero masking we'd have garbage in the padding and that leaks here? That is, _47 ? 1 : iftmp.0_113 -> _47 | iftmp.0_113 assumes there's exactly one bit in a bool, specifically it has assumptions on padding - I'd guess that *(char *)p = 17; _Bool _47 = *(_Bool *)p; ... = _47 ? 1 : b; would have similar issues but eventually loading 17 into a _Bool invokes undefined behavior. So maybe the COND_EXPRs are only required for .MASK_LOADs of _Bool (or any other type with padding)? Richard.