On Fri, 11 Aug 2023, Richard Biener wrote:
> > I think it converts SNan to QNan (when the partial vector has just one
> > element which is SNan), so is a test for -fsignaling-nans missing?
>
> Hm, I guess that's a corner case that could happen when there's no
> runtime profitability check on more than one element and when the
> element accumulated is directly loaded from memory.  OTOH the
> loop vectorizer always expects an initial value for the reduction
> and thus we perform either no add (when the loop isn't entered)
> or at least a single add (when it is).  So I think this particular
> situation cannot occur?

Yes, that makes sense, thanks for the elaboration. (It's a bit subtle, so
maybe worth a comment? Not sure.)

> > In the default -fno-rounding-math -fno-signaling-nans mode I think we
> > can do the reduction by substituting negative zero for masked-off
> > elements; maybe it's worth diagnosing that case separately (i.e.
> > as "not yet implemented", not an incorrect transform)?
>
> Ah, that's interesting.  So the only case we can't handle is
> -frounding-math -fsigned-zeros then.  I'll see to adjust the patch
> accordingly, like the following incremental patch:

Yeah, nice!

> > (note that in avx512 it's possible to materialize negative zeroes
> > by mask with a single vpternlog instruction, which is cheap)
>
> It ends up loading the { -0.0, ... } constant from memory, the
> { 0.0, ... } mask is handled by using a zero-masked load, so
> indeed cheaper.

I was thinking it could be easily done without a memory load,
but got confused, sorry.

Alexander
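
For reference, the negative-zero substitution discussed above can be
sketched in scalar C. This is only an illustration, not code from the
patch: masked_sum is a hypothetical helper that models a masked reduction
by feeding a fill value into the add for every masked-off lane, and it
assumes the default round-to-nearest mode. In that mode x + (-0.0) == x
for every x, including x == -0.0, whereas a +0.0 fill turns a -0.0 sum
into +0.0. Under round-toward-negative, (+0.0) + (-0.0) yields -0.0, so
-0.0 stops being neutral, which is consistent with -frounding-math
-fsigned-zeros being the remaining unhandled case.

```c
#include <assert.h>   /* assert */
#include <math.h>     /* signbit */
#include <stddef.h>   /* size_t */

/* Hypothetical helper (not from the patch): sum the first `active`
   lanes of a `lanes`-wide vector into `init`, adding `fill` for every
   masked-off lane, the way a masked vector reduction would.  */
double
masked_sum (const double *v, size_t lanes, size_t active,
            double init, double fill)
{
  assert (active <= lanes);
  /* volatile keeps the compiler from simplifying the adds away.  */
  volatile double acc = init;
  for (size_t i = 0; i < lanes; i++)
    acc = acc + (i < active ? v[i] : fill);
  return acc;
}

/* Nonzero if the result of masked_sum is a negative zero.  */
int
is_negative_zero (double x)
{
  return x == 0.0 && signbit (x) != 0;
}
```

With v = { -0.0, 1.0, 2.0, 3.0 } and one active lane,
masked_sum (v, 4, 1, -0.0, -0.0) keeps the sign of the zero, while
masked_sum (v, 4, 1, -0.0, 0.0) loses it.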