On 8/9/23 07:51, Alexander Monakov wrote:
On Wed, 9 Aug 2023, Richard Biener via Gcc-patches wrote:
The following teaches the non-loop reduction vectorization code to
handle non-associatable reductions. Using the existing FOLD_LEFT_PLUS
internal functions might be possible but I'd have to convince myself
that +0.0 + x[0] is a safe extra operation in every rounding mode
(I also have no way to test the resulting code).
It's not. Under our default -fno-signaling-nans -fno-rounding-math
negative zero is the neutral element for addition, so '-0.0 + x[0]'
might be (but negative zero costs more to materialize).
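
To make the neutral-element point concrete, here is a small C sketch. It is not taken from the patch, and the function names are made up for illustration. It assumes the default round-to-nearest mode and non-NaN inputs, and uses volatile only so that the addition happens at run time instead of being folded by the compiler.

```c
#include <math.h>

/* Hypothetical helper: compute init + x at run time.
   The volatile temporaries keep the compiler from folding the add. */
double add_with_init(double init, double x)
{
    volatile double a = init, b = x;
    return a + b;
}

/* Hypothetical helper: nonzero when init + x reproduces x exactly,
   including the sign of zero.  x must not be a NaN. */
int init_is_neutral_for(double init, double x)
{
    double r = add_with_init(init, x);
    return r == x && !signbit(r) == !signbit(x);
}
```

With round-to-nearest, init_is_neutral_for(0.0, -0.0) is false because +0.0 + -0.0 gives +0.0, while init_is_neutral_for(-0.0, x) holds for both signs of zero and for ordinary values.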
If the reduction has at least two elements, then
-0.0 + x[0] + x[1]
has the same behavior w.r.t SNaNs as 'x[0] + x[1]', but unfortunately
yields negative zero when x[0] = x[1] = +0.0 and rounding towards
negative infinity (unlike x[0] + x[1], which is +0.0).
Hmm, then there's a bug in a non-released port I worked on a while
back. It supports FOLD_LEFT_PLUS by starting the sequence with a +0.0
in the destination register.
I guess if that port ever gets upstreamed I'll have to keep an eye out
for that problem. Luckily I think they can synthesize a -0.0 trivially,
potentially even zero cost.
Thanks!
Jeff