On Jun 18, 2007, at 2:14 PM, Uros Bizjak wrote:
tbp wrote:
For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first
get an inf from said reciprocal, which then turns into a NaN in the NR
stage unless you correct it by, say, doing a comparison to 0 and an
'and'.
That's what ICC used to do behind your back. That's what you'll find on
page 151 of the amdfam10 optimization manual. Because that's a common
case.
As far as I can see, there's no such provision in the current patch.
At the very least provide a means to look after those NaNs without
losing sanity, like a way to enforce argument order of
min/max[ss|ps|pd] without resorting to inline asm.
But even if sqrt is corrected for 0.0 * inf, there would still be a
lot of problems with the combinations of NR-enhanced rsqrt and rcp.
Consider for example:
1.0/sqrt(a/b) alias rsqrt(a/b)
Having a=0, b != 0, the result is inf.
As already stated, -ffast-math turns on -ffinite-math-only, which
allows the compiler to assume that a result of inf cannot happen, so
gcc is allowed to ignore this possibility. Producing NaN instead of
inf seems to be allowed.
This expression is mathematically equal to sqrt(b/a) and the
compiler is free to do this optimization. In this case, b*rcp(a)
produces NaN due to NR of rcp(a) and here we lose.
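
A minimal sketch of why b*rcp(a) turns into NaN for a = 0, under the
assumption that rcp_estimate (a hypothetical stand-in for the rcpss
estimate, computed exactly here) returns inf for 0.0 and that a single
NR step y' = y * (2 - a*y) follows it. The function names are
illustrative only.

```c
#include <math.h>

/* Hypothetical stand-in for the rcpss estimate: maps 0.0 to +inf. */
float rcp_estimate(float x) { return 1.0f / x; }

/* One Newton-Raphson refinement of the reciprocal: y' = y * (2 - x*y). */
float rcp_nr(float x)
{
    float y = rcp_estimate(x);
    return y * (2.0f - x * y);
}

/* b / a rewritten as b * rcp(a), as an NR-enhanced division would do. */
float div_nr(float b, float a) { return b * rcp_nr(a); }
```

For a = 0.0 the refinement computes 0.0 * inf, so div_nr(b, 0.0f) is NaN
where the plain division b / a would have given inf.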
Let's correct both rsqrt and rcp NR steps for 0.0, so we have
NR-rsqrt(0.0) = inf, NR-rcp(0.0) = inf.
Again, sqrt(b/a) will create sqrt(inf) = inf * rsqrt(inf), so the NR
step for rsqrt will hit (0.0 * inf) from the other side. We lose,
because there is no correction for the case where the input operand is
infinity.
IMO, due to the limited range of operands for the -mrecip pass,
(-inf, inf) with 0.0 excluded, it should be kept out of -ffast-math.
There is no point in fixing reciprocals only for 0.0; we need to fix
both conversions for infinity and 0.0, even in -ffast-math.
I think that tbp just wants to ensure that sqrt(0.0)=0.0 even with
your various reciprocal and sqrt optimizations. (I can't test the
new code now, but I think he claims that with the new sqrt
optimizations sqrt(0.) => NaN; if indeed it does this then I would
consider this a bug.) I don't think he wants the optimizations to
have to "do the right thing" when an argument or result of one of
these operations is infinite or a NaN.
Of course, he can correct me if I'm wrong.
Brad