------- Comment #2 from dominiq at lps dot ens dot fr  2008-01-08 09:46 -------
I don't think this is a regression either, only a bad side effect. One
possibility to overcome it would be to change the way the Newton-Raphson
iteration is computed. Presently it seems to be x1=x0*(2.0-x*x0), which is bad
when x*x0=nearest(1.0,-1.0), since the result of (2.0-x*x0) is then 1.0. I see
two ways to improve the accuracy: x1=2.0*x0-(x*x0*x0) and x1=x0+(x0*(1.0-x*x0))
(assuming the parentheses are obeyed).
The first form adds a multiply, but should not increase the latency if the
multiply in 2.0*x0 is inserted between the first and the second multiplies of
x*x0*x0. The second form would add the 'add' latency to the original one, but
has a better balance between adds and multiplies and is probably the most
accurate.

Since I am not familiar enough with the x86, I cannot guess precisely what the
other effects of these implementations would be: extra moves, register
pressure, etc. Naively, I would say that the first one would be better in code
with a deficit of multiplies, while the second one would be better for long
sequences of divisions.

If anyone is interested in digging further into this issue, I can test patches
and timings on a Core 2 Duo. In any case, I think something should be said
about this "feature" in the manual, and there may be a need for a (better?)
"cost model" for replacing a division by "recip+NR", as I read in a previous
post.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702
