On Thursday, 4 August 2016 at 20:58:57 UTC, Walter Bright wrote:
> On 8/4/2016 1:29 PM, Fool wrote:
>> I'm afraid I don't understand your implementation. Isn't
>> toFloat(x) + toFloat(y) computed in real precision (first
>> rounding)? Why doesn't toFloat(toFloat(x) + toFloat(y))
>> involve another rounding?
>
> You're right, in that case, it does. But C does, too:
>
> http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/
Yes. It seems, however, that Rick Regan does not advocate this behavior.
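
To make the double rounding concrete, here is a minimal sketch in D (my own example, not code from this thread). It assumes a target where real is the 80-bit x87 format, so that 1 + 2^-24 + 2^-53 is exact in real, ties at double precision, and loses the bit that decides the float rounding:

import std.stdio;

void main()
{
    // Exact in 80-bit real; a round-to-even tie at double precision
    // discards the low bit that should force rounding up at float
    // precision.
    real x = 1.0L + 0x1p-24L + 0x1p-53L;

    float once  = cast(float) x;               // real -> float
    float twice = cast(float) cast(double) x;  // real -> double -> float

    writefln("once:  %a", once);   // 0x1.000002p+0 (correctly rounded)
    writefln("twice: %a", twice);  // 0x1p+0 (double rounding)
}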
> This is important to remember when advocating for "C-like"
> floating point - because C simply does not behave as most
> programmers seem to assume it does.
That's right. "C-like" might be what they say, but what they want is double-precision computations carried out in double precision.
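
A hypothetical illustration of the gap, with operands I picked so that the exact sum lies just above a double-precision tie: on a compiler that evaluates double arithmetic on the x87 at 80-bit precision before storing (for example 32-bit DMD), the stored sum comes out differently than with a single rounding to double:

import std.stdio;

void main()
{
    // Both operands are exact doubles, but their mathematical sum
    // 1 + 2^-53 + 2^-105 is not.
    double a = 1.0;
    double b = 0x1p-53 + 0x1p-105;

    double s = a + b;

    // Rounded once, directly to double: 1 + 2^-52 (0x1.0000000000001p+0).
    // Evaluated at 80-bit precision and then stored to double, it
    // double-rounds down to exactly 1.0 instead.
    writefln("%a", s);
}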
> What toFloat() does is guarantee that its argument is rounded
> to float.
>
> The best way to approach this when designing fp algorithms is
> to not require them to have reduced precision.
I understand your point of view. However, there are (probably rare) situations where one requires more control. I think that simulating double-double precision arithmetic using the Veltkamp split was mentioned earlier as a reasonable example.
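
For reference, a minimal sketch of the textbook Veltkamp split (my formulation, not code from this thread); its correctness argument assumes every operation is rounded to double, which is exactly the kind of reduced-precision guarantee at issue:

// Veltkamp split: decompose a double into hi + lo with non-overlapping
// 26-bit halves, the building block of Dekker's exact product and hence
// of double-double arithmetic. Each operation below must be rounded to
// double; evaluated throughout at 80-bit precision, the split lands at
// the wrong bit and the exact products built on it stop being exact.
void split(double a, out double hi, out double lo)
{
    enum double factor = 0x1p27 + 1.0; // 2^27 + 1 for 53-bit doubles
    double c = factor * a;
    hi = c - (c - a);
    lo = a - hi;
}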
> It's also important to realize that on some machines, the
> hardware does not actually support float precision operations,
> or may do so at a large runtime penalty (x87).
That's another story.