On Thursday, 4 August 2016 at 20:58:57 UTC, Walter Bright wrote:
> On 8/4/2016 1:29 PM, Fool wrote:
>> I'm afraid I don't understand your implementation. Isn't
>> toFloat(x) + toFloat(y) computed in real precision (first rounding)?
>> Why doesn't toFloat(toFloat(x) + toFloat(y)) involve another rounding?

> You're right, in that case, it does. But C does, too:
>
> http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/

Yes. It seems, however, that Rick Regan is not advocating this behavior.
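
To illustrate the double rounding involved, here is a small D sketch (my own, not using your toFloat). It assumes an x86 target where real is the 80-bit x87 format, and that each expression is actually evaluated at run time with exactly the precision written; compilers are free to fold constants or keep intermediates at higher precision, in which case the output may differ, which is rather the point.

import std.stdio;

void main()
{
    double x = 1.0;
    double y = 0x1p-53 + 0x1p-78; // exactly representable as a double

    // One rounding: the exact sum 1 + 2^-53 + 2^-78 lies just above
    // the halfway point between 1 and 1 + 2^-52, so it rounds up.
    double once = x + y;

    // Two roundings: to 80-bit real first (the 2^-78 bit is dropped,
    // leaving the exact tie 1 + 2^-53), then to double, where
    // ties-to-even rounds down to 1.0.
    double twice = cast(double)(cast(real)x + cast(real)y);

    writefln("once:  %a", once);  // expect 0x1.0000000000001p+0
    writefln("twice: %a", twice); // expect 0x1p+0
}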


> This is important to remember when advocating for "C-like" floating point, because C simply does not behave as most programmers seem to assume it does.

That's right. "C-like" might be what they say, but what they want is for double precision computations to be carried out in double precision.


> What toFloat() does is guarantee that its argument is rounded to float.

> The best way to approach this when designing fp algorithms is to not require them to have reduced precision.

I understand your point of view. However, there are (probably rare) situations where one requires more control. I think that simulating double-double arithmetic using a Veltkamp split was mentioned earlier as a reasonable example.
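
For concreteness, here is a minimal sketch of a Veltkamp split in D (the names are mine; the splitter constant 2^27 + 1 is the standard one for IEEE double, and overflow near double.max is ignored):

// Veltkamp split: cut a double into hi + lo such that a == hi + lo
// exactly and each half fits in at most 26 significand bits. This
// only works if every operation below is rounded to double;
// intermediates silently kept at x87 extended precision break it.
void split(double a, out double hi, out double lo)
{
    enum double splitter = 0x1p27 + 1.0; // 2^27 + 1 for 53-bit doubles
    double c = splitter * a;
    hi = c - (c - a);
    lo = a - hi;
}

If c or c - a is evaluated at extended precision, hi keeps extra significand bits, the two halves overlap, and the exact products that double-double arithmetic is built on are lost. That is the kind of control I mean.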


> It's also important to realize that on some machines, the hardware does not actually support float precision operations, or may do so at a large runtime penalty (x87).

That's another story.
