On Monday, 16 May 2016 at 08:10:02 UTC, Walter Bright wrote:
IEEE floats do not specify precision of intermediate results. A C/C++ compiler can be fully IEEE compliant and yet legitimately have increased precision for intermediate results.

IEEE 754-2008 provides language designers with features that enable predictable, bit-accurate computation with floats for ordinary floating-point operations. Only a subset of functions is implementation-defined.

This also has the advantage that you can do bit-level optimizations... including proving asserts to hold at compile time and "assume assert" optimizations that you are fond of ;-).

But all the C/C++ compilers I have used support reliable coercion to 32 bit floats.
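To make "coercion to 32 bit floats" concrete, here is a minimal sketch. The function names are hypothetical and not from any library. It assumes the default round-to-nearest mode, and it uses `volatile` stores as one conservative way to force each intermediate down to a real 32-bit float, even on targets that keep intermediates in wider registers.

```cpp
#include <cassert>

// Hypothetical helper: sum three floats, rounding to 32 bits after every add.
// The volatile stores force each intermediate out to a genuine float.
inline float sum3_strict(float a, float b, float c) {
    volatile float t = a + b;  // rounded to float here
    volatile float r = t + c;  // and rounded again here
    return r;
}

// Same sum with a deliberately wider intermediate, rounded once at the end.
inline float sum3_wide(float a, float b, float c) {
    double t = static_cast<double>(a) + b + c;
    return static_cast<float>(t);
}

// Self-check: call this from a test runner or from main().
inline void check_sum3() {
    const float big = 16777216.0f;  // 2^24, where float spacing becomes 2
    // Strict float: 2^24 + 1 is a tie and rounds back to 2^24, twice.
    assert(sum3_strict(big, 1.0f, 1.0f) == 16777216.0f);
    // Wider intermediate: 2^24 + 2 is exact in double and representable in float.
    assert(sum3_wide(big, 1.0f, 1.0f) == 16777218.0f);
}
```

The two functions disagree on the same inputs, which is exactly the kind of difference a unit test for 32-bit code needs to see rather than have hidden by extra precision.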

I posted several links here pointing out this behavior in VC++ and g++. If your C++ numerics code didn't have a problem with it, it's likely you wrote the code in such a way that more accurate answers were not wrong.

I only use clang++ for production, and I don't really care how I wrote my code. What I do know is that in performance-optimized code for 32-bit floats and SIMD I rely on unit testing with guaranteed 32-bit floats. I absolutely do not want unit tests to execute with higher precision. I want them to break if the 32-bit float code fails. If I cannot be sure of this, I risk having libraries that enter infinite loops in production code.
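As an illustration of why test precision matters for loop termination, the sketch below uses a hypothetical function (`halvings_until_absorbed`, not from the thread) whose iteration count depends on the precision the comparison is evaluated in. It assumes the standard `FLT_EVAL_METHOD` macro from `<cfloat>` is available and refuses to compile when float expressions are not evaluated as float.

```cpp
#include <cassert>
#include <cfloat>

// Fail the build, rather than silently test at higher precision.
static_assert(FLT_EVAL_METHOD == 0,
              "float expressions must be evaluated in float precision");

// Hypothetical example: count halvings of eps until 1.0f + eps == 1.0f.
// The answer depends on the evaluation precision: 24 for strict 32-bit
// floats, but 53 if the same comparison were carried out in double.
inline int halvings_until_absorbed() {
    const int kMaxIterations = 1000;  // hard cap so the sketch cannot hang
    float eps = 1.0f;
    int k = 0;
    while (1.0f + eps != 1.0f && k < kMaxIterations) {
        eps *= 0.5f;
        ++k;
    }
    return k;
}

// Self-check: call this from a test runner or from main().
inline void check_halvings() {
    assert(halvings_until_absorbed() == 24);
}
```

A loop like this is harmless, but the same dependence on rounding shows up in convergence tests, where a condition that is met at one precision may never be met at another.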

Keep in mind that even "simple" algorithms can get complex when you write for high performance, e.g. sound processing. So having a predictable outcome is very much desirable. Such code also doesn't gain much from compiler optimizations...


FP behavior has complex trade-offs with speed, accuracy, compatibility, and size. There are no easy, obvious answers.

Well, but randomly increasing precision is always a bad idea when you deal with related computations, like time series. It is better to have consistent noise/bias than random noise/bias. Also, in the case of audio processing it is not unheard of to exploit the 24 bit mantissa and the specifics of the IEEE 32 bit floating point format.
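One commonly cited way of exploiting the 24-bit mantissa, offered here only as an assumed example and not necessarily what the post has in mind, is that every signed 24-bit integer sample fits in a 32-bit float without loss. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: map signed 24-bit PCM samples to [-1, 1) and back.
// Both steps are exact: |s| <= 2^23 fits in the 24-bit significand, and
// scaling by 2^23 only changes the exponent.
inline float pcm24_to_float(std::int32_t s) {
    return static_cast<float>(s) / 8388608.0f;  // 2^23
}

inline std::int32_t float_to_pcm24(float x) {
    return static_cast<std::int32_t>(x * 8388608.0f);
}

// Exhaustively verify the round trip over the whole 24-bit range.
inline bool pcm24_roundtrip_is_exact() {
    for (std::int32_t s = -8388608; s <= 8388607; ++s) {
        if (float_to_pcm24(pcm24_to_float(s)) != s) {
            return false;
        }
    }
    return true;
}
```

Code written around this property relies on the exact float format, so any silent change of intermediate precision alters the assumptions it was tested under.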

Now, I don't object to having a "real" type that works the way you want. What I object to is having float and double act that way. Or rather, not having strict ieee32 and ieee64 types.
