tra added a comment.

I don't think using FMA throws away IEEE compliance. IEEE 754-2008 says:

> A language standard should also define, and require implementations to
> provide, attributes that allow and disallow value-changing optimizations,
> separately or collectively, for a block. These optimizations might include,
> but are not limited to:
> ...
> ― Synthesis of a fusedMultiplyAdd operation from a multiplication and an
> addition

It sounds like FMA use is up to the user/language, and the IEEE standard is fine with it either way. We need to establish which language standard we are supposed to adhere to. The C++ standard itself does not seem to say much about FP precision or any particular FP format. The C11 standard (ISO/IEC 9899:201x draft, 7.12.2) says:

> The default state (''on'' or ''off'') for the [FP_CONTRACT] pragma is
> implementation-defined.

Nvidia has a fairly detailed description of their FP behavior:
http://docs.nvidia.com/cuda/floating-point/index.html#fused-multiply-add-fma

> The fused multiply-add operator on the GPU has high performance and increases
> the accuracy of computations. **No special flags or function calls are needed
> to gain this benefit in CUDA programs**. Understand that a hardware fused
> multiply-add operation is not yet available on the CPU, which can cause
> differences in numerical results.

At the moment this is the most specific guideline I have managed to find regarding the expected FP behavior applicable to CUDA.

http://reviews.llvm.org/D20341

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
IEEE 784-2008 says: > A language standard should also define, and require implementations to > provide, attributes that allow and > disallow value-changing optimizations, separately or collectively, for a > block. These optimizations might > include, but are not limited to: > ... > ― Synthesis of a fusedMultiplyAdd operation from a multiplication and an > addition It sounds like FMA use is up to user/language and IEEE standard is fine with it either way. We need to establish what is the language standard that we need to adhere to. C++ standard itself does not seem to say much about FP precision or particular FP format. C11 standard (ISO/IEC 9899:201x draft, 7.12.2) says: > The default state (‘‘on’’ or ‘‘off’’) for the [FP_CONTRACT] pragma is > implementation-defined. Nvidia has fairly detailed description of their FP. http://docs.nvidia.com/cuda/floating-point/index.html#fused-multiply-add-fma > The fused multiply-add operator on the GPU has high performance and increases > the accuracy of computations. **No special flags or function calls are needed > to gain this benefit in CUDA programs**. Understand that a hardware fused > multiply-add operation is not yet available on the CPU, which can cause > differences in numerical results. At the moment it's the most specific guideline I managed to find regarding expected FP behavior applicable to CUDA. http://reviews.llvm.org/D20341 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits