tra added a comment.

I don't think using FMA throws away IEEE compliance.

IEEE 754-2008 says:

> A language standard should also define, and require implementations to
> provide, attributes that allow and disallow value-changing optimizations,
> separately or collectively, for a block. These optimizations might
> include, but are not limited to:
> ...
> ― Synthesis of a fusedMultiplyAdd operation from a multiplication and an
> addition


It sounds like FMA use is up to the user/language, and the IEEE standard is 
fine with it either way.
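
To make the "value-changing" part concrete, here's a minimal host-side sketch 
(standard C, just `fma()` from math.h) of why contracting a multiply and an add 
into one FMA is observable: the fused form rounds once, so it can recover the 
rounding error that a separately rounded product throws away.

```
#include <math.h>
#include <stdio.h>

int main(void) {
  /* a*a is not exactly representable, so the plain product rounds. */
  double a = 1.0 + 0x1p-27;
  double p = a * a;           /* product rounded to nearest double */
  /* fma computes a*a exactly and rounds only once, after adding -p,
     so it exposes the rounding error the separate multiply discarded. */
  double err = fma(a, a, -p);
  printf("rounding error of a*a: %a\n", err);  /* nonzero on IEEE doubles */
  return 0;
}
```

If the compiler were allowed to contract `a * a - p` the same way, an 
expression the programmer expected to be exactly zero could become nonzero, 
which is exactly the class of optimization the quoted clause is about.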

We need to establish which language standard we need to adhere to. The C++ 
standard itself does not seem to say much about FP precision or any particular 
FP format.

The C11 standard (ISO/IEC 9899:201x draft, 7.12.2) says:

> The default state (‘‘on’’ or ‘‘off’’) for the [FP_CONTRACT] pragma is 
> implementation-defined.

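For reference, this is what that knob looks like in C11 source. A sketch only: 
whether `FP_CONTRACT ON` actually produces an FMA remains up to the 
implementation (clang honors the pragma; gcc, as far as I know, ignores it).

```
#include <stdio.h>

/* Per C11, the pragma must appear before any statements in the
   compound statement it governs. */
static double fused_maybe(double a, double b, double c) {
  #pragma STDC FP_CONTRACT ON
  return a * b + c;   /* implementation may contract this into one fma */
}

static double unfused(double a, double b, double c) {
  #pragma STDC FP_CONTRACT OFF
  return a * b + c;   /* product must be rounded before the addition */
}

int main(void) {
  double a = 1.0 + 0x1p-27;
  /* If contraction happens, the two results differ in the last bits. */
  printf("ON:  %a\nOFF: %a\n", fused_maybe(a, a, -1.0), unfused(a, a, -1.0));
  return 0;
}
```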

Nvidia has a fairly detailed description of their FP behavior:
http://docs.nvidia.com/cuda/floating-point/index.html#fused-multiply-add-fma

> The fused multiply-add operator on the GPU has high performance and increases 
> the accuracy of computations. **No special flags or function calls are needed 
> to gain this benefit in CUDA programs**. Understand that a hardware fused 
> multiply-add operation is not yet available on the CPU, which can cause 
> differences in numerical results.


At the moment it's the most specific guideline I've managed to find regarding 
the expected FP behavior applicable to CUDA.
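
Note that the same document also describes per-operation escape hatches, so 
"FMA on by default" does not lock the user in. A sketch of both directions, 
assuming nvcc (`--fmad=false` disables contraction globally, and the 
`__fmul_rn`/`__fadd_rn` intrinsics are documented as never being contracted):

```
// Compile with: nvcc example.cu               (contraction on by default)
//           or: nvcc --fmad=false example.cu  (contraction disabled)
__global__ void axpy(const float *x, const float *y, float a,
                     float *fused, float *unfused, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // May be contracted into a single fma by the compiler.
    fused[i] = a * x[i] + y[i];
    // Written with intrinsics, which the compiler never contracts.
    unfused[i] = __fadd_rn(__fmul_rn(a, x[i]), y[i]);
  }
}
```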


http://reviews.llvm.org/D20341


