Re: Easiest way to use FMA instruction

Johan via Digitalmars-d-learn Thu, 09 Jan 2020 16:05:43 -0800

On Thursday, 9 January 2020 at 22:50:37 UTC, Ben Jones wrote:

On Thursday, 9 January 2020 at 20:57:10 UTC, Ben Jones wrote:
What's the easiest way to use the FMA instruction (fused multiply add that has nice rounding properties)? The FMA function in Phobos just does a*b + c, which will round twice.
Do any of the intrinsics libraries include this? Should I write my own inline ASM?



Why do you want to use the FMA instruction?

If for performance:

Inline assembly is generally very bad for performance, as it disables inlining and the compiler probably does not understand the instruction itself (hence it cannot combine it with other optimizations). In this case you don't necessarily need the FMA instruction (instead you want whatever instruction is fastest), so you shouldn't force the compiler to use that instruction. Have a look at https://github.com/AuburnSounds/intel-intrinsics, although FMA is not supported there yet.
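As a rough sketch of "don't force the instruction": write the multiply-add in plain D and give the optimizer permission to fuse or vectorize it. The `dot` function is a made-up example, and the use of `@fastmath` from `ldc.attributes` is my assumption about how one might allow contraction with LDC; it also permits reassociation, so results can differ slightly from strict IEEE evaluation. Whether an FMA is actually emitted depends on the target and flags (for example building with `-O` and an FMA-capable `-mcpu`, also an assumption).

```d
version (LDC)
{
    import ldc.attributes : fastmath; // LDC-specific attribute
}
else
{
    enum fastmath; // placeholder UDA so other compilers still build (assumption)
}

/// Plain multiply-add accumulation; no particular instruction is forced.
/// Hypothetical example function, not from any library.
@fastmath
double dot(const(double)[] x, const(double)[] y)
{
    assert(x.length == y.length);
    double acc = 0.0;
    foreach (i; 0 .. x.length)
        acc += x[i] * y[i]; // the optimizer may turn this into FMA and/or SIMD
    return acc;
}
```

Checking the generated assembly (e.g. with `-output-s`) is the only way to see which instruction was really chosen.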



If only for the rounding behavior:

Then indeed you need to force the compiler to use the FMA instruction (also for non-optimized code, so you cannot rely on the optimizer). Inline assembly is a solution. GDC and LDC provide a better inline assembly method that preserves, among other things, inlining potential and doesn't require hardcoded ABI details.

For LDC:
```
double fma(double a, double b, double c)
{
    import ldc.llvmasm;
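    return __irEx!(
             // Prefix: declare the LLVM intrinsic at module level.
             `declare double @llvm.fma.f64(double %a, double %b, double %c)`,
             // Body: %0, %1, %2 refer to the arguments a, b, c.
             `%r = call double @llvm.fma.f64(double %0, double %1, double %2)
              ret double %r`,
             // Suffix (unused), then the return type and the three argument types.
             "",
             double, double, double, double)(a,b,c);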
}
```

See https://wiki.dlang.org/LDC_inline_IR, but it is a little outdated; see also https://github.com/ldc-developers/ldc/issues/3271
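To check that a multiply-add really rounds only once, something like the following sketch may help. It uses the C library's `fma` from `core.stdc.math` purely as a reference (my assumption, not something from this thread; it may be a slow software routine rather than the hardware instruction). The names `mulAddNaive` and `mulAddFused` are made up for illustration. The inputs are chosen so that a*b = 1 - 2^-54 exactly, which is not representable as a double.

```d
import core.stdc.math : fma; // C99 fma, used here only as a reference

/// Product stored to a double, then added: normally two roundings.
/// (A compiler may evaluate at higher precision or contract this,
/// so the result is target dependent.)
double mulAddNaive(double a, double b, double c)
{
    double p = a * b;
    return p + c;
}

/// Single rounding of the exact a*b + c, via the C library.
double mulAddFused(double a, double b, double c)
{
    return fma(a, b, c);
}

unittest
{
    // a*b is exactly 1 - 2^-54, which lies halfway between two doubles.
    double a = 1.0 + 0x1p-27;
    double b = 1.0 - 0x1p-27;
    double c = -1.0;

    // The fused result keeps the low-order bit of the product.
    assert(mulAddFused(a, b, c) == -0x1p-54);
}
```

On a typical x86-64 build without contraction I would expect `mulAddNaive` to return 0 for these inputs, because the product rounds to 1.0 before the addition, but that depends on the compiler and flags, so it is not asserted here.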



cheers,
 Johan
