There was a recent discussion on Phobos about D's floating point behavior [1]. I think Ilya summarized our problem quite elegantly:

> We need to argue with @WalterBright to switch to C-like floating point arithmetic instead of automatic expansion to reals (this is so horrible that it may kill D for science forever; @wilzbach can tell a story about Tinflex, I can tell one about precise and KBN summations, which do not work correctly with 32-bit DMD). D has had a lot of long discussions about math and the GC. We are moving towards a great D without GC now. Well, I hope we will also get D with normal FP arithmetic. But how many years do we need to change @WalterBright's opinion on this problem to the common view?

Here's my reply to and experience with this; maybe it helps to show the problem. I have started to document all the bumps with FP math I ran into on our mir wiki [2]. While some of them are expected, others are horrible and cruel: FP behavior that differs depending on (1) the target, (2) the compiler, or (3) the optimization level means a constant fight against the compiler and is very hard to work with. Wasn't the entire point of D to get this right and avoid weird, unproductive corner cases across architectures?

The problem with Tinflex and a lot of other scientific math is that you need reproducible, predictable behavior. The Tinflex algorithm is quite sensitive because it (1) uses pow and exp, so errors add up quickly, and (2) keeps an ordered heap of remaining FP values, which means that, due to the FP magic explained below, I get totally different resulting values depending on the architecture. Note that the ordering itself is well defined for equality: e.g. the tuples (mymath(1.5), 100) and (mymath(1.5), 200) need to end up in the same order every time. I don't want my program to fail just because I compiled for 32-bit, but maybe code examples show more than words.
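To make the ordering requirement concrete, here is a minimal, self-contained sketch of the invariant the heap relies on; `mymath` here is just a stand-in, not the real Tinflex code:

```
import std.algorithm.sorting : sort;
import std.typecons : tuple;

double mymath(double x) { return -1 / x; } // stand-in for the real computation

void main()
{
    // Two heap entries that share the same FP key ...
    auto a = tuple(mymath(1.5), 100);
    auto b = tuple(mymath(1.5), 200);
    // ... must compare equal on that key so that sorting orders them
    // deterministically. If one key is rounded to 64 bits (stored to
    // memory) while the other keeps 80 bits (x87 register), the keys
    // differ and the final order depends on target and optimization level.
    auto heap = [b, a];
    heap.sort!((x, y) => x[0] < y[0] || (x[0] == y[0] && x[1] < y[1]));
    assert(heap[0][1] == 100 && heap[1][1] == 200);
}
```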

Consider the following program; it fails on 32-bit :/

```
alias S = double; // same for float

S fun(S)(in S x)
{
    return -1 / x;
}

void main()
{
    S i = fun!S(3);
    assert(i == S(-1) / 3); // this line passes
    assert(fun!S(3) == S(-1) / 3); // error on 32-bit

// just to be clear, the following variants don't work either on 32-bit
    assert(fun!S(3) == S(-1.0 / 3));
    assert(fun!S(3) == cast(S) S(-1) / 3);
    assert(fun!S(3) == S(S(-1) / 3));
    assert(S(fun!S(3)) == S(-1) / 3);
    assert(cast(S) fun!S(3) == S(-1) / 3);
}
```
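As far as I can tell, what happens is this: on 32-bit x86, DMD keeps the return value of `fun` in an x87 register at `real` (80-bit) precision, so the failing `assert` compares an unrounded 80-bit quotient against the correctly rounded `double`. The effect is the same as in this sketch (my reconstruction of the mechanism, not the actual codegen):

```
void main()
{
    // fun's result stays on the x87 stack at 80-bit precision ...
    real unrounded = -1.0L / 3.0L;
    // ... while S(-1) / 3 is rounded to a 64-bit double
    double rounded = double(-1) / 3;
    // the comparison is performed at real precision, so the extra
    // mantissa bits of the 80-bit quotient make the two sides differ
    assert(unrounded != rounded);
}
```

That is also why the first assert passes: storing the result into `i` forces the rounding to `double`, after which both sides agree.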

Maybe it's easier to see why this behavior is tricky when we look at it in action. With the following program, DMD for x86_64 yields the same result for f1 and f2, whereas x86_32 yields different ones. The only difference between the two functions is that f1 stores the intermediate results into variables, which rounds them to float, while f2 keeps them in registers at extended precision.

```
import std.stdio;

alias S = float;
float shL = 0x1.1b95eep-4; // ≈ 0.069235
float shR = 0x1.9c7cfep-7; // ≈ 0.012588

F fun(F)(F x)
{
    return 1.0 + x * 2;
}

S f1()
{
    S r = fun(shR);
    S l = fun(shL);
    return (r - l);
}

S f2()
{
    return (fun(shR) - fun(shL));
}

void main()
{
    writefln("f1: %a", f1()); //  -0x1.d00cap-4
    writefln("f2: %a", f2());  // on 32-bit: -0x1.d00c9cp-4
    assert(f1 == f2); // fails on 32-bit
}
```

To make matters worse, std.math yields different results than compiler/assembly intrinsics. Note that in this example importing std.math.pow adds about 1K instructions to the generated assembly, whereas LDC's llvm_pow boils down to a plain powf call. The performance of powf is also a lot better: I measured [3] that std.math.pow takes ~1.5x as long on both LDC and DMD. If you need to call pow very often, that cost isn't acceptable. (The one-ULP difference presumably comes from std.math computing internally at real precision and rounding only at the end, while powf works in float throughout.)

```
void main()
{
    alias S = float;
    S s1 = 0x1.24c92ep+5;
    S s2 = -0x1.1c71c8p+0;

    import std.math : std_pow = pow;
    import core.stdc.stdio : printf;
    import core.stdc.math: powf;

    printf("std: %a\n", std_pow(s1, s2)); // 0x1.2c155ap-6
    printf("pow: %a\n", s1 ^^ s2); // 0x1.2c155ap-6
    printf("powf: %a\n", powf(s1, s2)); // 0x1.2c1558p-6

    version(LDC)
    {
        import ldc.intrinsics : llvm_pow;
        printf("ldc: %a\n", llvm_pow(s1, s2)); // 0x1.2c1558p-6
    }
}
```
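The only defense I know of for test code is to compare with a tolerance instead of bitwise equality, e.g. with std.math.feqrel; the threshold of 20 bits below is an arbitrary choice for illustration:

```
import std.math : feqrel;

void main()
{
    float a = 0x1.2c155ap-6; // the std.math.pow result from above
    float b = 0x1.2c1558p-6; // the powf result from above
    // feqrel counts the leading mantissa bits on which the two values
    // agree; float has 24, so requiring 20 tolerates a few ULPs of drift
    assert(feqrel(a, b) >= 20);
}
```

But having to do this for computations that should be exactly reproducible is precisely the problem.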

I have excluded the discrepancies in FP arithmetic between Windows and Linux/macOS, as they are hopefully just a bug [4].

[1] https://github.com/dlang/phobos/pull/3217
[2] https://github.com/libmir/mir/wiki/Floating-point-issues
[3] https://github.com/wilzbach/d-benchmarks
[4] https://issues.dlang.org/show_bug.cgi?id=16344
