https://issues.dlang.org/show_bug.cgi?id=13474
Walter Bright <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #22 from Walter Bright <[email protected]> ---
This boils down to the following code:

    double foo(double x, double t, double s, double c)
    {
        double y = x - t;
        c += y + s;
        return s + c;
    }

The body of which, when optimized, looks like:

    return s + (c + (x - t) + s);

Or, in x87 instructions:

    fld     qword ptr 01Ch[ESP]
    fld     qword ptr 0Ch[ESP]
    fxch    ST(1)
    fsub    qword ptr 014h[ESP]
    fadd    qword ptr 0Ch[ESP]
    fadd    qword ptr 4[ESP]
    fstp    qword ptr 4[ESP]
    fadd    qword ptr 4[ESP]
    ret     020h

The algorithm relies on rounding to double precision of the (x-t) calculation.
The only way to get the x87 to do that is to actually assign it to memory. But
the compiler optimizes away the assignment to memory, because it is
substantially slower.

The 64 bit code does not have this problem, because the code gen looks like:

    push    RBP
    mov     RBP,RSP
    movsd   XMM4,XMM0
    movsd   XMM5,XMM1
    subsd   XMM3,XMM2
    addsd   XMM3,XMM5
    addsd   XMM4,XMM3
    movsd   XMM0,XMM5
    addsd   XMM0,XMM4
    pop     RBP
    ret

It's doing the same optimization, but the result is rounded to double because
the XMM registers are doubles.

Note that the following targets generate x87 code, not XMM code:

    Win32, Linux32, FreeBSD32

because it is not guaranteed that the target has XMM registers. I suspect we
don't really care about the floating point performance on those targets, but we
do care that the code gives expected results.

So I propose that the fix is to disable optimizing away the assignment to y for
x87 code gen targets.
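
For illustration only, here is a source-level sketch of what an explicit rounding step for (x-t) could look like. This is not the fix proposed in the comment, which changes the optimizer for x87 targets. It assumes a druntime recent enough to provide core.math.toPrec, which forces its argument to be rounded to the precision of the named type. The inputs in main are made up for the example: 1e-17 is less than half the spacing between doubles just below 1.0, so x - t rounds to exactly 1.0 in double precision, whereas an 80-bit x87 register could hold a value slightly below 1.0.

```d
import core.math : toPrec;
import std.stdio : writefln;

// Same function as in the comment, except that the rounding of (x - t)
// to double precision is requested explicitly rather than relying on
// the value being stored to memory.
double foo(double x, double t, double s, double c)
{
    double y = toPrec!double(x - t); // assumption: core.math.toPrec is available
    c += y + s;
    return s + c;
}

void main()
{
    // Illustrative inputs, not taken from the bug report.
    double r = foo(1.0, 1e-17, 0.0, 0.0);
    writefln("%.20g", r);
}
```

With the rounding made explicit, the result should not depend on whether the target generates x87 or XMM code, at the cost of the extra rounding step on x87.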
