On Friday, 27 June 2014 at 06:48:44 UTC, Iain Buclaw via Digitalmars-d wrote:
On 27 June 2014 07:14, Iain Buclaw <ibuc...@gdcproject.org> wrote:
On 27 June 2014 02:31, David Nadlinger via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
Hi all,
Right now, the use of std.math over core.stdc.math can cause a huge performance problem in typical floating-point graphics code. An instance of this has recently been discussed here in the "Perlin noise benchmark speed" thread [1], where even LDC, which already beat DMD by a factor of two, generated code more than twice as slow as that produced by Clang and GCC. Here, the use of floor() is what causes trouble [2].
Besides the somewhat slow pure D implementations in std.math, the biggest problem is that std.math almost exclusively uses reals in its API. When working with single- or double-precision floating-point numbers, this not only means shuffling around more data than necessary, but on x86_64 it also requires the caller to transfer the arguments from the SSE registers onto the x87 stack and then convert the result back again. Needless to say, this is a serious performance hazard. In fact, it accounts for a 1.9x slowdown in the above benchmark with LDC.
Because of this, I propose adding float and double overloads (at the very least the double ones) for all of the commonly used functions in std.math.
This is unlikely to break much code, but:
a) Somebody could rely on the fact that, when using type deduction, the calls effectively widen the calculation to 80 bits on x86.
b) Additional overloads make e.g. "&floor" ambiguous without context, of course.
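A minimal sketch of what the proposal could look like, using a hypothetical `myFloor` in place of std.math.floor so it does not depend on any particular Phobos version. It assumes the float and double overloads simply forward to `floorf` and `floor` from core.stdc.math, which is one possible implementation, not necessarily what Phobos would do. It also illustrates both caveats: with `auto`, the result type now follows the argument type (point a), and a bare `&myFloor` needs a typed target to select an overload (point b).

```d
// Hypothetical sketch: myFloor stands in for std.math.floor; it is not the Phobos API.
import core.stdc.math : cFloor = floor, cFloorf = floorf, cFloorl = floorl;

// Real-only signature, as in the current std.math API.
real myFloor(real x) { return cFloorl(x); }

// Proposed extra overloads (assumption: forwarding to the C library).
double myFloor(double x) { return cFloor(x); }
float myFloor(float x) { return cFloorf(x); }

void main()
{
    double d = 2.7;

    // Point a): with type deduction, the result type follows the argument.
    // With only the real overload, typeof(r) would be real instead.
    auto r = myFloor(d);
    static assert(is(typeof(r) == double));

    // Point b): "&myFloor" alone is ambiguous; a typed target picks the overload.
    double function(double) fp = &myFloor;
    assert(fp(-1.5) == -2.0);

    // A float argument resolves to the float overload.
    assert(myFloor(2.7f) == 2.0f);
}
```

The design point is only that a double argument can stay in an SSE register for the whole call instead of being widened to real.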
What do you think?
Cheers,
David
This is the reason why floor is slow: it has an array copy operation.
---
auto vu = *cast(ushort[real.sizeof/2]*)(&x);
---
I didn't like it at the time I wrote it, but at least it prevented the compiler (gdc) from removing all the bit operations that followed.
If there is an alternative to the above, then I'd imagine it would speed up floor tenfold.
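One possible alternative for a double-only overload, sketched here under the assumption of IEEE 754 binary64 doubles: reinterpret the value through a union (no array copy) and round with integer masks, in the spirit of the classic fdlibm `floor`. `floorBits` and `DoubleBits` are hypothetical names, and whether this form avoids the gdc optimizer problem described above has not been checked.

```d
// Hypothetical sketch: floorBits is an illustration, not the Phobos implementation.
// Assumes IEEE 754 binary64: 1 sign bit, 11 exponent bits, 52 fraction bits.
union DoubleBits { double f; ulong u; }

double floorBits(double x) pure nothrow @nogc
{
    DoubleBits b;
    b.f = x; // reinterpret the bits without copying into an array

    immutable int exp = cast(int)((b.u >> 52) & 0x7FF) - 1023; // unbiased exponent
    immutable bool negative = (b.u >> 63) != 0;

    if (exp < 0)
    {
        // |x| < 1: zeros are returned unchanged, otherwise 0.0 or -1.0.
        if ((b.u & 0x7FFF_FFFF_FFFF_FFFFUL) == 0)
            return x;
        return negative ? -1.0 : 0.0;
    }
    if (exp >= 52)
        return x; // already integral, or inf/NaN

    immutable ulong fracMask = (1UL << (52 - exp)) - 1; // bits below the binary point
    if ((b.u & fracMask) == 0)
        return x; // already integral
    if (negative)
        b.u += fracMask + 1; // round the magnitude up; a carry into the exponent is correct
    b.u &= ~fracMask;        // drop the fractional bits
    return b.f;
}

void main()
{
    assert(floorBits(2.7) == 2.0);
    assert(floorBits(-1.5) == -2.0);
    assert(floorBits(-0.25) == -1.0);
    assert(floorBits(0.25) == 0.0);
    assert(floorBits(5.0) == 5.0);
}
```

A float version would follow the same pattern with `uint`, an 8-bit exponent (bias 127) and 23 fraction bits.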
Can you test with this?
https://github.com/D-Programming-Language/phobos/pull/2274
Float and double implementations of floor/ceil are trivial, and I can add them later.
Nice! I tested it with the Perlin noise benchmark, and it got faster (in my environment, 1.030s -> 0.848s). But floor still consumes almost half of the execution time.