On Friday, 27 June 2014 at 06:48:44 UTC, Iain Buclaw via Digitalmars-d wrote:
On 27 June 2014 07:14, Iain Buclaw <ibuc...@gdcproject.org> wrote:
On 27 June 2014 02:31, David Nadlinger via Digitalmars-d
<digitalmars-d@puremagic.com> wrote:
Hi all,

right now, the use of std.math over core.stdc.math can cause a huge performance problem in typical floating point graphics code. An instance of this has recently been discussed here in the "Perlin noise benchmark speed" thread [1], where even LDC, which already beat DMD by a factor of two, generated code more than twice as slow as that by Clang and GCC. Here, the
use of floor() causes trouble. [2]

Besides the somewhat slow pure D implementations in std.math, the biggest problem is the fact that std.math almost exclusively uses reals in its API. When working with single- or double-precision floating point numbers, this is not only more data to shuffle around than necessary, but on x86_64 also requires the caller to transfer the arguments from the SSE registers onto the x87 stack and then convert the result back again. Needless to say, this is a serious performance hazard. In fact, this accounts for a 1.9x slowdown
in the above benchmark with LDC.

Because of this, I propose to add float and double overloads (at the very least the double ones) for all of the commonly used functions in std.math.
This is unlikely to break much code, but:
a) Somebody could rely on the fact that the calls effectively widen the
calculation to 80 bits on x86 when using type deduction.
b) Additional overloads make e.g. "&floor" ambiguous without context, of
course.
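
To make the proposal and caveat (b) concrete, here is a minimal sketch of what such overloads could look like. The name myFloor is hypothetical and this is not the Phobos API; the float and double versions simply forward to the C library here, which is only one possible implementation choice.

```d
import core.stdc.math : cfloor = floor, cfloorf = floorf;
import std.math : realFloor = floor;

// Hypothetical overload set (illustration only, not the Phobos API).
// The float and double versions never widen to real.
float  myFloor(float x)  { return cfloorf(x); }
double myFloor(double x) { return cfloor(x); }
real   myFloor(real x)   { return realFloor(x); }

void main()
{
    float f = 2.5f;
    double d = -2.5;

    // The result type now follows the argument type.
    static assert(is(typeof(myFloor(f)) == float));
    static assert(is(typeof(myFloor(d)) == double));
    assert(myFloor(f) == 2.0f);
    assert(myFloor(d) == -3.0);

    // Caveat (b): with several overloads, a target type is needed
    // to say which one "&myFloor" refers to.
    double function(double) fp = &myFloor;
    assert(fp(1.5) == 1.0);
}
```

Code that relied on the implicit widening to real (caveat (a)) would have to cast the argument explicitly, e.g. myFloor(cast(real) d).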

What do you think?

Cheers,
David


This is the reason why floor is slow: it has an array copy operation.

---
  auto vu = *cast(ushort[real.sizeof/2]*)(&x);
---

I didn't like it at the time I wrote it, but at least it prevented the
compiler (gdc) from removing all bit operations that followed.

If there is an alternative to the above, then I'd imagine that would
speed up floor tenfold.
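
One candidate alternative is to view the value through a union instead of copying it into an array. The sketch below is an assumption on my part, not necessarily what any Phobos change does: the names Words and clearSign are hypothetical, it uses double rather than real so the bit layout is predictable, and I have not measured whether it is faster or whether gdc's optimizer leaves the bit operations alone.

```d
// Hypothetical helper: overlay a floating point value with its
// 16-bit words, so no separate array copy is needed.
union Words(T)
{
    T value;
    ushort[T.sizeof / 2] words;
}

// Example bit operation: clear the sign bit of a double.
double clearSign(double x)
{
    Words!double w = void; // skip default initialization
    w.value = x;
    // The sign bit lives in the most significant word.
    version (LittleEndian)
        w.words[$ - 1] &= 0x7FFF;
    else
        w.words[0] &= 0x7FFF;
    return w.value;
}

void main()
{
    assert(clearSign(-1.5) == 1.5);
    assert(clearSign(2.0) == 2.0);
}
```

The same union shape works for real by instantiating Words!real, but the word that holds the sign and exponent then depends on the platform's real format.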


Can you test with this?

https://github.com/D-Programming-Language/phobos/pull/2274

The float and double implementations of floor/ceil are trivial, and I can add them later.

Nice! I tested with the Perlin noise benchmark, and it got faster (in my environment, 1.030s -> 0.848s).
But floor still consumes almost half of the execution time.
