On 27 June 2014 07:48, Iain Buclaw <ibuc...@gdcproject.org> wrote:
> On 27 June 2014 07:14, Iain Buclaw <ibuc...@gdcproject.org> wrote:
>> On 27 June 2014 02:31, David Nadlinger via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
>>> Hi all,
>>>
>>> right now, the use of std.math over core.stdc.math can cause a huge
>>> performance problem in typical floating point graphics code. An instance of
>>> this has recently been discussed here in the "Perlin noise benchmark speed"
>>> thread [1], where even LDC, which already beat DMD by a factor of two,
>>> generated code more than twice as slow as that by Clang and GCC. Here, the
>>> use of floor() causes trouble. [2]
>>>
>>> Besides the somewhat slow pure D implementations in std.math, the biggest
>>> problem is the fact that std.math almost exclusively uses reals in its API.
>>> When working with single- or double-precision floating point numbers, this
>>> is not only more data to shuffle around than necessary, but on x86_64
>>> requires the caller to transfer the arguments from the SSE registers onto
>>> the x87 stack and then convert the result back again. Needless to say, this
>>> is a serious performance hazard. In fact, this accounts for an 1.9x slowdown
>>> in the above benchmark with LDC.
>>>
>>> Because of this, I propose to add float and double overloads (at the very
>>> least the double ones) for all of the commonly used functions in std.math.
>>> This is unlikely to break much code, but:
>>> a) Somebody could rely on the fact that the calls effectively widen the
>>> calculation to 80 bits on x86 when using type deduction.
>>> b) Additional overloads make e.g. "&floor" ambiguous without context, of
>>> course.
>>>
>>> What do you think?
>>>
>>> Cheers,
>>> David
>>>
>>
>> This is the reason why floor is slow, it has an array copy operation.
>>
>> ---
>> auto vu = *cast(ushort[real.sizeof/2]*)(&x);
>> ---
>>
>> I didn't like it at the time I wrote, but at least it prevented the
>> compiler (gdc) from removing all bit operations that followed.
>>
>> If there is an alternative to the above, then I'd imagine that would
>> speed up floor by tenfold.
>>
>
> Can you test with this?
>
> https://github.com/D-Programming-Language/phobos/pull/2274
>
> Float and Double implementations of floor/ceil are trivial and I can add
> later.

Added float/double implementations.
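
For readers following along, here is a minimal sketch of what float and double overloads look like from the caller's side. It is only an illustration under stated assumptions: the name floorOf is hypothetical, and the bodies simply forward to core.stdc.math instead of reproducing the pure D code in the pull request.

```d
import std.stdio : writeln;
// C library routines, renamed on import so they cannot be confused
// with the wrappers defined below.
import core.stdc.math : cfloorf = floorf, cfloor = floor, cfloorl = floorl;

// Hypothetical name: floorOf is for illustration only, not the Phobos API.
// One overload per floating point type, so the argument is never widened to real.
float  floorOf(float x)  { return cfloorf(x); }
double floorOf(double x) { return cfloor(x); }
real   floorOf(real x)   { return cfloorl(x); }

void main()
{
    float  f = -1.5f;
    double d = 2.7;

    // The result type follows the argument type.
    static assert(is(typeof(floorOf(f)) == float));
    static assert(is(typeof(floorOf(d)) == double));
    static assert(is(typeof(floorOf(1.0L)) == real));

    assert(floorOf(f) == -2.0f);
    assert(floorOf(d) == 2.0);

    // As the original post notes, taking the address needs context once
    // overloads exist; an explicit function pointer type provides it.
    double function(double) fp = &floorOf;
    assert(fp(-0.5) == -1.0);

    writeln(floorOf(f), " ", floorOf(d));
}
```

Code that relied on type deduction to get an 80-bit result, point a) in the original post, would have to pass a real explicitly to keep that behaviour with overloads like these.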