On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad wrote:
On Thursday, 23 May 2019 at 21:47:45 UTC, Alex wrote:
Either way, sin is still twice as fast. Also, in the code
the sinTab version is missing the writeln, so it would have
been faster. So it is not being optimized out.
Well, when I run this modified version:
https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c
on https://run.dlang.io/
then I get:
LUT: 709
sin(x): 2761
So the LUT is 3-4 times faster even with your quarter-LUT
overhead.
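For anyone who has not opened the gist: a quarter-wave LUT stores only the first quarter of the sine and rebuilds the other three by mirroring and negating, which is where the overhead comes from. Here is a minimal sketch of the idea. It is not the code from the gist. The names makeQuarterTable and quarterLookup, the table size of 1024 and the choice to pass the argument in turns (1.0 = one full period) are all my own assumptions.

```d
import std.math : floor, fabs, sin, PI;
import std.stdio : writeln;

// Build sin over [0, pi/2] with n + 1 entries, so index n (= sin(pi/2)) is valid.
// Hypothetical helper, not taken from the gist.
double[] makeQuarterTable(size_t n)
{
    auto t = new double[n + 1];
    foreach (i; 0 .. n + 1)
        t[i] = cast(double) sin(i * (PI / 2) / n);
    return t;
}

// Nearest-entry lookup; `turns` is the phase in periods (assumption).
double quarterLookup(const double[] tab, double turns)
{
    const n = tab.length - 1;
    const phase = turns - floor(turns);   // fractional phase, [0, 1]
    const pos = phase * 4.0;              // position measured in quarter waves
    const quadrant = cast(uint) pos;      // which quarter we are in
    const frac = pos - quadrant;          // position inside that quarter
    // Odd quarters are mirrored, so read the table backwards there.
    const mirrored = (quadrant & 1) ? 1.0 - frac : frac;
    const idx = cast(size_t)(mirrored * n + 0.5);
    const v = tab[idx];
    // The second half of the period is the negated first half.
    return (quadrant & 2) ? -v : v;
}

void main()
{
    auto tab = makeQuarterTable(1024);
    foreach (x; [0.0, 0.1, 0.25, 0.6, 0.9, -0.3])
    {
        const approx = quarterLookup(tab, x);
        const exact = sin(2 * PI * x);
        writeln(x, ": ", approx, " (abs error ", fabs(approx - exact), ")");
    }
}
```

The two conditionals at the end are the "quarter-LUT overhead": a full-period table would skip them at the cost of four times the memory.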
FWIW, as far as I can tell I managed to get the lookup version
down to 104 by using bit manipulation tricks like these:
auto fastQuarterLookup(double x){
    const ulong mantissa = cast(ulong)( (x - floor(x)) * (cast(double)(1UL<<63)*2.0) );
    const double sign = cast(double)(-cast(uint)((mantissa>>63)&1));
    … etc
So it seems like a quarter-wave LUT is 27 times faster than sin…
You just have to make sure that the generated instructions fill
the entire CPU pipeline.
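To show what kind of bit manipulation is meant, here is a self-contained sketch of a branch-free quarter-wave lookup on a 64-bit fixed-point phase. It is not the elided code above, only one way the idea can be written down. Everything named here (tableBits, makeMidpointTable, fixedPointLookup) is hypothetical, the table stores midpoint samples so that mirroring becomes a plain bit inversion, and the argument is assumed to be in turns.

```d
import std.math : floor, fabs, sin, PI;
import std.stdio : writeln;

enum uint tableBits = 10;                     // assumed: 1024 entries per quarter wave
enum size_t tableSize = 1u << tableBits;

// Midpoint samples: entry i holds sin at (i + 0.5) / tableSize of the quarter,
// so the mirrored entry is simply tableSize - 1 - i, i.e. ~i under the mask.
double[] makeMidpointTable()
{
    auto t = new double[tableSize];
    foreach (i; 0 .. tableSize)
        t[i] = cast(double) sin((i + 0.5) * (PI / 2) / tableSize);
    return t;
}

double fixedPointLookup(const double[] tab, double turns)
{
    double frac = turns - floor(turns);
    if (frac >= 1.0) frac = 0.0;              // rounding guard for tiny negative inputs

    // Scale the fraction to the full 64-bit range (2^64, as in the snippet above).
    const ulong phase = cast(ulong)(frac * (cast(double)(1UL << 63) * 2.0));

    // Bit 63: second half of the period (negate). Bit 62: odd quarter (mirror).
    const double sign = 1.0 - 2.0 * cast(double)((phase >> 63) & 1);
    const ulong mirror = 0UL - ((phase >> 62) & 1);   // 0 or all ones
    const ulong mask = tableSize - 1;

    // The next tableBits bits index the quarter; XOR with all ones reverses it.
    const size_t idx = cast(size_t)(((phase >> (62 - tableBits)) ^ mirror) & mask);
    return sign * tab[idx];
}

void main()
{
    auto tab = makeMidpointTable();
    foreach (x; [0.0, 0.1, 0.3, 0.75, -0.3])
    {
        const approx = fixedPointLookup(tab, x);
        const exact = sin(2 * PI * x);
        writeln(x, ": ", approx, " (abs error ", fabs(approx - exact), ")");
    }
}
```

Apart from the rounding guard, the lookup is straight-line integer and floating-point arithmetic with no data-dependent branches. That is the property that should help keep the pipeline busy. I have not benchmarked this sketch.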