As mentioned on IRC, enabling optimizations with MODE=release and picking 
clang++ (6.0 here) vs g++ (7.4 here) makes a major difference when benchmarking 
exp2f and log2f from glibc against our approximations. On a modern AMD64 
processor, glibc is often faster. Internally it also uses polynomials of around 
order 4, but picks the coefficients from a table depending on the input 
argument. With that it achieves errors < 1 ULP, and it is often faster because 
it can also use hand-crafted SSE2 implementations.
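
To illustrate the table-plus-polynomial idea in general terms, here is a minimal sketch. It is not glibc's code: the names table_exp2f, exp2_table and TABLE_SIZE are made up for this example, the polynomial is a plain 3rd order Taylor expansion rather than tuned coefficients, and there is no handling of overflow, NaN or denormals.

```cpp
#include <array>
#include <cmath>

// Hypothetical illustration only, not glibc's implementation.
constexpr int TABLE_SIZE = 32;

// Lazily built table with table[i] = 2^(i / TABLE_SIZE).
inline const std::array<double, TABLE_SIZE>&
exp2_table ()
{
  static const std::array<double, TABLE_SIZE> table = [] {
    std::array<double, TABLE_SIZE> t {};
    for (int i = 0; i < TABLE_SIZE; i++)
      t[i] = std::exp2 (double (i) / TABLE_SIZE);
    return t;
  } ();
  return table;
}

inline float
table_exp2f (float x)
{
  const double xd = x;
  // split x = k / TABLE_SIZE + r, so that |r| <= 1 / (2 * TABLE_SIZE)
  const long k = std::lround (xd * TABLE_SIZE);
  const double r = xd - double (k) / TABLE_SIZE;
  // split k = TABLE_SIZE * e + i with 0 <= i < TABLE_SIZE
  const long i = ((k % TABLE_SIZE) + TABLE_SIZE) % TABLE_SIZE;
  const long e = (k - i) / TABLE_SIZE;
  // 2^r = exp (r * ln 2), approximated by a 3rd order Taylor polynomial
  const double z = r * 0.6931471805599453;
  const double poly = 1 + z * (1 + z * (0.5 + z * (1.0 / 6)));
  // 2^x = 2^e * 2^(i / TABLE_SIZE) * 2^r
  return float (std::ldexp (exp2_table()[i] * poly, int (e)));
}
```

Because the table lookup narrows the remainder r to a tiny interval, a short polynomial is enough for float precision; that is the trade-off between table size and polynomial order that this sketch is meant to show.
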
I haven't had a chance to benchmark the approximations on musl, but so far, 
based on your submission, I'm inclined to integrate the following:

1) Rename bse_approx6_exp2 to fast_exp2() and get rid of the other 
approximation variants.
2) Add fast_log2() based on your 6th order version, but with error correction 
for integer logarithms.
3) When building for AMD64, use exp2f to implement fast_exp2 and use log2f to 
implement fast_log2.
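
A rough sketch of how point 3) could look for fast_exp2. The architecture macros are the usual GCC/clang and MSVC ones; portable_exp2 is a hypothetical stand-in for the non-AMD64 path and is NOT bse_approx6_exp2, it just uses a 5th order Taylor polynomial so the example is self-contained:

```cpp
#include <cmath>
#include <math.h>

// Hypothetical fallback for non-AMD64 builds (stand-in for the real
// approximation): 2^x = 2^i * 2^f with i = round (x) and f in [-0.5, +0.5].
inline float
portable_exp2 (float x)
{
  const float i = std::round (x);
  const float z = (x - i) * 0.6931471805599453f; // f * ln (2)
  // 5th order Taylor polynomial of exp (z), evaluated with Horner's scheme
  float p = 1.0f / 120;
  p = p * z + 1.0f / 24;
  p = p * z + 1.0f / 6;
  p = p * z + 0.5f;
  p = p * z + 1.0f;
  p = p * z + 1.0f;
  return std::ldexp (p, int (i));
}

inline float
fast_exp2 (float x)
{
#if defined (__x86_64__) || defined (_M_X64)
  return ::exp2f (x);           // glibc is hard to beat on AMD64
#else
  return portable_exp2 (x);     // approximation elsewhere
#endif
}
```

fast_log2 would presumably follow the same pattern, with log2f on AMD64 and the corrected 6th order polynomial elsewhere.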

Here's the error correction I'm talking about. Note that exchanging "long 
double" for "float" makes the code significantly slower, because it forces the 
compiler to add code to reduce precision. On my machine, this version is 
roughly as fast as log2f when compiling with optimizations, with either compiler:

    static inline long double G_GNUC_CONST
    fast_log2f (float value)
    {
      union {
        float f;
        int i;
      } float_u;
      float_u.f = value;
      // compute log_2 using float exponent
      const int log_2 = ((float_u.i >> 23) & 255) - 128;
      // replace float exponent
      float_u.i &= ~(255 << 23);
      float_u.i += BSE_FLOAT_BIAS << 23;
      long double u, x = float_u.f;
      // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
      u =         -2.5691088815846393966e-2l;
      u = u * x +  2.7514877034856806734e-1l;
      u = u * x + -1.2669182593669424748l;
      u = u * x +  3.2865287704176774059l;
      u = u * x + -5.3419892025067624343l;
      u = u * x +  6.1129631283200211528l;
      x = u * x + -2.040042118396715321l;
      return x + log_2;
    }
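
Spelled out, the decomposition the code relies on looks roughly like this (one reading of the code above for normalized positive floats, with E the 8 bit exponent field, m the mantissa rescaled into [1, 2), p the polynomial and c the offset passed to lolremez):

```latex
\begin{align*}
\text{value} &= m \cdot 2^{E-127}, \qquad 1 \le m < 2 \\
\log_2(\text{value}) &= (E - 127) + \log_2 m \\
p(m) &\approx \log_2 m + 1 - c, \qquad c = 0.00000184568668708 \\
\texttt{fast\_log2f}(\text{value}) &= (E - 128) + p(m) \approx \log_2(\text{value}) - c
\end{align*}
```

The offset c appears to be chosen so that p(1) = 1 up to rounding, which would make the result exact for powers of two at the cost of a one-sided error elsewhere.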

Error samples (input, error), compared to log2l(3):

       +0.0, -0.00000231613294631
       +0.5, +0.00000000000000000
       +1.0, +0.00000000000000000
       +1.1, -0.00000181973000285
       +1.5, -0.00000130387210186
       +1.8, -0.00000312228549678
       +2.0, +0.00000000000000000
       +2.2, -0.00000181973000285
       +2.5, -0.00000140048214306
       +3.0, -0.00000130387210186
       +4.0, +0.00000000000000000
       +5.0, -0.00000140048214306
       +6.0, -0.00000130387210186
       +7.0, -0.00000312228549678
       +8.0, +0.00000000000000000
       +9.0, -0.00000084878575295
      +10.0, -0.00000140048214306
      +11.0, -0.00000368176020430
      +16.0, +0.00000000000000000
      +32.0, +0.00000000000000000
      +40.0, -0.00000140048214306
      +48.0, -0.00000130387210186
      +54.0, -0.00000149844406951
      +64.0, +0.00000000000000000
     +127.0, -0.00000162654178981
     +128.0, +0.00000000000000000
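
For anyone who wants to reproduce numbers like these, here is a minimal harness sketch. It makes a few assumptions: FLOAT_BIAS = 127 stands in for BSE_FLOAT_BIAS, std::memcpy replaces the union for the bit access, G_GNUC_CONST is dropped, and log2_error and print_error_samples are helper names made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Assumed value of BSE_FLOAT_BIAS (IEEE-754 single precision exponent bias).
static const int FLOAT_BIAS = 127;

inline long double
fast_log2f (float value)
{
  // std::memcpy instead of the union from the original, same bit operations
  uint32_t bits;
  std::memcpy (&bits, &value, sizeof (bits));
  // compute log_2 using float exponent
  const int log_2 = int ((bits >> 23) & 255) - 128;
  // replace float exponent, so the mantissa lands in [1, 2)
  bits &= ~(255u << 23);
  bits += uint32_t (FLOAT_BIAS) << 23;
  float mantissa;
  std::memcpy (&mantissa, &bits, sizeof (mantissa));
  long double u, x = mantissa;
  // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
  u =         -2.5691088815846393966e-2l;
  u = u * x +  2.7514877034856806734e-1l;
  u = u * x + -1.2669182593669424748l;
  u = u * x +  3.2865287704176774059l;
  u = u * x + -5.3419892025067624343l;
  u = u * x +  6.1129631283200211528l;
  x = u * x + -2.040042118396715321l;
  return x + log_2;
}

// Signed error of the approximation against the long double libm result.
inline long double
log2_error (float value)
{
  assert (value > 0); // only meaningful for positive, normalized inputs
  return fast_log2f (value) - std::log2 ((long double) value);
}

// Print "input, error" pairs in the same layout as the table above.
inline void
print_error_samples ()
{
  const float samples[] = { 0.5f, 1, 1.1f, 1.5f, 2, 3, 7, 11, 64, 128 };
  for (float v : samples)
    std::printf ("%+8.1f, %+.17Lf\n", double (v), log2_error (v));
}
```

Calling print_error_samples() from main() should give values in the same ballpark as the table; exact digits will depend on the platform's long double format and libm.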


https://github.com/tim-janik/beast/pull/124#issuecomment-530135495