Inserting T=float casts makes the function faster (at least here). The casts
avoid the conversions between single precision and double precision values
(e.g. cvtsd2ss) that the compiler would otherwise emit. So this version
```
static inline float G_GNUC_CONST
fast_log2ff (float value)
{
  union {
    float f;
    int i;
  } float_u;
  float_u.f = value;
  // compute log_2 using float exponent
  const int log_2 = ((float_u.i >> 23) & 255) - 128;
  // replace float exponent
  float_u.i &= ~(255 << 23);
  float_u.i += BSE_FLOAT_BIAS << 23;
  typedef float T;
  T u, x = float_u.f;
  // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
  u = T (-2.5691088815846393966e-2l);
  u = u * x + T (2.7514877034856806734e-1l);
  u = u * x + T (-1.2669182593669424748l);
  u = u * x + T (3.2865287704176774059l);
  u = u * x + T (-5.3419892025067624343l);
  u = u * x + T (6.1129631283200211528l);
  x = u * x + T (-2.040042118396715321l);
  return x + log_2;
}
```
is faster, because all operations are on floats. This costs a bit of precision
(worst-case error of about 4.5e-6 instead of 3.7e-6 in the measurements below),
but the float version (`fast_log2ff`) is faster than using double
(`fast_log2fd`) or long double (`fast_log2fl`).
```
$ g++ -std=c++17 -Wall -g -O3 -o l2 l2.cc `pkg-config --cflags --libs spectmorph glib-2.0 bse`
$ l2
log2f: 5.369997 ns/call
fast_log2fl: 7.890487 ns/call
fast_log2fd: 4.662395 ns/call
fast_log2ff: 3.652096 ns/call
prec: fast_log2ff: 4.493532e-06
prec: fast_log2fd: 3.721012e-06
prec: fast_log2fl: 3.691373e-06
$ clang++ -std=c++17 -g -O3 -o l2 l2.cc `pkg-config --cflags --libs spectmorph glib-2.0 bse`
$ l2
log2f: 5.323792 ns/call
fast_log2fl: 7.597113 ns/call
fast_log2fd: 5.201006 ns/call
fast_log2ff: 4.071403 ns/call
prec: fast_log2ff: 4.493532e-06
prec: fast_log2fd: 3.721012e-06
prec: fast_log2fl: 3.691373e-06
```
On the other stuff I mostly agree. If you have use cases in mind that need
exp2 (k) to be exactly 2^k for integer k (for key tracking or filter frequency
modulation it doesn't matter), and you think it is worth paying one extra
multiply-add for it, ok. I think relative error is the most important goal
here, though. For instance, if the key tracking algorithm returns 222 instead
of 220, from a musician's point of view it is as bad as returning 888 instead
of 880. Both sound equally wrong, and both have the same relative error (but
not the same absolute error).
Applying corrections for fast_log2 (2^k) to yield k for integer k sounds ok to
me. Note that it doesn't fix fast_log2 (7.999999) to be 3, as you patched only
the case where the input is equal to or slightly greater than 2^k, not the case
where it is slightly smaller.
```
fast_log2fl (7.999999) = 2.999996; log2f (7.999999) = 3.000000
fast_log2fl (8.000000) = 3.000000; log2f (8.000000) = 3.000000
fast_log2fl (8.000001) = 3.000000; log2f (8.000001) = 3.000000
```
This could be fixed by adjusting the linear coefficient of the Remez polynomial,
but that would make our worst-case error larger, and since the result is already
so close to the perfect value, I think it is probably not worth it.
As for whether to approximate at all on AMD64: my impression from the
benchmarks is that in many cases one of the approximations would yield
sufficient quality faster than exp2f or log2f, especially on AMD64 when using
T=float internally.
However, the gain is not dramatic, and maybe we're trying to optimize something
with approximations that is not really a performance problem. For instance, the
LadderFilter (the place where this started) typically only needs one log2 value
per note-on. Only portamento would affect this negatively, and we do not
support that at the moment. What I'm trying to say here is: if we use
log2f/exp2f and one day we run perf on beast and see that 10% of the CPU usage
is spent in exp2f, we could still deal with it at that point in time.