Richard Biener wrote:
> On Fri, Aug 4, 2017 at 2:26 PM, Alexander Monakov <amona...@ispras.ru> wrote:
> > On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
> >> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
> >
> > I don't think you can do that for non-positive C.

True, that can be easily disallowed.

> Hmm, the question is also how this interacts with other folders like
> sqrt (pow (x, y)) -> pow (|x|, y * 0.5)?  Also we seem to miss

We fold sqrt (pow (C, x)) into pow (C, x * 0.5) first, then fold that to exp.

> pow (2, x) -> exp2 (x) and pow (10, x) -> pow10/exp10, those may
> be a better fit than exp (log (2/10) * x)?  OTOH for fast-math
> canonicalization getting rid of exp2/10 and pow10 might be beneficial.

exp10 is non-standard and doesn't have a first-class implementation in GLIBC.
Although pow (10, x) is frequently used in Fortran, I can't get exp10 emitted
by match.pd...

>> Do this only for fast-math as accuracy is reduced.  This is much faster
>> since pow is more complex than exp - with a current GLIBC the speedup
>> is more than 7 times for this transformation.
>
> Is it bound to be so on future glibc revisions and non-glibc platforms?

Yes, pow is basically log followed by exp, so exp will always be cheaper than
pow. How much will obviously vary depending on the implementation.
Szabolcs' highly optimized expf has 3x the throughput of the optimized powf.

> And how is accuracy affected?  I think the transform is only reasonable
> for log (C) being close to e, 2 or 10 (using exp, exp2 or exp10).  Can you
> provide an idea on whether there's a systematic error (with glibc) and
> how that behaves over the parameter space?

Accuracy depends again on the library implementation. If log (C) is accurate
(i.e. its error is far less than 0.5 ULP), and exp (x) is accurate to 0.5 ULP,
then you get perfect answers over the full range.

The exp function has the largest steps close to inf - a 1 ULP change in input
changes the output by 1024 ULP (128 ULP for expf). So a 0.5 ULP input error
would give ~512 ULP error if the final result is close to inf. In practice the
output doesn't get anywhere near inf, so ULP errors are far smaller.

> Oh, and what value of C does the benchmark that triggered this have?

10 appears quite common. I extracted a runtime log of all powf calls in SPEC
(see https://sourceware.org/ml/libc-alpha/2017-06/msg00718.html) and noticed
a lot of repetition in some inputs. Further investigation showed many uses of
pow have a constant first operand, so this is an obvious target for optimization.

Wilco
