> Am 28.07.2021 um 01:50 schrieb Sebastian Berg <sebast...@sipsolutions.net>:
> 
> Hi all,
> 
> there is a proposal to add some Intel specific fast math routine to
> NumPy:
> 
>    https://github.com/numpy/numpy/pull/19478

Many years ago I wrote a package,
https://github.com/geggo/uvml
that makes VML, a fast implementation of transcendental math functions,
available to NumPy. I don't know if it still compiles.
It uses Intel VML, which is designed for processing whole arrays, rather than
the SVML intrinsics. This makes it less machine dependent: optimized
implementations are selected automatically depending on the availability of,
e.g., SSE, AVX, or AVX512, and you only need to link against a library. It
compiles as an external module and can be activated at runtime.

Different precision modes can be selected at runtime (globally). I think
Intel recommends the LA (low accuracy) mode as a good compromise between
performance and accuracy. Different people have strongly diverging opinions
about what to expect.
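
To make "accuracy" concrete: the error of a math function is usually quoted
in ULP, as in the original message quoted below. Here is a minimal sketch of
how one could measure it, using only the Python standard library (no VML or
NumPy involved). The helper name ulp_error and the test inputs are my own
choices for illustration, and math.ulp needs Python 3.9 or later.

```python
import math
from decimal import Decimal, getcontext

# Reference precision: far beyond the ~17 significant digits of a double.
getcontext().prec = 50

def ulp_error(computed: float, reference: Decimal) -> float:
    """Distance between computed and reference, in units of ulp(computed)."""
    diff = abs(Decimal(computed) - reference)
    return float(diff / Decimal(math.ulp(computed)))

x = 0.7  # arbitrary test input
# Decimal(x) converts the double exactly, so both sides see the same input.
exp_err = ulp_error(math.exp(x), Decimal(x).exp())
sqrt_err = ulp_error(math.sqrt(2.0), Decimal(2).sqrt())

print(f"exp({x}):  {exp_err:.3f} ULP")
print(f"sqrt(2.0): {sqrt_err:.3f} ULP")
```

A correctly rounded function stays at or below 0.5 ULP. What exp reports
depends on the math library Python was built against.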

The speedups gained by these approaches often evaporate in non-benchmark
applications, because the performance of these functions is often limited by
memory bandwidth unless all your data stays in the CPU cache. By default I
would go for the high accuracy mode, with an option to switch to low accuracy
if one urgently needs the better performance. But then one should use
different approaches for speeding up NumPy.
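
A back-of-the-envelope model of the memory bandwidth argument, as a sketch
only: it assumes that computation and memory transfer overlap perfectly, so
the runtime is simply the larger of the two. The function name and the
timings are hypothetical, in arbitrary units, and not measurements.

```python
def effective_speedup(t_compute: float, t_memory: float,
                      kernel_speedup: float) -> float:
    """Overall speedup when only the compute part gets faster.

    Simplified model: runtime = max(compute time, memory transfer time).
    """
    before = max(t_compute, t_memory)
    after = max(t_compute / kernel_speedup, t_memory)
    return before / after

# Hypothetical timings: a 4x faster math kernel in two situations.
print(effective_speedup(2.0, 1.0, 4.0))  # compute-bound before -> 2.0
print(effective_speedup(1.0, 2.0, 4.0))  # memory-bound throughout -> 1.0
```

In this model a 4x faster kernel gives at most a 2x gain in the first case
and none at all in the second, because the memory transfer time is untouched.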

Gregor


> 
> part of numerical algorithms is that there is always a speed vs.
> precision trade-off, giving a more precise result is slower.
> 
> So there is a question what the general precision expectation should be
> in NumPy.  And how much is it acceptable to diverge in the
> precision/speed trade-off depending on CPU/system?
> 
> I doubt we can formulate very clear rules here, but any input on what
> precision you would expect or trade-offs seem acceptable would be
> appreciated!
> 
> 
> Some more details
> -----------------
> 
> This is mainly interesting e.g. for functions like logarithms,
> trigonometric functions, or cubic roots.
> 
> Some basic functions (multiplication, addition) are correct as per IEEE
> standard and give the best possible result, but these are typically
> only correct within very small numerical errors.
> 
> This is typically measured as "ULP":
> 
>     https://en.wikipedia.org/wiki/Unit_in_the_last_place
> 
> where 0.5 ULP would be the best possible result.
> 
> 
> Merging the PR may mean relaxing the current precision slightly in some
> places.  In general Intel advertises 4 ULP of precision (although the
> actual precision for most functions seems better).
> 
> 
> Here are two tables, one from glibc and one for the Intel functions:
> 
> https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> (Mainly the LA column) 
> https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/measured-accuracy-of-all-real-vm-functions.html
> 
> 
> Different implementations give different accuracy, but formulating some
> guidelines/expectations (or referencing them) would be useful guidance.
> 
> For basic 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
