[Numpy-discussion] Re: [RFC] - numpy/SVML appears to be poorly optimized

Matti Picus Mon, 15 Nov 2021 01:52:07 -0800

On 6/11/21 6:56 pm, Sayed Adel wrote:

> appears to be poorly optimized.
It should perform well: neither poorly optimized nor heavily optimized.
> this also makes it quite difficult to improve (with either a better compiler or by hand).
We can put the blame on Intel for not sharing their source code, but honestly, it seems we had no other option except to accept what they provide.
> Some of the glaring issues are:
> 1. register allocation / spilling
> 2. rodata layouts / const-propagation of the values.
> 3. Very odd use of internal functions that really ought to be inlined.

Let me add two more points to your list:
- It only works on Linux.
- It only works with AVX512.
> If so, are people open to patches that optimize them (either with new C implementations or in the current assembly implementations)?
Hopefully, we will be able to convert them to universal intrinsics (NEP 38) one day. As a member of the team, I will try to put more time into it.
Thanks, Sayed.

Note the benchmarks on Sayed's PR [0] to move tanh to universal intrinsics. It not only supplies the routines for all universal-intrinsics-supported platforms, it even slightly increased performance on AVX512 (usual disclaimers about dangers of comparing benchmarks apply).



Matti


[0] https://github.com/numpy/numpy/pull/20363

