Hello all,

I am investigating how to speed up specific C functions within astropy, ideally
without significantly increasing the maintenance burden (see the discussion at
https://github.com/astropy/astropy/issues/16902 on annotating functions with
the `target_clones` function attribute).

The numpy approach to SIMD instructions, with its support for runtime dispatch,
seems appealing - what would be the recommended path to implement vectorised
functions using the numpy universal intrinsics within astropy? I am hopeful
that there might be a not-too-difficult path because of the sub-section about
"Reuse by other projects" in NEP 38 about SIMD optimizations.

I see the WIP PR to add a C++ wrapper for the universal intrinsics
(https://github.com/numpy/numpy/pull/21057), and the example application code
added there to square an array
(https://github.com/numpy/numpy/pull/21057/files#diff-cee58cafc4ff85b8fd3d174e5fc719dcffeee5bd72d820701087b920d6f302ecR38-R87)
seems reasonably readable. Is that PR meant to be the starting point for
external packages to attempt to write code with universal intrinsics?

Thanks in advance!
Manodeep
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/