Hello all,

I am investigating how to speed up specific C functions within astropy, ideally without significantly increasing the maintenance burden (see the discussion in https://github.com/astropy/astropy/issues/16902 on annotating functions with the `target_clones` function attribute).
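
For context, annotating a function with `target_clones` looks roughly like the sketch below. This is only an illustration: it assumes GCC on x86-64 glibc Linux, and the function name `square_array`, the macro `CLONE_FOR_SIMD`, and the chosen target list are hypothetical and not taken from astropy.

```c
#include <stddef.h>

/* Assumption: target_clones relies on GCC function multiversioning with
   ifunc support, so it is only enabled for GCC on x86-64 glibc Linux here.
   Elsewhere the macro expands to nothing and a single plain version is built. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__gnu_linux__)
#define CLONE_FOR_SIMD __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define CLONE_FOR_SIMD
#endif

/* Hypothetical kernel: write the square of each input element to `out`.
   The compiler builds one clone per listed target and picks one at load
   time based on the CPU; any vectorisation comes from auto-vectorisation
   of this loop, not from hand-written intrinsics. */
CLONE_FOR_SIMD
void square_array(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
}
```

The source stays a plain scalar loop, which is what keeps the maintenance burden low, but how well each clone is vectorised depends on the compiler and the optimisation flags.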

The NumPy approach to SIMD instructions, with its support for runtime dispatch, seems appealing. What would be the recommended path for implementing vectorised functions within astropy using the NumPy universal intrinsics? I am hopeful that there is a not-too-difficult path, given the "Reuse by other projects" sub-section of NEP 38 on SIMD optimizations.

I see the WIP PR that adds a C++ wrapper for the universal intrinsics (https://github.com/numpy/numpy/pull/21057), and the example application code added there to square an array (https://github.com/numpy/numpy/pull/21057/files#diff-cee58cafc4ff85b8fd3d174e5fc719dcffeee5bd72d820701087b920d6f302ecR38-R87) seems reasonably readable. Is that PR meant to be the starting point for external packages that want to write code with universal intrinsics?

Thanks in advance!
Manodeep