—snip—
> 1) Once NumPy adds the framework and initial set of Universal Intrinsics, if
> contributors want to leverage a new architecture-specific SIMD instruction,
> will they be expected to add software implementations of this instruction for
> all other architectures too?
In my opinion, if the instructions are lower-level ones, then yes. For example,
one cannot add AVX-512 without also adding, say, AVX-256, AVX-128, and SSE*.
However, I would not expect one person or team to be an expert in all
assemblies, so intrinsics for one architecture can be developed independently
of another (see the sketch below for how such layering might look).

> 2) On whom does the burden lie to ensure that new implementations are
> benchmarked and show benefits on every architecture? What happens if
> optimizing a ufunc leads to improving performance on one architecture and
> worsens performance on another?

I would look at this from a maintainability point of view. If we are increasing
the code size by 20% for a certain ufunc, there must be a demonstrable 20%
increase in performance on any CPU. That is to say, micro-optimisation will be
unwelcome, and code readability will be preferable. Usually we ask the
submitter of the PR to test it on a machine they have on hand, and I would be
inclined to keep this trend of self-reporting. Of course, if someone else came
along and reported a performance regression of, say, 10%, then we would have
increased the code size by 20% for an average net gain of only about 5%
(+20% on one machine, -10% on another), and the PR would have to be reverted.

—snip—
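To make the layering point in the first answer concrete, here is a rough sketch of how one loop written against "universal" vector operations can expand to AVX, SSE, or NEON intrinsics at compile time, with a scalar fallback everywhere else. This is not NumPy's actual universal-intrinsics API; the names V_WIDTH, vf32, v_load_f32, v_add_f32, and v_store_f32 are made up for illustration. The point is that a contributor adding a new backend only supplies the mapping block for that architecture, while the existing blocks and the scalar tail keep the loop correct on every other CPU.

#include <stddef.h>

/* Illustrative only -- not NumPy's real API. Each architecture block maps the
 * made-up "universal" names onto its own intrinsics. */
#if defined(__AVX__)
    #include <immintrin.h>
    #define V_WIDTH 8
    typedef __m256 vf32;
    #define v_load_f32(p)     _mm256_loadu_ps(p)
    #define v_add_f32(a, b)   _mm256_add_ps(a, b)
    #define v_store_f32(p, v) _mm256_storeu_ps(p, v)
#elif defined(__SSE__)
    #include <xmmintrin.h>
    #define V_WIDTH 4
    typedef __m128 vf32;
    #define v_load_f32(p)     _mm_loadu_ps(p)
    #define v_add_f32(a, b)   _mm_add_ps(a, b)
    #define v_store_f32(p, v) _mm_storeu_ps(p, v)
#elif defined(__ARM_NEON)
    #include <arm_neon.h>
    #define V_WIDTH 4
    typedef float32x4_t vf32;
    #define v_load_f32(p)     vld1q_f32(p)
    #define v_add_f32(a, b)   vaddq_f32(a, b)
    #define v_store_f32(p, v) vst1q_f32(p, v)
#else
    #define V_WIDTH 1   /* scalar fallback: no SIMD, the loop below still works */
#endif

/* One ufunc-style inner loop written against the "universal" names.
 * Each backend above supplies its own expansion; the tail runs scalar code. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if V_WIDTH > 1
    for (; i + V_WIDTH <= n; i += V_WIDTH) {
        vf32 va = v_load_f32(a + i);
        vf32 vb = v_load_f32(b + i);
        v_store_f32(out + i, v_add_f32(va, vb));
    }
#endif
    for (; i < n; i++) {
        out[i] = a[i] + b[i];   /* leftover elements (or all of them, if scalar) */
    }
}

The framework proposed for NumPy additionally deals with runtime CPU feature detection and with building several compiled variants of such loops, which this compile-time-only sketch glosses over.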