>> I think this doesn't quite answer the question. If I understand correctly, 
>> it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing 
>> from the supported AVX512 instructions in master). I think the answer is 
>> yes, it needs to be added for other architectures as well.

That adds a lot of overhead to writing SIMD-based optimizations, which can 
discourage contributors. It’s also an unreasonable expectation that a developer 
be familiar with the SIMD instruction sets of every architecture. On top of 
that, the performance implications aren’t clear: software implementations of 
hardware instructions might perform worse, and might not even produce the same 
result.

From: NumPy-Discussion 
<numpy-discussion-bounces+raghuveer.devulapalli=intel....@python.org> On Behalf 
Of Ralf Gommers
Sent: Monday, February 10, 2020 9:17 PM
To: Discussion of Numerical Python <numpy-discussion@python.org>
Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics



On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi 
<einstein.edi...@gmail.com<mailto:einstein.edi...@gmail.com>> wrote:
—snip—

> 1) Once NumPy adds the framework and initial set of Universal Intrinsic, if 
> contributors want to leverage a new architecture specific SIMD instruction, 
> will they be expected to add software implementation of this instruction for 
> all other architectures too?

In my opinion, if the instructions are lower-level, then yes. For example, one 
cannot add AVX-512 without also adding AVX2, AVX, and SSE*. However, I would 
not expect one person or team to be an expert in every instruction set, so 
intrinsics for one architecture can be developed independently of another.

I think this doesn't quite answer the question. If I understand correctly, it's 
about a single instruction (e.g. one needs "VEXP2PD" and it's missing from the 
supported AVX512 instructions in master). I think the answer is yes, it needs 
to be added for other architectures as well. Otherwise, if universal intrinsics 
are added ad-hoc and there's no guarantee that a universal instruction is 
available for all main supported platforms, then over time there won't be much 
that's "universal" about the framework.

This is a different question though from adding a new ufunc implementation. I 
would expect accelerating ufuncs via intrinsics that are already supported to 
be much more common than having to add new intrinsics. Does that sound right?


> 2) On whom does the burden lie to ensure that new implementations are 
> benchmarked and shows benefits on every architecture? What happens if 
> optimizing an Ufunc leads to improving performance on one architecture and 
> worsens performance on another?

This is hard to provide a recipe for. I suspect it will be a while before this 
becomes an issue, since we don't have much SIMD code to begin with. So adding 
new code with benchmarks will likely show improvements on all architectures 
(we should ensure benchmarks can be run via CI, otherwise it's too onerous). 
If not, and the regression isn't easily fixable, the problematic platform 
could be skipped so performance there is unchanged.

Only once existing universal intrinsics start being tweaked will we have to be 
much more careful, I'd think.

Cheers,
Ralf



I would look at this from a maintainability point of view. If we increase the 
code size by 20% for a certain ufunc, there must be a demonstrable 20% 
increase in performance on any CPU. That is to say, micro-optimisation will be 
unwelcome, and code readability will be preferable. Usually we ask the 
submitter of a PR to test it on a machine they have on hand, and I would be 
inclined to keep this trend of self-reporting. Of course, if someone else then 
came along and reported a performance regression of, say, 10% on another 
architecture, we would have increased the code by 20% for a net gain of only 
about 5% averaged across architectures, and the PR would have to be reverted.

—snip—
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org<mailto:NumPy-Discussion@python.org>
https://mail.python.org/mailman/listinfo/numpy-discussion