> > > You've mentioned that ISPC generated code is 5-10% faster than
> > > hand-written intrinsics. Were you talking about ARM only or x86 as well?
The claim was based on comparing VS2015 Update 3 (with the new SSA-based optimiser turned on) compiling hand-written x64 AVX intrinsics against x64 AVX generated by ISPC, based on clang/LLVM 3.8.1, running on an Ivy Bridge CPU. By the way, we use --addressing=64 for ISPC for that; it generated 3-5% faster code in our use cases. I haven't tested other compilers nor other CPUs; in my experience GCC 6's optimiser would likely do a better job than MSVC's. And it wasn't always a win: sometimes the intrinsics beat the ISPC code by 5% or so, but that happened in less than 7% of the benchmarks compared. In the remaining 93%, the ISPC-generated code equalled or beat MSVC, winning by 5-10% where it won.

My employer is very seriously considering switching the entire codebase over to the ISPC output on all platforms, pending equal or better results from much wider benchmarking on a good selection of the types of hardware our customers use. For us, worst-case performance is much more important than average performance, and whilst the worst case improved on my Ivy Bridge CPU, it might not on other CPUs.

> > Also, I'm curious, what typical speed-up are you observing on your code
> > using ARM NEON and SSE/AVX versus a scalar implementation?

For the routines written to use SIMD, the speed-up is nearly linear in the number of SIMD lanes: on my Ivy Bridge, SSE2 gains 3.6x, AVX1 7.0x, and NEON 3.4x. The code paths so optimised were specifically designed to make the best of SIMD, so this is really an embarrassingly parallel problem. AVX-512 ought to approach 13-15x, for example.

As for what the customer sees in terms of product performance (the SIMD parts are buried deep inside the product): on ARM, NEON improves average performance by about one third over scalar, and worst-case performance by two thirds. The worst case is far more important to our customers than the average case.
I cannot easily give you the improvement on Intel over scalar for the whole product, as our code lost the ability to work without SSE2 a few months ago, after we refactored around FTZ/DAZ always being on. That means I can no longer build a scalar edition easily without rehacking cmake. It was more than 50% for SSE2, however, and I can tell you we gain another 13% on top of that with AVX1 on my Ivy Bridge.

> > And thanks for mentioning CppCon submission, I didn't know about that.

You may or may not be aware that ISO WG21 is in the process of standardising C++'s support for SIMD. There were three camps of opinion last time I looked: one just wants intrinsics alone, one wants proper SIMD understanding throughout the C++ language and the STL, and the third I can't recall right now. From the attendees listed, all three camps will be present in the audience at that student's CppCon talk, and I am sure all three will have an opinion with regard to WG21 direction given ISPC's prior art (I am not sure the student realises he has brought such eminence upon himself yet, but it should be a fun talk just for the audience debate alone). I will certainly publicly declare myself in favour of proper SIMD understanding throughout the C++ language and the STL; I was approaching that position myself before using ISPC in earnest. Now I am sure it is the correct move and that an intrinsics-only approach is wrong.

You may or may not also be aware that the C++ standard template library is to be rebooted from scratch very soon now; it is informally called "STL2". Microsoft Visual Studio 2017 will ship with support for it. I know that SIMD-awareness was considered important for STL2, so in theory it could come to be that std::vector<float[16]> will "just work" in C++ 2020 for all major compilers.

Niall

--
You received this message because you are subscribed to the Google Groups "Intel SPMD Program Compiler Users" group.
