> > > You've mentioned that ISPC generated code is 5-10% faster than
> > > hand-written intrinsics. Were you talking about ARM only or x86 as well?
The claim was based on comparing VS2015 Update 3 (with the new SSA-based optimiser turned on) compiling hand-written x64 AVX intrinsics against x64 AVX generated by ISPC, based on clang/LLVM 3.8.1, running on an Ivy Bridge CPU. By the way, we use --addressing=64 for ISPC for that; it generated 3-5% faster code in our use cases. I haven't tested other compilers nor other CPUs; in my experience GCC 6's optimiser would likely do a better job than MSVC's. And it wasn't always a win: sometimes the intrinsics beat the ISPC code by 5% or so, but that happened in less than 7% of the benchmarks compared. In the remaining 93%, the ISPC-generated code equalled or beat MSVC, winning by 5-10% where it won.

My employer is very seriously considering switching the entire codebase over to the ISPC output on all platforms, pending equal or better results from much wider benchmarking on a good selection of the types of hardware our customers use. For us, worst-case performance is much more important than average performance, and whilst the worst case improved on my Ivy Bridge CPU, it might not on other CPUs.

> > Also, I'm curious, what typical speed-up are you observing on your code
> > using ARM NEON and SSE/AVX versus a scalar implementation?

For the routines written to use SIMD, the speed-up is nearly linear in the number of SIMD lanes: on my Ivy Bridge, SSE2 gains 3.6x, AVX1 7.0x, and NEON 3.4x. The code paths so optimised were specifically designed to make the best of SIMD, so this is really an embarrassingly parallel problem. AVX-512 ought to approach 13-15x, for example.

As for what the customer sees in terms of product performance (the SIMD parts are buried deep inside the product): on ARM, NEON improves average performance by about one third over scalar, and worst-case performance by two thirds. The worst case is far more important to our customers than the average case.
I cannot easily give you the improvement on Intel over scalar for the whole product, as our code lost the ability to work without SSE2 a few months ago, after we refactored around FTZ/DAZ always being on. That means I can no longer build a scalar edition easily without rehacking cmake. It was more than 50% for SSE2, however, and I can tell you we gain another 13% on top of that with AVX1 on my Ivy Bridge.

> > And thanks for mentioning CppCon submission, I didn't know about that.

You may or may not be aware that ISO WG21 is in the process of standardising C++'s support for SIMD. There were three camps of opinion last time I looked: one just wants intrinsics alone, one wants proper SIMD understanding throughout the C++ language and the STL, and the third I can't recall right now. From the attendees listed, all three camps will be present in the audience at that student's CppCon talk, and I am sure all three will have an opinion with regard to WG21 direction given ISPC's prior art (I am not sure the student realises he has brought such eminence upon himself yet, but it should be a fun talk just for the audience debate alone). I will certainly publicly declare myself in favour of proper SIMD understanding throughout the C++ language and the STL; I was approaching that position myself before using ISPC in earnest. Now I am sure it is the correct move and that an intrinsics-only approach is wrong.

You may or may not also be aware that the C++ standard template library is to be rebooted from scratch very soon now; it is informally called "STL2". Microsoft Visual Studio 2017 will ship with support for it. I know that SIMD-awareness was considered important for STL2, so in theory it could come to be that std::vector<float[16]> will "just work" in C++ 2020 for all major compilers.

Niall

--
You received this message because you are subscribed to the Google Groups "Intel SPMD Program Compiler Users" group.
