Re: Surprising code being generated by ARM NEON backend

Dmitry Babokin Tue, 06 Sep 2016 12:20:09 -0700

Niall,

Thanks for sharing your story, it's really rewarding to hear that our tool
works so well for you!


You've mentioned that ISPC-generated code is 5-10% faster than hand-written
intrinsics. Were you talking about ARM only or x86 as well?

Also, I'm curious: what typical speedup are you observing on your code
using ARM NEON and SSE/AVX versus a scalar implementation?

And thanks for mentioning the CppCon submission, I didn't know about that.

Dmitry.

On Mon, Sep 5, 2016 at 6:19 PM, Niall Douglas <[email protected]>
wrote:

> Pull request as requested is at https://github.com/ispc/ispc/pull/1227.
> My thanks to my employer for sponsoring the improvement.
>
> Thanks for the help and the product, guys. For the same effort as hand
> porting all our SSE and AVX intrinsic code to ARM NEON, we got an ISPC port
> instead which supports everything we need now and into the future. In case
> anyone is interested, we actually have a Python script that calls ISPC both on
> Windows and via the Windows Subsystem for Linux to generate assembler files
> for ARM NEON float x4, SSE2 float x4, AVX float x8 and AVX2 float x16 for
> the calling conventions x86, x64-msvc and x64-sysv (the need to call ISPC
> under Windows vs ISPC under Linux is to get both the x64-msvc and x64-sysv
> calling convention output). Those assembler files are then parsed by the
> Python script, translating the armhf output ISPC generates into armel for
> Android ARM and doing a few other hand tweaks, and are committed directly
> to source control as they change rarely. During build, the AT&T format
> assembler files are compiled as normal by cmake for Linux/BSD/OS X/Android
> but on Windows we abuse the Mingw-w64 GNU as assembler to make it generate
> a MSVC compatible .obj file from the AT&T assembler
> but-in-msvc-calling-convention files output by ISPC. That is then linked in
> by Visual Studio as per normal. Believe it or not, it all works a treat.
> It's been a very successful risk we took in choosing ISPC to generate
> assembler instead of doing it by hand, and I'm sad to say we are done with
> optimisation now and moving on to other topics far removed. Nevertheless
> thanks once again, and you may like to know the only reason we heard of
> your work is because I sit on the Programme Committee for CppCon where this
> (now accepted) talk
> https://cppcon2016.sched.org/event/7nKw/spmd-programming-using-c-and-ispc
> was one assigned to me for review. So many thanks to that student for
> bringing ISPC to our attention!
>
> Niall
>
>
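For illustration, the target matrix Niall describes could be driven by something like the sketch below. It is not his script. The target strings (`neon-i32x4`, `sse2-i32x4`, `avx1-i32x8`, `avx2-i32x16`) and the `--target`, `--arch`, `--emit-asm` and `-o` options are written as I understand ISPC's command line and should be checked against `ispc --help`. The function name `ispc_commands`, the `host` tag and the output naming scheme are hypothetical. The sketch only builds the command lines; it neither runs ISPC nor does the armhf-to-armel rewriting mentioned in the message.

```python
import shlex

# Hypothetical target table: the target strings are assumptions chosen to
# match the widths named in the message, not the values of the real script.
TARGETS = {
    "neon-i32x4": ["arm"],
    "sse2-i32x4": ["x86", "x86-64"],
    "avx1-i32x8": ["x86", "x86-64"],
    "avx2-i32x16": ["x86", "x86-64"],
}


def ispc_commands(source, host):
    """Return one ISPC command line (as an argument list) per target/arch pair.

    `host` only tags the output file name ("win" or "linux"), reflecting that
    the x64 calling convention follows the platform ISPC itself runs on.
    """
    stem = source.rsplit(".", 1)[0]
    commands = []
    for target, arches in TARGETS.items():
        for arch in arches:
            out = f"{stem}.{target}.{arch}.{host}.s"
            commands.append([
                "ispc", source,
                f"--target={target}",
                f"--arch={arch}",
                "--emit-asm",
                "-o", out,
            ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # without ISPC installed.
    for cmd in ispc_commands("kernels.ispc", "linux"):
        print(shlex.join(cmd))
```

Running the same function once on Windows and once under Linux, with a different `host` tag each time, would give the two sets of x64 output the message describes. Each list can be passed to `subprocess.run` unchanged.
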
> On Friday, September 2, 2016 at 3:31:25 PM UTC+1, Niall Douglas wrote:
>>
>>
>>> It does seem very odd that LLVM wouldn't automatically inline a
>>> function consisting of a single instruction.
>>>
>>
>> I've discovered through trial and error that it is the lack of the "readnone"
>> modifier which causes LLVM not to inline the function. After looking up
>> that modifier I can see why that would be the case, and indeed why its
>> absence would penalise optimisation of the generated ARM NEON code: LLVM
>> must assume that any function not so marked may read memory, so its result
>> could change whenever global memory state might have changed. In particular,
>> this severely restricts the reordering of instructions LLVM can do.
>>
>> Quite a few of the ARM NEON builtins are missing "readnone". None of the
>> AVX builtins that I can see is missing it. I am surprised this problem
>> hasn't been raised before; it's very obvious from the assembler output.
>>
>>
>>>
>>> I've asked my employer for the time to send a pull request. If it's
>>> granted, happy to oblige.
>>>
>> I've been allowed this time by my employer, who wishes to remain
>> anonymous. I'll issue a pull request next week which applies nounwind
>> readnone alwaysinline to everything in the NEON builtins, using the AVX
>> builtins as a guide. I should think this will improve the optimisation
>> quality of the NEON output quite a bit wherever it uses the builtins.
>>
>> Niall
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Intel SPMD Program Compiler Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
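
The check Niall did by eye, looking for builtins that lack "readnone", can also be done mechanically. The sketch below is a minimal version, assuming each function header sits on a single line of plain LLVM IR text. The real ISPC builtins files are preprocessed with m4 and may not meet that assumption. The function name `functions_missing` and the sample IR are made up for illustration and are not taken from the ISPC sources.

```python
import re

# Matches the header line of an LLVM IR function definition or declaration
# and captures the function name, e.g. "define <4 x float> @__foo(...) ... {".
_FUNC_RE = re.compile(r"^\s*(?:define|declare)\b.*?@([\w.$]+)\s*\(")


def functions_missing(ir_text, attribute="readnone"):
    """Return the names of functions whose header line lacks `attribute`."""
    missing = []
    for line in ir_text.splitlines():
        match = _FUNC_RE.match(line)
        # Compare whole whitespace-separated tokens so that, for example,
        # "readnone" is not found inside some longer identifier.
        if match and attribute not in line.split():
            missing.append(match.group(1))
    return missing


# Made-up sample in the style of a builtins file, not real ISPC source.
SAMPLE = """\
define <4 x float> @__example_rcp(<4 x float> %v) nounwind readnone alwaysinline {
  ret <4 x float> %v
}
define <4 x float> @__example_sqrt(<4 x float> %v) nounwind alwaysinline {
  ret <4 x float> %v
}
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) nounwind readnone
"""

if __name__ == "__main__":
    print(functions_missing(SAMPLE))
```

A report like this only gives candidates. "readnone" is correct only for functions that neither read nor write memory, so builtins that load or store must stay unmarked.
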
