Interesting: it's effectively exploiting functional programming to extract
parallelism. I've always wondered why there are no implementations of
functional languages specifically targeted at extracting SIMD-level
parallelism; or perhaps I'm simply not aware of such languages. The
limitation you've mentioned, though, the lack of any way to express
communication between instances, is a severe drawback of the approach. But
the potential upside is a unified approach to concurrency and parallelism,
since coroutines are primarily targeted at concurrency.
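
To make this concrete, here is a minimal sketch in plain C++ (my own
illustration, not the proposed coroutine syntax, which is not yet fixed) of
the kind of pure element-wise map a compiler is free to turn into SIMD
code, since no element depends on any other:

    #include <algorithm>
    #include <vector>

    // A pure element-wise map over contiguous data: each output element
    // depends only on the corresponding input element, so a vectorizing
    // compiler may process several elements per SIMD instruction.
    std::vector<float> scale_and_shift(const std::vector<float>& in) {
        std::vector<float> out(in.size());
        std::transform(in.begin(), in.end(), out.begin(),
                       [](float x) { return x * 2.0f + 1.0f; });
        return out;
    }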

As for the definition of "same" operations, I don't think it's a hard
problem, and it should not be restricted by the language. It should be up
to the compiler to decide what level of control-flow divergence the
hardware can handle.
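
For example (a hand-written sketch of the masking transformation, not the
output of any particular compiler), a branchy per-element kernel can be
vectorized by computing both sides and selecting per lane:

    #include <cstddef>

    // Branchy per-element kernel: different elements may take different
    // paths, so naively the SIMD lanes would diverge.
    void clamp_or_scale(float* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (data[i] > 1.0f)
                data[i] = 1.0f;       // path A
            else
                data[i] *= 0.5f;      // path B
        }
    }

    // What a vectorizer effectively does: evaluate both results and
    // blend them per lane with a mask, keeping all lanes in lockstep.
    void clamp_or_scale_masked(float* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            bool take_a = data[i] > 1.0f;
            data[i] = take_a ? 1.0f : data[i] * 0.5f;
        }
    }

Whether that transformation pays off depends on how expensive the two
sides are and how often elements diverge, which is exactly the trade-off
the compiler, knowing the target hardware, is best placed to judge.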

On Fri, Sep 9, 2016 at 1:11 PM, Niall Douglas <nialldougla...@gmail.com>
wrote:

> On Thursday, September 8, 2016 at 7:30:40 PM UTC+1, Dmitry Babokin wrote:
>>
>> You need really parallelization-friendly code to get close to
>> theoretical scaling on all vector units.
>>
>> As for parallelization approaches, intrinsics are obviously not good
>> enough, as they offer no performance portability, and I think there's
>> quite broad consensus about that in the industry. But all the
>> alternatives are far from ideal. For quite some time auto-vectorization
>> was the way to go, but it's not reliable, and we obviously need a
>> language solution. I'm personally a bit sceptical that the C++ standard
>> committee can converge on something by the C++21 deadline :) So ISPC and
>> other explicit vectorization solutions have some time until C++ offers a
>> viable alternative, though I hope it happens earlier rather than later.
>>
>> I've remembered the third approach that I couldn't recall before.
>
> As you may be aware, C++ 17-21 is gaining Coroutines, an embedded
> domain-specific sublanguage allowing a large subset of C++ inside
> coroutines. The proposal is that SIMD-optimal code would be generated by
> the compiler when you apply a coroutine which performs the same [1]
> operation to each member of some ContiguousIterable, e.g.
> std::vector<float> with alignas(64). The compiler would spot that the
> same [1] operations are being applied to an array of a SIMDable type and
> "do the right thing", as it were.
>
> [1] The hard part is defining "same". Some branching would be allowed,
> obviously. But the proposal, if I remember correctly, imposed the same
> restrictions as constexpr programming, which is much more restrictive
> than ISPC, e.g. no communication possible between instances. That might
> have been loosened since; it's hard to keep up to date with
> standardisation.
>
> The big advantage of this approach is that, because the coroutine EDSL is
> not fixed yet and backwards compatibility isn't a problem, nasty
> surprises with legacy codebases ought not to occur. The big disadvantage
> is that it makes the already contentious Coroutines TS even more
> contentious :)
>
> Anyway, Microsoft are the ones leading the charge on implementing the
> Coroutines TS ahead of standardisation, so I guess watch VS2017 closely
> and see how much of C++ AMP they merge into their Coroutines
> implementation.
>
> Niall
>
