Don wrote:
Bill Baxter wrote:
Actually it's 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow
repeats like .xxyy.
Yes. Is the syntax sugar actually needed for all the permutations?
Even so, it's still only 256, which is probably still OK. I don't think
a language change is required.
There's no need to ever enumerate all functions - they can be generated
with templates and mixins rather easily.
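
As a rough sketch of what that generation could look like: a function evaluated at compile time builds the source for all 4^4 = 256 swizzle properties, and a string mixin pastes them into the struct. The names Vec4, pick and swizzleSource are made up for illustration, and this is plain scalar code rather than an actual mapping onto SIMD registers.

```d
// Hypothetical helper: returns D source for all 256 swizzle properties.
// It is meant to be run at compile time and fed to a string mixin.
string swizzleSource()
{
    immutable string names = "xyzw";
    string code;
    foreach (a; 0 .. 4)
    foreach (b; 0 .. 4)
    foreach (c; 0 .. 4)
    foreach (d; 0 .. 4)
    {
        // Emits e.g.: Vec4 xxyy() const { return pick(0, 0, 1, 1); }
        code ~= "Vec4 ";
        code ~= names[a]; code ~= names[b];
        code ~= names[c]; code ~= names[d];
        code ~= "() const { return pick(";
        code ~= cast(char)('0' + a); code ~= ", ";
        code ~= cast(char)('0' + b); code ~= ", ";
        code ~= cast(char)('0' + c); code ~= ", ";
        code ~= cast(char)('0' + d);
        code ~= "); }\n";
    }
    return code;
}

// Hypothetical 4-component vector used only for this illustration.
struct Vec4
{
    float[4] v;

    // Shared worker that every generated property forwards to.
    Vec4 pick(size_t a, size_t b, size_t c, size_t d) const
    {
        Vec4 r;
        r.v[0] = v[a];
        r.v[1] = v[b];
        r.v[2] = v[c];
        r.v[3] = v[d];
        return r;
    }

    // Paste in xxxx, xxxy, ..., wwww without writing any of them by hand.
    mixin(swizzleSource());
}
```

With this in place, a.xxyy and a.wzyx resolve to generated members, while a name outside the xyzw alphabet such as a.xyzq is rejected at compile time.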
This scheme doesn't cover:
* shufp where the two sources are different
* haddpd, haddps [SSE3] { double[2] a, b; a[0]=a[0]+a[1];
a[1]=b[0]+b[1]; }
* non-temporal stores (although I think these are covered adequately by
array operations)
Well, we can probably find ways to generate those too.
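
For reference, the first two items in that list can be modelled in scalar code as below. This is only a sketch of the instruction semantics: the function names are made up, and passing the shuffle immediate as a template argument is an assumption about how such a wrapper might look, not something proposed in this thread.

```d
// Scalar model of haddpd: each result lane is the sum of the pair
// inside one source, matching the snippet quoted above.
double[2] haddpd(double[2] a, double[2] b)
{
    double[2] r;
    r[0] = a[0] + a[1];
    r[1] = b[0] + b[1];
    return r;
}

// Scalar model of shufps with two distinct sources: the low two lanes
// are selected from a, the high two lanes from b, two bits per lane.
float[4] shufps(ubyte imm)(float[4] a, float[4] b)
{
    float[4] r;
    r[0] = a[imm & 3];
    r[1] = a[(imm >> 2) & 3];
    r[2] = b[(imm >> 4) & 3];
    r[3] = b[(imm >> 6) & 3];
    return r;
}
```

Because the result draws on two operands, neither operation fits a one-vector property like .xxyy, which is why they need a different spelling.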
and the byte/word operations:
* pack with saturation
* movmsk
* avg
* multiply and add.
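
To make those four concrete, here is a scalar sketch of what one instruction from each family computes. The mapping to specific instructions (packsswb, pmovmskb, pavgb, pmaddwd) is my assumption about what is meant, and the function names are invented for illustration.

```d
// Pack with saturation (as in packsswb): narrow one signed word to a
// signed byte, clamping instead of wrapping.
byte packSat(short x)
{
    if (x > byte.max) return byte.max;
    if (x < byte.min) return byte.min;
    return cast(byte) x;
}

// movmsk (as in pmovmskb): gather the top bit of each byte into a mask.
int movmskb(ubyte[16] a)
{
    int mask = 0;
    foreach (i, x; a)
        mask |= (x >> 7) << i;
    return mask;
}

// avg (as in pavgb): unsigned byte average, rounding up.
ubyte avg(ubyte a, ubyte b)
{
    return cast(ubyte)((a + b + 1) >> 1);
}

// Multiply and add (as in pmaddwd): multiply adjacent signed words and
// sum each pair of products into one 32-bit lane.
int[2] maddwd(short[4] a, short[4] b)
{
    int[2] r;
    r[0] = a[0] * b[0] + a[1] * b[1];
    r[1] = a[2] * b[2] + a[3] * b[3];
    return r;
}
```

Each of these changes the element width or combines neighbouring lanes, so none of them is expressible as a plain element-wise array operation.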
So it looks to me as though, with minimal language changes, we could
get almost complete SIMD support, with excellent syntax.
That sounds great.
Andrei