Don wrote:
Bill Baxter wrote:
Actually it's 4^4 if you do it like OpenCL/GLSL/HLSL/Cg and allow
repeats like .xxyy.

Yes. Is the syntax sugar actually needed for all the permutations?
Even so, it's still only 256, which is probably still OK. I don't think a language change is required.

There's no need to ever enumerate all functions - they can be generated with templates and mixins rather easily.
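As a rough illustration of why nothing has to be enumerated by hand, here is a minimal sketch in C++ (used as a stand-in; the thread is about templates and mixins in another language, and the mechanism there would differ). The names `Vec4`, `swizzle`, `laneIndex` and `swizzleByName` are hypothetical and not from the thread. A single template parameterized on four lane indices covers all 4^4 = 256 swizzles, repeats included.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-float vector type, for illustration only.
using Vec4 = std::array<float, 4>;

// One template stands in for every 4-component swizzle. Each index picks a
// source lane and repeats are allowed, so 4*4*4*4 = 256 instantiations exist
// without any of them being written out.
template <std::size_t I0, std::size_t I1, std::size_t I2, std::size_t I3>
inline Vec4 swizzle(const Vec4& v)
{
    static_assert(I0 < 4 && I1 < 4 && I2 < 4 && I3 < 4, "lane index out of range");
    Vec4 r = {{v[I0], v[I1], v[I2], v[I3]}};
    return r;
}

// Map a GLSL-style component letter (x, y, z, w) to a lane index.
inline std::size_t laneIndex(char c)
{
    const char* names = "xyzw";
    for (std::size_t i = 0; i < 4; ++i)
        if (names[i] == c) return i;
    assert(false && "component must be one of x, y, z, w");
    return 0;
}

// Runtime variant driven by a name such as "xxyy" (exactly four letters).
inline Vec4 swizzleByName(const Vec4& v, const char* name)
{
    Vec4 r = {{v[laneIndex(name[0])], v[laneIndex(name[1])],
               v[laneIndex(name[2])], v[laneIndex(name[3])]}};
    return r;
}
```

With this sketch, `swizzle<0, 0, 1, 1>(v)` plays the role of `v.xxyy`. The compile-time indices are what would let a compiler map each instantiation to a single shuffle instruction.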

This scheme doesn't cover:
* shufp where the two sources are different
* haddpd, haddps [SSE3] { double[2] a, b; a[0]=a[0]+a[1]; a[1]=b[0]+b[1]; }
* non-temporal stores (although I think these are covered adequately by array operations)
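To make the first two gaps concrete, here is a scalar sketch in C++ of what the instructions compute. It assumes the usual SSE semantics for the haddpd and shufps instructions; the type aliases and function signatures are hypothetical, and this is a reference model rather than a proposal for the syntax. The point is that both results mix lanes from two different sources, which a single-operand swizzle cannot express.

```cpp
#include <array>
#include <cassert>

using Double2 = std::array<double, 2>;
using Float4 = std::array<float, 4>;

// Scalar model of haddpd: the low lane is the horizontal sum of a,
// the high lane is the horizontal sum of b.
inline Double2 haddpd(const Double2& a, const Double2& b)
{
    Double2 r = {{a[0] + a[1], b[0] + b[1]}};
    return r;
}

// Scalar model of shufps with two different sources: the two low result lanes
// are selected from a and the two high lanes from b, using 2 bits of the
// 8-bit immediate per lane.
inline Float4 shufps(const Float4& a, const Float4& b, unsigned imm)
{
    assert(imm < 256u);
    Float4 r = {{a[imm & 3u], a[(imm >> 2) & 3u],
                 b[(imm >> 4) & 3u], b[(imm >> 6) & 3u]}};
    return r;
}
```

For instance, an immediate of 0xE4 selects lanes 0 and 1 of `a` followed by lanes 2 and 3 of `b`.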

Well probably we can find ways to generate those too.

and the byte/word operations:

* pack with saturation
* movmsk
* avg
* multiply and add.
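For reference, here is a per-lane scalar sketch in C++ of the four byte/word operations listed above. It assumes the behaviour of the packsswb, movmskps, pavgb and pmaddwd instructions as representative examples; the function names are hypothetical, and each function models only one lane (or one lane pair) rather than a full register.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Pack with signed saturation (as in packsswb): clamp a 16-bit value
// into the int8 range instead of truncating it.
inline std::int8_t packSaturate(std::int16_t x)
{
    if (x > 127) return 127;
    if (x < -128) return -128;
    return static_cast<std::int8_t>(x);
}

// movmsk (as in movmskps): gather the sign bit of each of four floats
// into the low four bits of an integer.
inline unsigned moveMask(const std::array<float, 4>& v)
{
    unsigned mask = 0;
    for (unsigned i = 0; i < 4; ++i)
        if (std::signbit(v[i])) mask |= 1u << i;
    return mask;
}

// Rounded unsigned average (as in pavgb): (a + b + 1) / 2 computed
// in a wider type so the intermediate sum cannot overflow.
inline std::uint8_t avgRound(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>((a + b + 1) >> 1);
}

// Multiply and add (as in pmaddwd) for one pair of adjacent 16-bit lanes:
// two widening multiplies followed by a 32-bit add. The single overflowing
// input (all four values equal to -32768) is not handled in this sketch.
inline std::int32_t multiplyAdd(std::int16_t a0, std::int16_t b0,
                                std::int16_t a1, std::int16_t b1)
{
    return std::int32_t(a0) * b0 + std::int32_t(a1) * b1;
}
```

None of these is a plain elementwise arithmetic operator on same-width lanes: they saturate, change element width, or reduce across lanes, so they would likely need to be exposed as named functions.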

So it looks to me as though, with minimal language changes, we could get almost complete SIMD support, with excellent syntax.


That sounds great.


Andrei
