>>
 >>
 >> I would like that as well. The problem is that even within a major
 >> architecture family, the implementations do not agree about their
 >> implementation of SIMD. AMD and Intel, for example, have differing
 >> functionality.
 >>
 >> One consequence of this is that you can only use SIMD automatically from
 >> the compiler when (a) you're willing to compile to a particular target,
 >> or (b) you're willing to compile multiple versions into a single binary.
 >
 >I don't think those are the biggest issues. Doing the really trivial
 >stuff like auto-vectorising a[.]=b[.]+c[.]*d[.], particularly in
 >floating point, where you don't have to worry about ranges, is easy.
 >But the single biggest issue, particularly for integer stuff (which
 >is what a lot of multimedia is), is this: assuming your compiler uses
 >the standard definition of optimization, namely that it must be
 >impossible to distinguish the original and the optimized program based
 >on their output, you have a pretty impossible code-writing task for
 >the programmer and pattern-recognition task for the compiler. Consider
 >_mm_maddubs_epi16, which featured in the example routine that I posted:
 >it takes a vector of uint8_t's and a vector of int8_t's, multiplies
 >corresponding elements, then adds adjacent pairs, outputting a vector
 >of int16_t's. Try to write scalar code for that which you think a
 >compiler will be able to recognise and convert to the intrinsic.
 >
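
To make the challenge above concrete, here is a minimal scalar sketch in C of
what one 128-bit _mm_maddubs_epi16 computes. The function names
maddubs_scalar and saturate_i16 are hypothetical, chosen only for
illustration. The saturation step is not mentioned in the description above,
but the instruction does clamp each pairwise sum to the int16_t range, and
that clamp is part of what a compiler would have to recognise in a scalar
loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit value into the int16_t range. */
static int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar reference for one 128-bit operation (hypothetical name):
 * 16 unsigned bytes times 16 signed bytes, adjacent products summed,
 * giving 8 signed 16-bit results with signed saturation. */
void maddubs_scalar(const uint8_t a[16], const int8_t b[16], int16_t out[8])
{
    for (size_t i = 0; i < 8; i++) {
        /* Widen before multiplying so the products cannot overflow. */
        int32_t even = (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        int32_t odd  = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        out[i] = saturate_i16(even + odd);
    }
}
```

The mixed signedness, the widening, the pairwise add and the clamp all have
to survive in exactly this shape for a pattern matcher to have any chance of
mapping the loop back to the single instruction.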

Yes, this is the biggest issue for SIMD. That said, even if you only remove
70-90% of the intrinsic lines, it's a big win: it keeps a solution simpler,
and many programs will run without intrinsics, possibly on other platforms.

Ben 
