>> I would like that as well. The problem is that even within a major
>> architecture family, the implementations do not agree about their
>> implementation of SIMD. AMD and Intel, for example, have differing
>> functionality.
>>
>> One consequence of this is that you can only use SIMD automatically
>> from the compiler when (a) you're willing to compile to a particular
>> target, or (b) you're willing to compile multiple versions into a
>> single binary.
>
>I don't think those are the biggest issues. Doing the really trivial
>stuff like auto-vectorising a[.]=b[.]+c[.]*d[.], particularly in
>floating point, where you don't have to worry about ranges, is easy.
>But the single biggest issue, particularly for integer stuff -- which
>is what a lot of multimedia is -- is that, assuming your compiler uses
>the standard definition of optimization (that it must be impossible to
>distinguish the original and the optimized program based on their
>output), you have a pretty impossible program-writing task for the
>programmer and pattern-recognition task for the compiler. Consider the
>_mm_maddubs_epi16 that featured in the example routine that I posted:
>it takes a vector of uint8_t's and a vector of int8_t's and multiplies
>corresponding elements, then adds adjacent pairs, outputting a vector
>of int16_t's. Try to write that in scalar code that you think a
>compiler will be able to recognise and convert to the intrinsic.
>
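For reference, here is roughly what one 128-bit _mm_maddubs_epi16
computes, written as plain scalar C. This is only a sketch: the names
maddubs_scalar and saturate_i16 are made up for illustration, and the
clamp reflects the signed saturation the instruction applies to each
pair sum, which the description above leaves out. A compiler would have
to see through the widening, the pairing and the clamp together to map
this loop back onto the single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the int16_t range (signed saturation). */
static int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of one 128-bit _mm_maddubs_epi16:
 * a holds 16 unsigned bytes, b holds 16 signed bytes, out gets 8 int16_t. */
static void maddubs_scalar(const uint8_t a[16], const int8_t b[16],
                           int16_t out[8])
{
    assert(a && b && out);
    for (int i = 0; i < 8; i++) {
        /* Widen before multiplying so the products cannot overflow. */
        int32_t lo = (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        /* Adjacent products are summed, then saturated to 16 bits. */
        out[i] = saturate_i16(lo + hi);
    }
}
```

Note that the operands have different signedness and the sum can exceed
16 bits (255*127*2 = 64770), so leaving out the clamp changes the output.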
Yes, this is the biggest issue for SIMD. That said, even if you only
remove 70-90% of the intrinsic lines, it's a big win: it keeps the
solution simpler, and many programs will run without intrinsics,
possibly on other platforms as well.

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
