Thanks, Claudio!

On Feb 15, 2013, at 16:33, Claudio Freire wrote:
> gcc 4.7 is clever enough to generate SSE code by itself. Maybe that's
> what you're experiencing.

I guess compiler flags do matter too. I haven't compiled with -ftree-vectorize (rather, I tried with and without, and it made no difference), but you're right ... -fno-tree-vectorize gets me back to the 2x faster performance of the hand-coded SSE version. Amazing, I never really saw a lot of benefit to the tree-vectoriser before!

If it wasn't clear, I didn't hand code the SSE version myself, so comparing the versions will be like looking for the relative differences between the works of 2 post-modern art schools ;)
I've run the code through Shark, though, and that showed a clear load difference in disfavour of the SSE version.

> gcc, which tends to inhibit many of its other optimizations. Why don't
> you try gcc's vector primitives instead?

Which ones? As in the few lines with intrinsics for MSVC, which also compile under gcc but show no speed dis/advantage with gcc?

BTW, this does beg the question why ffmpeg's build process uses -fno-tree-vectorize ... maybe that's no longer required for today's compilers?

R.
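
For reference, the "vector primitives" in the quote presumably mean gcc's vector extensions (the vector_size attribute), as opposed to the SSE intrinsics. Below is a minimal sketch of the two styles being compared in this thread: a plain loop that gcc may auto-vectorise at -O3 or with -ftree-vectorize, and the same operation written with a vector type. The function names scale_add_scalar and scale_add_vec are hypothetical and not taken from ffmpeg or from the code under discussion. The vector version assumes n is a multiple of 4.

```c
#include <assert.h>
#include <string.h>

/* gcc vector extension: four floats packed into one 16-byte vector. */
typedef float v4sf __attribute__((vector_size(16)));

/* Plain scalar loop. gcc may turn this into SSE code by itself when
 * the tree-vectoriser is enabled (-O3 or -ftree-vectorize). */
static void scale_add_scalar(float *dst, const float *a, const float *b,
                             float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] * k + b[i];
}

/* Same operation using the vector type directly (hypothetical helper).
 * Assumes n is a multiple of 4; memcpy avoids alignment requirements. */
static void scale_add_vec(float *dst, const float *a, const float *b,
                          float k, int n)
{
    const v4sf vk = { k, k, k, k };

    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        v4sf va, vb, vr;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va * vk + vb;          /* element-wise on all four lanes */
        memcpy(dst + i, &vr, sizeof vr);
    }
}
```

Building the scalar version once with -ftree-vectorize and once with -fno-tree-vectorize, then comparing the generated assembly (gcc -S), is one way to check whether the auto-vectoriser is responsible for a speed difference like the one described above.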
