Thanks, Claudio!

On Feb 15, 2013, at 16:33, Claudio Freire wrote:

> gcc 4.7 is clever enough to generate SSE code by itself. Maybe that's
> what you're experiencing. I guess compiler flags do matter too.

I haven't compiled with -ftree-vectorize explicitly (or rather, I tried with
and without it, and it made no difference), but you're right ...
-fno-tree-vectorize gets me back to the hand-coded SSE version being 2x faster.
Amazing, I never really saw much benefit from the tree-vectoriser before!
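
For anyone wanting to reproduce this kind of comparison, here is a minimal
sketch of a loop the tree-vectoriser should be able to handle. The function
name and the loop itself are hypothetical and not taken from the code under
discussion. As far as I know, gcc 4.7 turns on -ftree-vectorize by default at
-O3, which would explain why passing it explicitly changes nothing.

```c
#include <stddef.h>

/* Hypothetical example loop, not the code discussed in this thread.
   restrict promises the compiler that dst and src do not overlap,
   which makes the loop easier to vectorise. */
void scale_samples(float *restrict dst, const float *restrict src,
                   float gain, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}
```

Building it once with `gcc -std=c99 -O3 -c loop.c` and once with
`gcc -std=c99 -O3 -fno-tree-vectorize -c loop.c` (loop.c being an assumed file
name), then comparing the output of `objdump -d`, should show whether packed
SSE instructions were generated.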

If it wasn't clear, I didn't hand code the SSE version myself, so comparing the 
versions will be like looking for the relative differences between the works of 
2 post-modern art schools ;)
I've run the code through Shark, though, and that showed a clear difference in
load, to the disadvantage of the SSE version.

> gcc, which tends to inhibit many of its other optimizations. Why don't
> you try gcc's vector primitives instead?

Which ones? Do you mean the few lines of intrinsics written for MSVC, which
also compile under gcc but show no speed advantage or disadvantage there?
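
For reference, in case the primitives in question are gcc's generic vector
extensions (the vector_size attribute) rather than the xmmintrin.h intrinsics,
a minimal sketch might look like the following. The type name v4f and the
helper scale_add are hypothetical, made up for illustration; they come from
neither ffmpeg nor the code I'm testing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type name: four packed floats (16 bytes), declared with
   gcc's generic vector extension. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical helper: dst[i] = a[i] * scale + b[i] for n floats. */
void scale_add(float *dst, const float *a, const float *b,
               float scale, size_t n)
{
    const v4f vs = { scale, scale, scale, scale };
    size_t i = 0;

    /* Main loop: four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        v4f va, vb, vr;

        /* memcpy sidesteps alignment and aliasing assumptions. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va * vs + vb;          /* element-wise on all four lanes */
        memcpy(dst + i, &vr, sizeof vr);
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; i++)
        dst[i] = a[i] * scale + b[i];
}
```

The arithmetic is written with ordinary operators and the compiler picks the
instructions for the target, so nothing here is tied to SSE specifically.
Whether this would actually beat the auto-vectorised scalar code is something
I'd have to measure.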

BTW, this does raise the question of why ffmpeg's build process uses
-fno-tree-vectorize ... maybe that's no longer required with today's compilers?

R.
_______________________________________________
Libav-user mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/libav-user
