On Wed, Feb 27, 2013 at 1:31 PM, "René J.V. Bertin" <[email protected]> wrote:
> For me this settles the question: better stick to not using
> auto-vectorisation esp. since it causes a few tests to fail.
>
> I have yet to test my modifications on MS Windows but I'd be willing to post
> a patch for this option (but also to admit it'd annoy me to have to adapt my
> cross-platform HR timing routines to ffmpeg naming conventions :( )
> ...
> Detailed benchmark results: (32 bit, MMX/SSE code, -fno-tree-vectorize)
>                 samples   user t     kernel t   real t     CPU %
> Video decode :  85166     27.0846s   2.48361s   13.5333s   218.484%

Wait... 200%... what's your hardware like? If by any chance you have
Hyper-Threading enabled (which is quite likely), then I bet that's where
the penalty is coming from: there's only one SIMD execution unit, shared
by the hardware threads, so SIMD code doesn't really run in parallel,
whereas float code can run in parallel with hand-optimized SIMD code or
other integer code.
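
For reference, here is a minimal sketch of how a CPU % figure like the one
quoted could be derived. It assumes the column is simply (user + kernel)
time divided by real time; the original post does not say how it was
computed, and the function name cpu_percent is made up for illustration.
Anything above 100% means more than one logical CPU was busy on average.

```python
def cpu_percent(user_s, kernel_s, real_s):
    """Return CPU time used as a percentage of wall-clock time.

    Assumption: CPU % = 100 * (user + kernel) / real. This is a guess at
    how the quoted benchmark column was produced, not a documented formula.
    """
    if real_s <= 0:
        raise ValueError("real time must be positive")
    return 100.0 * (user_s + kernel_s) / real_s


# Figures taken from the quoted "Video decode" row.
pct = cpu_percent(27.0846, 2.48361, 13.5333)
print(f"CPU usage: {pct:.1f}%")
```

With the quoted numbers this comes out at a little over 218%, which
matches the reported value to within rounding, i.e. a bit more than two
logical CPUs' worth of work on average.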
