On Wed, Feb 27, 2013 at 1:31 PM, "René J.V. Bertin" <[email protected]> wrote:
> For me this settles the question: better stick to not using 
> auto-vectorisation esp. since it causes a few tests to fail.
>
> I have yet to test my modifications on MS Windows but I'd be willing to post 
> a patch for this option (but also to admit it'd annoy me to have to adapt my 
> cross-platform HR timing routines to ffmpeg naming conventions :( )
>
...
> Detailed benchmark results: (32 bit, MMX/SSE code, -fno-tree-vectorize)
>                    samples          user t        kernel t          real t           CPU %
> Video decode  :      85166         27.0846s        2.48361s        13.5333s        218.484%


Wait... 200%... what's your hardware like?
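
For reference, the CPU % figure appears to be (user t + kernel t) / real t.
That formula is my assumption, not something taken from the benchmark code,
but it reproduces the quoted value, and anything above 100% means more than
one thread was busy at the same time. A quick sketch (the helper name
cpu_percent is made up for illustration):

```python
def cpu_percent(user_s, kernel_s, real_s):
    """CPU usage as a percentage of wall-clock time.

    Assumed formula: total CPU time (user + kernel) divided by real time.
    """
    return 100.0 * (user_s + kernel_s) / real_s


# Figures from the "Video decode" row of the quoted benchmark.
pct = cpu_percent(27.0846, 2.48361, 13.5333)
print(f"{pct:.1f}%")  # roughly 218.5%, matching the reported 218.484%
```

So on average a bit more than two threads' worth of CPU time was consumed
per second of wall-clock time.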

If by any chance you have Hyper-Threading enabled (which is quite
likely), then I bet that's where the penalty is coming from: the two
logical threads of a core share a single SIMD execution unit, so SIMD
code gets no real parallelization, whereas float code can run in
parallel with hand-optimized SIMD code or other integer code.
_______________________________________________
Libav-user mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/libav-user
