On 06/06/2012 11:29 AM, Ronald S. Bultje wrote:
> Hi,
>
> On Wed, Jun 6, 2012 at 8:22 AM, Justin Ruggles <[email protected]>
> wrote:
>> ---
>> Separated loads from muls. It's slightly faster for SSE2. If making this an
>> AVX function is ever faster on some system, we can change it to 3-arg mulps.
>>
>>  libavresample/x86/audio_convert.asm    |   33 ++++++++++++++++++++++++++++++++
>>  libavresample/x86/audio_convert_init.c |    4 +++
>>  2 files changed, 37 insertions(+), 0 deletions(-)
>
> Looks OK to me.
>
> Confusing why AVX is slower...
I must have done something wrong in my earlier tests. I retested just now
and got different results:

C    - 4070
SSE2 - 463
AVX  - 428

Sending new patch soon.

I also retested the flt to s16 function, but it was still not faster for
AVX because it requires extracting the high 128 bits to XMM in order to do
the packssdw.

-Justin
