On 06/06/2012 11:29 AM, Ronald S. Bultje wrote:
> Hi,
> 
> On Wed, Jun 6, 2012 at 8:22 AM, Justin Ruggles <[email protected]> wrote:
>> ---
>> Separated loads from muls. It's slightly faster for SSE2. If making this an
>> AVX function is ever faster on some system, we can change it to 3-arg mulps.
>>
>>  libavresample/x86/audio_convert.asm    |   33 ++++++++++++++++++++++++++++++++
>>  libavresample/x86/audio_convert_init.c |    4 +++
>>  2 files changed, 37 insertions(+), 0 deletions(-)
> 
> Looks OK to me.
> 
> Confusing why AVX is slower...

I must have done something wrong in my earlier tests. I retested just
now and got different results.

C    - 4070
SSE2 -  463
AVX  -  428
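
For reference, the relative speedups implied by these numbers can be worked
out as below. This is only a sketch: it assumes the figures are timings where
lower is better (the units are not stated here), and the helper name
speedup() is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: how many times faster "optimized" is than "baseline",
 * assuming both are timings in the same unit and lower is better. */
static double speedup(double baseline, double optimized)
{
    assert(baseline > 0.0 && optimized > 0.0);
    return baseline / optimized;
}

/* Timings quoted above (units not stated in the message). */
static const double TIME_C    = 4070.0;
static const double TIME_SSE2 =  463.0;
static const double TIME_AVX  =  428.0;
```

Under that assumption, speedup(TIME_C, TIME_SSE2) is roughly 8.8,
speedup(TIME_C, TIME_AVX) is roughly 9.5, and speedup(TIME_SSE2, TIME_AVX)
is roughly 1.08, i.e. AVX comes out about 8% faster than SSE2.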

Sending new patch soon.

I also retested the flt to s16 function, but AVX was still not faster there,
because it has to extract the high 128 bits into an XMM register before it
can do the packssdw.
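
To illustrate the extra step, here is a rough scalar model in C. It is not
the actual asm from the patch: the type ymm_i32 and the functions
packssdw_128() and pack_ymm_to_s16() are hypothetical names, and the model
starts from values that have already been converted to 32-bit integers. The
point it tries to show is that with AVX (before AVX2) the integer pack only
works on 128-bit XMM registers, so the upper half of a 256-bit register has
to be pulled out first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 256-bit register holding eight int32 values,
 * split into its low (lane[0]) and high (lane[1]) 128-bit halves. */
typedef struct {
    int32_t lane[2][4];
} ymm_i32;

/* Signed saturation to 16 bits, which packssdw applies to each element. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Model of 128-bit packssdw: two groups of four int32 become eight int16. */
static void packssdw_128(int16_t dst[8], const int32_t a[4], const int32_t b[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i]     = sat16(a[i]);
        dst[i + 4] = sat16(b[i]);
    }
}

/* Pack one 256-bit register of int32 into eight int16 using only the
 * 128-bit pack. The low half is directly usable as an XMM register; the
 * high half needs an explicit extract, which is the extra work that the
 * SSE2 version (two separate XMM registers) does not have. */
static void pack_ymm_to_s16(int16_t dst[8], const ymm_i32 *r)
{
    int32_t hi[4];

    assert(dst && r);

    /* Stands in for the extract of the high 128 bits into an XMM register. */
    for (int i = 0; i < 4; i++)
        hi[i] = r->lane[1][i];

    packssdw_128(dst, r->lane[0], hi);
}
```

In this model the copy into hi[] is pure overhead compared with SSE2, which
would be consistent with the wider multiply and convert not paying off for
this function.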

-Justin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
