Hi,

On Tue, Jul 24, 2012 at 9:41 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> Hi,
>
> On Sat, Jul 14, 2012 at 9:29 PM, Justin Ruggles
> <justin.rugg...@gmail.com> wrote:
>> ---
>>  libavresample/x86/audio_convert.asm    |   38 ++++++++++++++++++++++++++++++++
>>  libavresample/x86/audio_convert_init.c |   11 +++++++++
>>  2 files changed, 49 insertions(+), 0 deletions(-)
>>
>> diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
>> index 9ba7251..70519e1 100644
>> --- a/libavresample/x86/audio_convert.asm
>> +++ b/libavresample/x86/audio_convert.asm
>> @@ -734,3 +734,41 @@ CONV_FLTP_TO_FLT_6CH
>>  INIT_XMM avx
>>  CONV_FLTP_TO_FLT_6CH
>>  %endif
>> +
>> +;------------------------------------------------------------------------------
>> +; void ff_conv_s16_to_s16p_2ch(int16_t *const *dst, int16_t *src, int len,
>> +;                              int channels);
>> +;------------------------------------------------------------------------------
>> +
>> +%macro CONV_S16_TO_S16P_2CH 0
>> +cglobal conv_s16_to_s16p_2ch, 3,4,3, dst0, src, len, dst1
>> +    lea       lenq, [2*lend]
>> +    mov      dst1q, [dst0q+gprsize]
>> +    mov      dst0q, [dst0q        ]
>> +    lea       srcq, [srcq+2*lenq]
>> +    add      dst0q, lenq
>> +    add      dst1q, lenq
>> +    neg       lenq
>> +    ALIGN 16
>> +.loop:
>> +    mova        m0, [srcq+2*lenq       ]
>> +    mova        m1, [srcq+2*lenq+mmsize]
>> +    pshuflw     m0, m0, q3120
>> +    pshufhw     m0, m0, q3120
>> +    pshuflw     m1, m1, q3120
>> +    pshufhw     m1, m1, q3120
>> +    shufps      m2, m0, m1, q2020
>> +    shufps      m0, m1, q3131
>> +    mova  [dst0q+lenq], m2
>> +    mova  [dst1q+lenq], m0
>
> The more common way to do this (I believe) is to set up a mask register:
>
> pcmpeqb m4, m4
> psrlw m4, 8 ; 0x00ff
>
> Then mask/shift:
>
> mova m0, [srcq+2*lenq+0*mmsize]
> mova m1, [srcq+2*lenq+1*mmsize]
> psrlw m2, m0, 8
> psrlw m3, m1, 8
> pand m0, m4
> pand m1, m4
> packuswb m0, m1
> packuswb m2, m3
> mova [dst0q+lenq], m0
> mova [dst1q+lenq], m2
>
> However, that's no fewer instructions; maybe worth checking anyway.
>
> Alternatively, a pshufb version:
>
> mova m3, [pb_02468ace13579bdf]
> .loop:
> mova m0, [srcq+2*lenq+0*mmsize]
> mova m1, [srcq+2*lenq+1*mmsize]
> pshufb m0, m3
> pshufb m1, m3
> punpcklqdq m2, m0, m1
> punpckhqdq m0, m1
> mova [dst0q+lenq], m2
> mova [dst1q+lenq], m0
>
> 2 fewer instructions, and only 2 unpacks instead of all the
> shuffles, so potentially faster (except on Atom, where pshufb is
> dog-slow).

Actually, both of those suggestions are byte-based, but I guess it's
obvious what I mean, so they should be easy to convert to word-speak
(i.e. to operate on 16-bit samples).

Ronald