Hi,

On Fri, Jan 18, 2013 at 4:24 PM, Loren Merritt <lor...@u.washington.edu> wrote:
> On Fri, 18 Jan 2013, Vitor Sessak wrote:
>> On Wed, Jan 16, 2013 at 1:58 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>>
>>> +INIT_XMM sse
>>> +cglobal vorbis_inverse_coupling, 3, 3, 6, mag, ang, block_size
>>> +    movsxdifnidn    block_sizeq, block_sized
>>> +    mova                     m5, [pdw_80000000]
>>> +    lea                    magq, [magq+block_sizeq*4]
>>> +    lea                    angq, [angq+block_sizeq*4]
>>> +    neg             block_sizeq
>>> +.loop:
>>> +    mova                     m0, [magq+block_sizeq*4]
>>> +    mova                     m1, [angq+block_sizeq*4]
>>> +    xorps                    m2, m2
>>> +    xorps                    m3, m3
>>> +    cmpleps                  m2, m0     ; m <= 0.0
>>> +    cmpleps                  m3, m1     ; a <= 0.0
>>> +    andps                    m2, m5     ; keep only the sign bit
>>
>> Am I missing something, or can we just do:
>>
>> andps m2, m0, m5
>>
>> Instead of the xorps + cmpleps + andps?
>
> .loop:
>     mova     m0, [magq+block_sizeq*4]
>     mova     m1, [angq+block_sizeq*4]
>     xorps    m4, m4
>     andps    m2, m5, m0 ; sign(m)
>     cmpnleps m4, m1     ; sign(a)
>     xorps    m1, m2
>     andps    m3, m4, m1
>     andnps   m4, m1
>     addps    m3, m0     ; m = m + ((a < 0) & (a ^ sign(m)))
>     subps    m0, m4     ; a = m - ((a > 0) & (a ^ sign(m)))
>     mova   [magq+block_sizeq*4], m3
>     mova   [angq+block_sizeq*4], m0
>     add    block_sizeq, 4
>     jl .loop
>
> (Any change to the comments is intentional; the previous comment was
> wrong.)
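
To spell out what is being compared: andps m2, m0, m5 extracts the raw sign
bit of m (set for negative values, including -0.0), whereas the
xorps+cmpleps+andps sequence in the original patch produces a mask whose sign
bit is set where 0 <= m, so the two have opposite polarity and the rest of the
loop has to change with it, as the rewritten loop above effectively does. For
reference, here is a scalar C sketch of the branch-free formula in its
comments; the function name and the union-based bit twiddling are illustrative
only (not part of either patch), and each step notes the SSE instruction it
mirrors:

#include <stdint.h>

static void inverse_coupling_scalar(float *mag, float *ang, int blocksize)
{
    for (int i = 0; i < blocksize; i++) {
        union { float f; uint32_t u; } m = { mag[i] }, a = { ang[i] };
        union { float f; uint32_t u; } add, sub;

        uint32_t sign_m = m.u & 0x80000000u;        /* andps    m2, m5, m0 */
        uint32_t a_neg  = a.f < 0.0f ? ~0u : 0u;    /* cmpnleps m4, m1     */
        uint32_t a_flip = a.u ^ sign_m;             /* xorps    m1, m2     */

        add.u = a_neg  & a_flip;                    /* andps    m3, m4, m1 */
        sub.u = ~a_neg & a_flip;                    /* andnps   m4, m1     */

        mag[i] = m.f + add.f;   /* m = m + ((a <  0) & (a ^ sign(m))) */
        ang[i] = m.f - sub.f;   /* a = m - ((a >= 0) & (a ^ sign(m))) */
    }
}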

The rewritten loop isn't faster for me; in fact, it looks to be slightly slower (on a Core i7).

Ronald