Hi,

On Fri, Jan 18, 2013 at 4:24 PM, Loren Merritt <lor...@u.washington.edu> wrote:
> On Fri, 18 Jan 2013, Vitor Sessak wrote:
>> On Wed, Jan 16, 2013 at 1:58 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>>
>>> +INIT_XMM sse
>>> +cglobal vorbis_inverse_coupling, 3, 3, 6, mag, ang, block_size
>>> +    movsxdifnidn block_sizeq, block_sized
>>> +    mova         m5, [pdw_80000000]
>>> +    lea          magq, [magq+block_sizeq*4]
>>> +    lea          angq, [angq+block_sizeq*4]
>>> +    neg          block_sizeq
>>> +.loop:
>>> +    mova         m0, [magq+block_sizeq*4]
>>> +    mova         m1, [angq+block_sizeq*4]
>>> +    xorps        m2, m2
>>> +    xorps        m3, m3
>>> +    cmpleps      m2, m0    ; m <= 0.0
>>> +    cmpleps      m3, m1    ; a <= 0.0
>>> +    andps        m2, m5    ; keep only the sign bit
>>
>> Am I missing something or can we just do:
>>
>>     andps m2, m0, m5
>>
>> instead of the xorps + cmpleps + andps?
>
> .loop:
>     mova         m0, [magq+block_sizeq*4]
>     mova         m1, [angq+block_sizeq*4]
>     xorps        m4, m4
>     andps        m2, m5, m0   ; sign(m)
>     cmpnleps     m4, m1       ; sign(a)
>     xorps        m1, m2
>     andps        m3, m4, m1
>     andnps       m4, m1
>     addps        m3, m0       ; m = m + ((a < 0) & (a ^ sign(m)))
>     subps        m0, m4       ; a = m - ((a > 0) & (a ^ sign(m)))
>     mova         [magq+block_sizeq*4], m3
>     mova         [angq+block_sizeq*4], m0
>     add          block_sizeq, 4
>     jl .loop
>
> (Any change to the comments is intentional; the previous comment was
> wrong.)
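For reference, here is a scalar C sketch of what both loops compute, following the
inverse channel coupling step from the Vorbis spec; the function name and signature
below are illustrative only, not the exact libavcodec prototype:

    /* Scalar sketch of Vorbis inverse channel coupling: each
     * (magnitude, angle) pair is turned back into two residue
     * values, branching on the signs of both inputs. */
    static void vorbis_inverse_coupling_c(float *mag, float *ang, int blocksize)
    {
        int i;
        for (i = 0; i < blocksize; i++) {
            float m = mag[i], a = ang[i];
            if (m > 0.0f) {
                if (a > 0.0f) {
                    ang[i] = m - a;   /* mag unchanged */
                } else {
                    ang[i] = m;
                    mag[i] = m + a;
                }
            } else {
                if (a > 0.0f) {
                    ang[i] = m + a;   /* mag unchanged */
                } else {
                    ang[i] = m;
                    mag[i] = m - a;
                }
            }
        }
    }

Both SIMD versions implement these four cases branchlessly by building sign masks
with cmpleps/cmpnleps and combining them with andps/andnps before the add/sub.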
Loren's version isn't faster for me; in fact it looks to be slightly slower (on a Core i7).

Ronald