Hi,

On Sat, Oct 15, 2011 at 7:01 AM, Ronald S. Bultje <[email protected]> wrote:
> On Sat, Oct 15, 2011 at 2:53 AM, Loren Merritt <[email protected]> wrote:
>> On Fri, 14 Oct 2011, Ronald S. Bultje wrote:
>>
>>> + packusdw m0, m1
>>> + packusdw m2, m3
>>
>> sse4
>
> Ah, that's why Kieran's assembly was marked sse4. I'll make a
> sse2-version that needs a pmaxsw x, zero also then.
>
>> Are things usually unaligned?
>
> No, I'm a little too pessimistic in this patch. In fact, the src in
> this function is always aligned, so these should be mova. I'm not sure
> about dest, in my tests they tend to be aligned but I'm not sure if
> the API guarantees that. I don't think it does. I can test for
> alignment at function start and split the loop into two copies, one
> for aligned dest and one for unaligned dest.

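For reference, the difference between the sse4 pack and the sse2 fallback discussed above can be modelled per lane in scalar C. This is only a sketch, not the assembly from the patch: the function names pack_us_model and pack_ss_max0_model are made up for illustration. The first models one lane of packusdw (unsigned saturation of a signed 32-bit value to 16 bits), the second models packssdw followed by pmaxsw against zero.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of packusdw (SSE4.1):
 * signed 32-bit input, unsigned 16-bit saturation to [0, 65535]. */
static uint16_t pack_us_model(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 65535)
        return 65535;
    return (uint16_t)v;
}

/* Hypothetical scalar model of the SSE2 fallback:
 * packssdw saturates to [-32768, 32767], then pmaxsw with a zero
 * register clamps negative results to 0. */
static uint16_t pack_ss_max0_model(int32_t v)
{
    int16_t s;

    if (v < -32768)
        s = -32768;
    else if (v > 32767)
        s = 32767;
    else
        s = (int16_t)v;

    if (s < 0)      /* the pmaxsw x, zero step */
        s = 0;
    return (uint16_t)s;
}
```

The two models agree for every input up to 32767 and differ above it, where the sse2 path saturates at 32767 instead of 65535. So the fallback is only a drop-in replacement if the values being packed are known to fit in 15 bits, which is an assumption about the caller rather than something this sketch checks.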
New patch attached. If the aligned memory move is important to you,
I'll write a test that ensures alignment for >=sse2 and use a
mova-version of the same copy in that case.

Ronald
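P.S. To make the "test for alignment at function start and split the loop" idea concrete, here is a minimal sketch in C rather than in assembly. All names (is_aligned16, store_loop_aligned, store_loop_unaligned, store_dispatch) are hypothetical, and the two loop bodies are plain memcpy stand-ins for what would be a mova-based and a movu-based store loop; the 16-byte boundary assumes xmm-sized stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p sits on a 16-byte boundary (the requirement for aligned
 * xmm stores). */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical stand-in for the loop copy that would use mova stores. */
static void store_loop_aligned(uint8_t *dest, const uint8_t *src, size_t n)
{
    memcpy(dest, src, n);
}

/* Hypothetical stand-in for the loop copy that would use movu stores. */
static void store_loop_unaligned(uint8_t *dest, const uint8_t *src, size_t n)
{
    memcpy(dest, src, n);
}

/* Check dest once at entry and pick a loop copy.
 * Returns 1 if the aligned copy was taken, 0 otherwise. */
static int store_dispatch(uint8_t *dest, const uint8_t *src, size_t n)
{
    if (is_aligned16(dest)) {
        store_loop_aligned(dest, src, n);
        return 1;
    }
    store_loop_unaligned(dest, src, n);
    return 0;
}
```

The check costs one test and branch per call, outside the inner loop, so the aligned case pays nothing per iteration.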
0001-swscale-write-yuv2plane1-MMX-SSE2-SSE4-functions.patch
