Hi,

On Tue, May 10, 2011 at 8:08 PM, Ronald S. Bultje <[email protected]> wrote:
> 2011/5/10 Måns Rullgård <[email protected]>:
>> "Ronald S. Bultje" <[email protected]> writes:
>>> @@ -703,9 +718,11 @@ static void FUNCC(pred8x8l_vertical)(uint8_t *_src, int has_topleft, int has_top
>>>      src[5] = t5;
>>>      src[6] = t6;
>>>      src[7] = t7;
>>> +    a = AV_RN4PA(((pixel4*)src)+0);
>>> +    b = AV_RN4PA(((pixel4*)src)+1);
>>>      for( y = 1; y < 8; y++ ) {
>>> -        ((pixel4*)(src+y*stride))[0] = ((pixel4*)src)[0];
>>> -        ((pixel4*)(src+y*stride))[1] = ((pixel4*)src)[1];
>>> +        AV_WN4PA(((pixel4*)(src+y*stride))+0, a);
>>> +        AV_WN4PA(((pixel4*)(src+y*stride))+1, b);
>>
>> AV_COPY* might be used here.
>
> I was wondering if that created optimal asm output. I'll test later
> tonight. I was worried that it'd re-read the source pixels for each
> loop iteration.
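To make the re-read concern concrete, here is a minimal, self-contained C sketch; it is not libav code. It assumes 8-bit pixels (so pixel4 is a uint32_t), uses memcpy() in place of the AV_RN4PA()/AV_WN4PA() macros, and the two function names are hypothetical. The first function mirrors the original loop, reading the top row through the same pointer it writes through; the second hoists the two loads into locals, as the patch does with a and b.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 8-bit pixels, so one pixel4 holds four bytes. */
typedef uint32_t pixel4;

/* Mirrors the original loop: each iteration reads the top row again.
 * A store to src + y*stride may overwrite src[0..7] when stride < 8,
 * so the compiler must reload the top row after every store. */
static void vertical_fill_reload(uint8_t *src, ptrdiff_t stride)
{
    for (int y = 1; y < 8; y++) {
        pixel4 v;
        memcpy(&v, src + 0, sizeof(v));              /* ((pixel4*)src)[0] */
        memcpy(src + y * stride + 0, &v, sizeof(v));
        memcpy(&v, src + 4, sizeof(v));              /* ((pixel4*)src)[1] */
        memcpy(src + y * stride + 4, &v, sizeof(v));
    }
}

/* Mirrors the patch: load the top row once into locals, then only store. */
static void vertical_fill_hoisted(uint8_t *src, ptrdiff_t stride)
{
    pixel4 a, b;
    memcpy(&a, src + 0, sizeof(a));
    memcpy(&b, src + 4, sizeof(b));
    for (int y = 1; y < 8; y++) {
        memcpy(src + y * stride + 0, &a, sizeof(a));
        memcpy(src + y * stride + 4, &b, sizeof(b));
    }
}
```

With a stride of 8 or more both functions produce identical output. With a stride of 4 the rows overlap and the results differ (byte 32 ends up as 0 with the first function and 4 with the second), so a compiler that cannot prove stride >= 8 is not allowed to turn the first form into the second by itself.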
Just confirmed that my compiler indeed re-reads the values on each iteration. Possibly because it cannot guarantee that stride is large enough for the source row and the destination rows never to overlap? I'd like to keep it as in my original patch.

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
