Hi,

On Tue, May 10, 2011 at 8:08 PM, Ronald S. Bultje <[email protected]> wrote:
> 2011/5/10 Måns Rullgård <[email protected]>:
>> "Ronald S. Bultje" <[email protected]> writes:
>>> @@ -703,9 +718,11 @@ static void FUNCC(pred8x8l_vertical)(uint8_t *_src, int has_topleft, int has_top
>>>      src[5] = t5;
>>>      src[6] = t6;
>>>      src[7] = t7;
>>> +    a = AV_RN4PA(((pixel4*)src)+0);
>>> +    b = AV_RN4PA(((pixel4*)src)+1);
>>>      for( y = 1; y < 8; y++ ) {
>>> -        ((pixel4*)(src+y*stride))[0] = ((pixel4*)src)[0];
>>> -        ((pixel4*)(src+y*stride))[1] = ((pixel4*)src)[1];
>>> +        AV_WN4PA(((pixel4*)(src+y*stride))+0, a);
>>> +        AV_WN4PA(((pixel4*)(src+y*stride))+1, b);
>>
>> AV_COPY* might be used here.
>
> I was wondering if that created optimal asm output. I'll test later
> tonight. I was worried that it'd re-read the source pixels for each
> loop iteration.

Just confirmed that, with AV_COPY*, my compiler does indeed re-read the
source values on each iteration. Possibly that is because it cannot
prove that stride is large enough for the source row and the
destination rows never to overlap. I'd like to keep it as in my
original patch.

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
