Re: [libav-devel] [PATCH] fix fate failures for 10bit H264 on some systems

Måns Rullgård Wed, 11 May 2011 01:04:49 -0700

"Ronald S. Bultje" <[email protected]> writes:

> Hi,
>
> 2011/5/10 Måns Rullgård <[email protected]>:
>> "Ronald S. Bultje" <[email protected]> writes:
>>>  #define PREDICT_16x16_DC(v)\
>>>      for(i=0; i<16; i++){\
>>> -        AV_WN4P(src+ 0, v);\
>>> -        AV_WN4P(src+ 4, v);\
>>> -        AV_WN4P(src+ 8, v);\
>>> -        AV_WN4P(src+12, v);\
>>> +        AV_WN4PA(src+ 0, v);\
>>> +        AV_WN4PA(src+ 4, v);\
>>> +        AV_WN4PA(src+ 8, v);\
>>> +        AV_WN4PA(src+12, v);\
>>>          src += stride;\
>>>      }
>>
>> This looks odd.  If src is 4-pixel aligned, it's correct, otherwise not.
>> The A stands for "aligned".
>
> Yes, they are aligned, hence me changing them.


Then it is OK.
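
To illustrate the point about the "A" variants, here is a minimal sketch in C, assuming 8-bit pixels so that four pixels fit in 32 bits. The names is_aligned, write4 and fill16x16_dc are hypothetical and are not libav functions or macros. The sketch only makes explicit the precondition that the aligned macros rely on: every row must start on a 4-byte boundary, which requires both the start pointer and the stride to be multiples of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libav: true when p is a multiple of n. */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}

/* Store four 8-bit pixels. memcpy keeps the sketch valid C for any
 * address; an aligned-store macro would typically become a plain
 * 32-bit store instead. */
static void write4(uint8_t *dst, uint32_t v)
{
    memcpy(dst, &v, sizeof(v));
}

/* 16x16 DC fill for 8-bit pixels. Returns -1 without writing anything
 * when some row would not start on a 4-byte boundary, i.e. when an
 * aligned store would not be safe. */
static int fill16x16_dc(uint8_t *src, ptrdiff_t stride, uint32_t v)
{
    if (!is_aligned(src, 4) || stride % 4 != 0)
        return -1;
    for (int i = 0; i < 16; i++) {
        write4(src +  0, v);
        write4(src +  4, v);
        write4(src +  8, v);
        write4(src + 12, v);
        src += stride;
    }
    return 0;
}
```

With a 4-byte-aligned buffer and a stride of 16 the fill succeeds; offsetting the start pointer by one byte makes the check fail, which is the case where the unaligned macro would be needed.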

>>> @@ -703,9 +718,11 @@ static void FUNCC(pred8x8l_vertical)(uint8_t *_src, int has_topleft, int has_top
>>>      src[5] = t5;
>>>      src[6] = t6;
>>>      src[7] = t7;
>>> +    a = AV_RN4PA(((pixel4*)src)+0);
>>> +    b = AV_RN4PA(((pixel4*)src)+1);
>>>      for( y = 1; y < 8; y++ ) {
>>> -        ((pixel4*)(src+y*stride))[0] = ((pixel4*)src)[0];
>>> -        ((pixel4*)(src+y*stride))[1] = ((pixel4*)src)[1];
>>> +        AV_WN4PA(((pixel4*)(src+y*stride))+0, a);
>>> +        AV_WN4PA(((pixel4*)(src+y*stride))+1, b);
>>
>> AV_COPY* might be used here.
>
> I was wondering if that created optimal asm output. I'll test later
> tonight. I was worried that it'd re-read the source pixels for each
> loop iteration.

Colour me stupid.  Or drunk.
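
For reference, the pattern in the patch (read the top row once, then only store inside the loop) can be sketched as below. This assumes 8-bit pixels and uses the hypothetical name replicate_row8; it is not the libav code, and memcpy stands in for the AV_RN4PA/AV_WN4PA macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the first 8-pixel row of an 8x8 block (8-bit pixels) into
 * rows 1..7. The two 4-pixel halves are loaded before the loop, so
 * the loop body contains stores only and never re-reads row 0. */
static void replicate_row8(uint8_t *src, ptrdiff_t stride)
{
    uint32_t a, b;

    memcpy(&a, src,     sizeof(a));  /* pixels 0..3 of row 0 */
    memcpy(&b, src + 4, sizeof(b));  /* pixels 4..7 of row 0 */

    for (int y = 1; y < 8; y++) {
        memcpy(src + y * stride,     &a, sizeof(a));
        memcpy(src + y * stride + 4, &b, sizeof(b));
    }
}
```

Because the destination rows could alias the source row as far as the compiler can tell, keeping the loads outside the loop removes any need for it to prove that row 0 is unchanged between iterations.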

-- 
Måns Rullgård
[email protected]
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel