Re: [libav-devel] [PATCH] fix fate failures for 10bit H264 on some systems

Ronald S. Bultje Tue, 10 May 2011 17:08:07 -0700

Hi,

2011/5/10 Måns Rullgård <[email protected]>:
> "Ronald S. Bultje" <[email protected]> writes:
>>  #define PREDICT_16x16_DC(v)\
>>      for(i=0; i<16; i++){\
>> -        AV_WN4P(src+ 0, v);\
>> -        AV_WN4P(src+ 4, v);\
>> -        AV_WN4P(src+ 8, v);\
>> -        AV_WN4P(src+12, v);\
>> +        AV_WN4PA(src+ 0, v);\
>> +        AV_WN4PA(src+ 4, v);\
>> +        AV_WN4PA(src+ 8, v);\
>> +        AV_WN4PA(src+12, v);\
>>          src += stride;\
>>      }
>
> This looks odd.  If src is 4-pixel aligned, it's correct, otherwise not.
> The A stands for "aligned".


Yes, they are aligned, which is why I changed them.
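To illustrate the distinction Måns is drawing, here is a minimal sketch assuming 8-bit pixels (so a "pixel4" is four bytes in a `uint32_t`). The helpers `wn4p`/`wn4pa` and `predict_16x16_dc` are hypothetical stand-ins, not libav's actual `AV_WN4P`/`AV_WN4PA` definitions. The unaligned store is valid at any address. The aligned store carries a precondition that the address is 4-byte aligned, which lets the compiler emit a plain 32-bit store; here that promise is expressed with the GCC/Clang builtin `__builtin_assume_aligned`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: 8-bit pixels, so a "pixel4" packs four pixels into a
 * uint32_t. Helper names are hypothetical, not libav's real macros. */
typedef uint32_t pixel4;

/* Unaligned 4-pixel store: valid at any address. */
static inline void wn4p(uint8_t *p, pixel4 v)
{
    memcpy(p, &v, sizeof(v));
}

/* Aligned 4-pixel store: the caller promises p is 4-byte aligned, so the
 * compiler may use a single 32-bit store even on strict-alignment CPUs. */
static inline void wn4pa(uint8_t *p, pixel4 v)
{
    assert(((uintptr_t)p % sizeof(pixel4)) == 0);
    memcpy(__builtin_assume_aligned(p, sizeof(pixel4)), &v, sizeof(v));
}

/* Fill a 16x16 block with a DC value, mirroring PREDICT_16x16_DC.
 * Every store is aligned only if src is 4-byte aligned AND stride is a
 * multiple of 4; that is exactly the precondition under discussion. */
static void predict_16x16_dc(uint8_t *src, ptrdiff_t stride, uint8_t dc)
{
    pixel4 v = dc * 0x01010101U; /* splat dc into all four bytes */
    for (int i = 0; i < 16; i++) {
        wn4pa(src +  0, v);
        wn4pa(src +  4, v);
        wn4pa(src +  8, v);
        wn4pa(src + 12, v);
        src += stride;
    }
}
```

If either the block origin or the stride were not a multiple of 4, the `assert` in `wn4pa` would fire, while `wn4p` would still work; that is why the aligned variant is only a correct replacement when the alignment guarantee holds.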

>> @@ -703,9 +718,11 @@ static void FUNCC(pred8x8l_vertical)(uint8_t *_src, int has_topleft, int has_top
>>      src[5] = t5;
>>      src[6] = t6;
>>      src[7] = t7;
>> +    a = AV_RN4PA(((pixel4*)src)+0);
>> +    b = AV_RN4PA(((pixel4*)src)+1);
>>      for( y = 1; y < 8; y++ ) {
>> -        ((pixel4*)(src+y*stride))[0] = ((pixel4*)src)[0];
>> -        ((pixel4*)(src+y*stride))[1] = ((pixel4*)src)[1];
>> +        AV_WN4PA(((pixel4*)(src+y*stride))+0, a);
>> +        AV_WN4PA(((pixel4*)(src+y*stride))+1, b);
>
> AV_COPY* might be used here.

I was wondering whether that would produce optimal asm output; I'll test
later tonight. My worry was that it would re-read the source pixels on
every loop iteration.
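A minimal sketch of the concern, assuming 8-bit pixels and a stride of at least 8; both function names are hypothetical and this is not the actual libav code. When the loop copies straight from row 0, the source and destination rows live in the same buffer and `stride` is only known at run time, so the compiler generally cannot prove that a store to row y leaves row 0 untouched and may have to reload row 0 on each iteration. Loading row 0 once into locals, as the patch does with `a` and `b`, leaves only stores inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch, 8-bit pixels: replicate the first 8-pixel row of an 8x8 block
 * into rows 1..7, as pred8x8l_vertical does after computing the filtered
 * top row. Function names are hypothetical. Assumes stride >= 8. */

/* Copy from row 0 inside the loop. The source re-reads row 0 for every
 * row; unless the compiler can prove the stores never alias row 0, the
 * generated code may do the same. */
static void replicate_row_reload(uint8_t *src, ptrdiff_t stride)
{
    for (int y = 1; y < 8; y++)
        for (int x = 0; x < 8; x++)
            src[y * stride + x] = src[x];
}

/* Hoist the loads: read row 0 once into two 4-byte locals (like a and b
 * in the patch), then only store inside the loop. */
static void replicate_row_hoisted(uint8_t *src, ptrdiff_t stride)
{
    uint32_t a, b;
    memcpy(&a, src + 0, sizeof(a));
    memcpy(&b, src + 4, sizeof(b));
    for (int y = 1; y < 8; y++) {
        memcpy(src + y * stride + 0, &a, sizeof(a));
        memcpy(src + y * stride + 4, &b, sizeof(b));
    }
}
```

Both produce the same pixels; the difference is only in how many loads the compiler is allowed to skip, which is what inspecting the asm output would confirm.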

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
