Hi,
2011/5/10 Måns Rullgård <[email protected]>:
> "Ronald S. Bultje" <[email protected]> writes:
>> #define PREDICT_16x16_DC(v)\
>> for(i=0; i<16; i++){\
>> - AV_WN4P(src+ 0, v);\
>> - AV_WN4P(src+ 4, v);\
>> - AV_WN4P(src+ 8, v);\
>> - AV_WN4P(src+12, v);\
>> + AV_WN4PA(src+ 0, v);\
>> + AV_WN4PA(src+ 4, v);\
>> + AV_WN4PA(src+ 8, v);\
>> + AV_WN4PA(src+12, v);\
>> src += stride;\
>> }
>
> This looks odd. If src is 4-pixel aligned, it's correct, otherwise not.
> The A stands for "aligned".
Yes, they are aligned, which is why I changed them.
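For reference, a minimal sketch of what "aligned" means for these stores. It assumes 8-bit pixels (so four pixels occupy 4 bytes), and stores_are_aligned is a hypothetical helper for illustration, not something in libav:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libav): returns 1 if every 4-byte
 * store of a 16x16 fill lands on a 4-byte boundary, 0 otherwise.
 * Assumes 8-bit pixels, i.e. four pixels are exactly 4 bytes. */
static int stores_are_aligned(const uint8_t *src, ptrdiff_t stride)
{
    for (int i = 0; i < 16; i++, src += stride)   /* one row per iteration */
        for (int x = 0; x < 16; x += 4)           /* offsets 0, 4, 8, 12   */
            if ((uintptr_t)(src + x) % 4 != 0)
                return 0;
    return 1;
}
```

Under that assumption the check passes whenever the block base and the stride are both multiples of 4 bytes, and fails as soon as either one is not.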
>> @@ -703,9 +718,11 @@ static void FUNCC(pred8x8l_vertical)(uint8_t *_src, int has_topleft, int has_top
>> src[5] = t5;
>> src[6] = t6;
>> src[7] = t7;
>> + a = AV_RN4PA(((pixel4*)src)+0);
>> + b = AV_RN4PA(((pixel4*)src)+1);
>> for( y = 1; y < 8; y++ ) {
>> - ((pixel4*)(src+y*stride))[0] = ((pixel4*)src)[0];
>> - ((pixel4*)(src+y*stride))[1] = ((pixel4*)src)[1];
>> + AV_WN4PA(((pixel4*)(src+y*stride))+0, a);
>> + AV_WN4PA(((pixel4*)(src+y*stride))+1, b);
>
> AV_COPY* might be used here.
I was wondering whether that produces optimal asm output. I'll test
later tonight. My worry is that it would re-read the source pixels on
every loop iteration.
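To make the concern concrete, here is a minimal sketch of the two variants. It assumes 8-bit pixels and uses hypothetical memcpy-based helpers (rd32, wr32, copy32) in place of the real AV_RN4PA/AV_WN4PA/AV_COPY* macros, so it only illustrates the shape of the code, not the actual asm we would get:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for 4-byte read/write/copy helpers. These are
 * NOT the libavutil macros, just memcpy-based equivalents. */
static inline uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static inline void wr32(uint8_t *p, uint32_t v) { memcpy(p, &v, 4); }
static inline void copy32(uint8_t *d, const uint8_t *s) { wr32(d, rd32(s)); }

/* Variant 1: read row 0 once, then store the cached values to rows 1..7. */
static void fill_rows_hoisted(uint8_t *src, ptrdiff_t stride)
{
    const uint32_t a = rd32(src), b = rd32(src + 4);
    for (int y = 1; y < 8; y++) {
        wr32(src + y * stride,     a);
        wr32(src + y * stride + 4, b);
    }
}

/* Variant 2: copy from row 0 inside the loop. The compiler cannot always
 * prove that the stores leave row 0 untouched, so it may reload the
 * source on each iteration. */
static void fill_rows_copy(uint8_t *src, ptrdiff_t stride)
{
    for (int y = 1; y < 8; y++) {
        copy32(src + y * stride,     src);
        copy32(src + y * stride + 4, src + 4);
    }
}

/* Usage check: both variants must produce identical output. */
static void check_variants_agree(void)
{
    uint8_t a[8 * 16], b[8 * 16];
    for (int i = 0; i < 8 * 16; i++)
        a[i] = b[i] = (uint8_t)i;
    fill_rows_hoisted(a, 16);
    fill_rows_copy(b, 16);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Both variants give the same result; the only open question is whether the second one compiles to the same number of loads as the first.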
Ronald