"Ronald S. Bultje" <[email protected]> writes:
> Hi,
>
> 2011/5/10 Måns Rullgård <[email protected]>:
>> "Ronald S. Bultje" <[email protected]> writes:
>>> #define PREDICT_16x16_DC(v)\
>>> for(i=0; i<16; i++){\
>>> - AV_WN4P(src+ 0, v);\
>>> - AV_WN4P(src+ 4, v);\
>>> - AV_WN4P(src+ 8, v);\
>>> - AV_WN4P(src+12, v);\
>>> + AV_WN4PA(src+ 0, v);\
>>> + AV_WN4PA(src+ 4, v);\
>>> + AV_WN4PA(src+ 8, v);\
>>> + AV_WN4PA(src+12, v);\
>>> src += stride;\
>>> }
>>
>> This looks odd. If src is 4-pixel aligned, it's correct, otherwise not.
>> The A stands for "aligned".
>
> Yes, they are aligned, hence my changing them.
Then it is OK.
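To make the alignment point concrete, here is a minimal sketch for 8-bit pixels, where a pixel4 is a uint32_t. The helpers wn4p_unaligned, wn4p_aligned and predict_16x16_dc are hypothetical stand-ins, not libav's actual AV_WN4P/AV_WN4PA definitions. The unaligned store goes through memcpy and is valid at any address. The aligned one is a plain 32-bit store that is only valid when the address is a multiple of 4. If the block starts 4-byte aligned and the stride is also a multiple of 4 (an assumption here), every store in the PREDICT_16x16_DC loop (offsets 0, 4, 8, 12) stays aligned, so the "A" variant is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for 8-bit AV_WN4P / AV_WN4PA; not libav's definitions. */

/* Unaligned-safe 4-pixel store: memcpy is valid for any address. */
static void wn4p_unaligned(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Aligned 4-pixel store: a plain 32-bit store, only valid when p is 4-byte
 * aligned. The caller's buffer should be uint32_t-typed (or malloc'd) storage
 * so this store stays within ISO C aliasing rules. */
static void wn4p_aligned(uint8_t *p, uint32_t v)
{
    assert(((uintptr_t)p & 3) == 0);
    *(uint32_t *)p = v;
}

/* Fill a 16x16 block of 8-bit pixels with one DC value, mirroring
 * PREDICT_16x16_DC: four aligned 4-pixel stores per row. */
static void predict_16x16_dc(uint8_t *src, ptrdiff_t stride, uint8_t dc)
{
    uint32_t v = dc * 0x01010101U; /* splat the byte into all four lanes */
    for (int i = 0; i < 16; i++) {
        wn4p_aligned(src +  0, v);
        wn4p_aligned(src +  4, v);
        wn4p_aligned(src +  8, v);
        wn4p_aligned(src + 12, v);
        src += stride;
    }
}
```

Calling wn4p_aligned on, say, src + 1 would trip the assert, which is the "otherwise not" case; wn4p_unaligned would accept it.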
>>> @@ -703,9 +718,11 @@ static void FUNCC(pred8x8l_vertical)(uint8_t *_src, int has_topleft, int has_top
>>> src[5] = t5;
>>> src[6] = t6;
>>> src[7] = t7;
>>> + a = AV_RN4PA(((pixel4*)src)+0);
>>> + b = AV_RN4PA(((pixel4*)src)+1);
>>> for( y = 1; y < 8; y++ ) {
>>> - ((pixel4*)(src+y*stride))[0] = ((pixel4*)src)[0];
>>> - ((pixel4*)(src+y*stride))[1] = ((pixel4*)src)[1];
>>> + AV_WN4PA(((pixel4*)(src+y*stride))+0, a);
>>> + AV_WN4PA(((pixel4*)(src+y*stride))+1, b);
>>
>> AV_COPY* might be used here.
>
> I was wondering if that created optimal asm output. I'll test later
> tonight. I was worried that it'd re-read the source pixels for each
> loop iteration.
Colour me stupid. Or drunk.
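The trade-off Ronald raises can be sketched as follows, again assuming 8-bit pixels so that one row of the 8x8 block is 8 bytes. vertical_fill_hoisted mirrors the patch: it loads the top row into locals once and stores them to rows 1 to 7. vertical_fill_copy is the AV_COPY-style alternative (for 8-bit pixels, roughly one AV_COPY64 per row), written here with memcpy. Both function names are hypothetical. The two produce identical pixels; the difference lies only in what the compiler can prove. In the copy variant it may fail to show that the stores leave row 0 untouched and reload row 0 on every iteration, which is exactly what checking the generated asm would reveal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketches for 8-bit pixels (8 pixels = 8 bytes per row). */

/* Patch-style: load the top row once into locals, then store it to rows 1..7.
 * The compiler can keep a and b in registers for the whole loop. */
static void vertical_fill_hoisted(uint8_t *src, ptrdiff_t stride)
{
    uint32_t a, b;
    memcpy(&a, src + 0, 4);
    memcpy(&b, src + 4, 4);
    for (int y = 1; y < 8; y++) {
        memcpy(src + y * stride + 0, &a, 4);
        memcpy(src + y * stride + 4, &b, 4);
    }
}

/* Copy-style: copy row 0 into each later row (AV_COPY64-like).
 * Unless the compiler can prove the destination rows never overlap row 0,
 * it may reload row 0 on every iteration. */
static void vertical_fill_copy(uint8_t *src, ptrdiff_t stride)
{
    for (int y = 1; y < 8; y++)
        memcpy(src + y * stride, src, 8);
}
```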
--
Måns Rullgård
[email protected]
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel