Re: [FFmpeg-devel] [PATCH] swscale: add unscaled copy from yuv420p10 to p010

Timo Rothenpieler Fri, 02 Sep 2016 02:57:13 -0700

>>>
>>> …or is that really old-school and a modern compiler does all that when 
>>> optimising?
>>>
>>> Or is readability considered more important than marginal gains in 
>>> performance?
>>>
>>> Oliver (time travelling from the 1980s)
>>
>> You would still have to add the remaining stride.
>> The linesize is usually larger than the width, so each line is properly
>> aligned.
>>
>> So with your code, you'd still need something like
>>
>> dstUV += dstStride[1] / 2 - 2 * x;
>> src[1] += srcStride[1] / 2 - x;
>> src[2] += srcStride[2] / 2 - x;
>>
>> after it.
> 
> No, the lines after it remain unchanged - only the temporary variables are 
> looping along the x.
> 
> src[1] += srcStride[1] / 2;
> src[2] += srcStride[2] / 2;
> dstUV += dstStride[1] / 2;
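
To make the two variants concrete, here is a minimal standalone sketch of what is being compared. It is not the code from the patch: the function names, the parameter layout and the 2x3 test sizes are made up for illustration, and strides are assumed to be in bytes (hence the "/ 2" for 16-bit samples). The "<< 6" reflects that P010 keeps its 10-bit samples in the upper bits of each 16-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, indexed variant: the row pointers stay at the start
 * of the line, so a full stride is added after each row. */
static void interleave_indexed(uint16_t *dst_uv, ptrdiff_t dst_stride,
                               const uint16_t *src_u, ptrdiff_t u_stride,
                               const uint16_t *src_v, ptrdiff_t v_stride,
                               int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            dst_uv[2 * x]     = (uint16_t)(src_u[x] << 6);
            dst_uv[2 * x + 1] = (uint16_t)(src_v[x] << 6);
        }
        dst_uv += dst_stride / 2;   /* strides are in bytes */
        src_u  += u_stride / 2;
        src_v  += v_stride / 2;
    }
}

/* Hypothetical helper, pointer variant: only the temporaries walk along x,
 * so the per-row stride additions are the same as above. */
static void interleave_pointers(uint16_t *dst_uv, ptrdiff_t dst_stride,
                                const uint16_t *src_u, ptrdiff_t u_stride,
                                const uint16_t *src_v, ptrdiff_t v_stride,
                                int width, int height)
{
    for (int y = 0; y < height; y++) {
        uint16_t *d = dst_uv;        /* temporaries, reset for every row */
        const uint16_t *u = src_u;
        const uint16_t *v = src_v;
        for (int x = 0; x < width; x++) {
            *d++ = (uint16_t)(*u++ << 6);
            *d++ = (uint16_t)(*v++ << 6);
        }
        dst_uv += dst_stride / 2;
        src_u  += u_stride / 2;
        src_v  += v_stride / 2;
    }
}
```

Both functions write the same samples and leave the padding between the visible width and the stride untouched. Incrementing the row pointers themselves inside the x loop would instead require adding only the remaining stride afterwards.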



It is indeed very slightly faster.

Old:
[bench @ 0x2cbfb20] t:0.006181 avg:0.006270 max:0.013702 min:0.006080
New:
[bench @ 0x33bcb20] t:0.006195 avg:0.006225 max:0.013718 min:0.006060

It seems to be about 0.045 ms (roughly 0.7%) faster on average, assuming the times are in seconds.
