On 9/21/15, Timo Rothenpieler <t...@rothenpieler.org> wrote:
>> The benefit would be to save 2 branches and 2 divisions (the divisions
>> can already be replaced by shifts here, which would help since x and y
>> are signed) and keep the code generic.
>
> Changed it to a shift and moved the if() out of the loop.
> No observable performance benefit though.
>
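A minimal sketch of the shift variant discussed above, for reference. The names `chroma_coord` and `log2_sub` are hypothetical and not taken from the patch. It assumes the coordinates are never negative, which holds for loop counters starting at 0; in that case the shift and the division give the same result, but the compiler does not have to emit the round-toward-zero fix-up that a signed division by a power of two needs.

```c
/* Hypothetical helper, not from the patch: map a luma coordinate to the
 * matching chroma coordinate. log2_sub is the log2 of the subsampling
 * factor (for example 1 in both directions for 4:2:0). */
static int chroma_coord(int luma_coord, int log2_sub)
{
    /* For luma_coord >= 0 this equals luma_coord / (1 << log2_sub).
     * Right-shifting a negative int is implementation-defined in C,
     * so this sketch assumes non-negative input. */
    return luma_coord >> log2_sub;
}
```

With `log2_sub` set to 0 the same code also covers the non-subsampled case, so no branch on the pixel format is needed inside the loop.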
>>>>> +            frame->data[3][frame->linesize[3] * y + x] =
>>>>> do_chromakey_pixel(ctx, u, v);
>>>>
>>>> You might want to check if saving a bunch of dereferencing in the inner
>>>> loop helps performance.
>>>
>>> You mean getting frame->data[3] and frame->linesize[3] before the loop?
>>
>> yes
>>
>>> Shouldn't this be something the compiler optimises for me?
>>
>> It should, but I've observed performance improvements in similar
>> situations.
>> You might want to try.
>
> Did some testing. My most aggressively hand-optimized version was
> significantly slower than just letting gcc optimize the original code.
> Looking at the assembly, it even seems to automatically optimize out the
> multiplications by y, but the assembly is quite complex and I'm not sure
> if I'm reading it right.
> None of the lighter changes made any difference in speed.
>
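For readers following along, a rough sketch of what "getting the pointers before the loop" could look like. This is not the code from the patch: `fill_alpha`, `toy_key`, and all parameter names are hypothetical, and `toy_key` only stands in for the real per-pixel function. As reported above, gcc may already do this hoisting on its own, so it is worth measuring rather than assuming a gain.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the per-pixel keying function (hypothetical). */
static uint8_t toy_key(uint8_t u, uint8_t v)
{
    (void)v;
    return u < 128 ? 0 : 255;
}

/* Hypothetical sketch: row pointers are computed once per line instead of
 * recomputing plane + linesize * y + x for every pixel. */
static void fill_alpha(uint8_t *alpha, ptrdiff_t alpha_linesize,
                       const uint8_t *u_plane, ptrdiff_t u_linesize,
                       const uint8_t *v_plane, ptrdiff_t v_linesize,
                       int width, int height, int hsub, int vsub,
                       uint8_t (*key)(uint8_t u, uint8_t v))
{
    for (int y = 0; y < height; y++) {
        /* Hoisted out of the inner loop: one multiplication per row. */
        uint8_t *dst = alpha + alpha_linesize * y;
        const uint8_t *u_row = u_plane + u_linesize * (y >> vsub);
        const uint8_t *v_row = v_plane + v_linesize * (y >> vsub);

        for (int x = 0; x < width; x++)
            dst[x] = key(u_row[x >> hsub], v_row[x >> hsub]);
    }
}
```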
> I guess the greatest possible speedups could be made by converting the
> actual chromakey algorithm to integer math, which is something I plan on
> doing, but I'd like to get a working version in first.
>
>
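To illustrate the integer-math idea only: the sketch below assumes the key test boils down to "is (u, v) within some radius of the key colour in the U/V plane". That is an assumption, not something shown in this thread, and `key_alpha_int` and its parameters are hypothetical names. Under that assumption, comparing squared distances avoids both sqrt() and floating point.

```c
#include <stdint.h>

/* Hypothetical integer-only key test (assumed algorithm, not the patch's):
 * returns 0 (transparent) when (u, v) lies within `radius` of the key
 * colour, 255 (opaque) otherwise. */
static uint8_t key_alpha_int(uint8_t u, uint8_t v,
                             uint8_t key_u, uint8_t key_v, int radius)
{
    int du = u - key_u;
    int dv = v - key_v;

    /* du*du + dv*dv <= 2 * 255 * 255, so plain int cannot overflow here. */
    return du * du + dv * dv > radius * radius ? 255 : 0;
}
```

A soft edge (blending) would need more than this hard threshold, so treat it as a starting point only.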

still lgtm
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
