On 9/21/15, Timo Rothenpieler <t...@rothenpieler.org> wrote:
>> Benefit would be to save 2 branches and 2 divisions (the div can already
>> be replaced by a shift here though - which would have a benefit since
>> x & y are signed) and keep the code generic.
>
> Changed it to a shift and moved the if() out of the loop.
> No observable performance benefit though.
>
>>>>> +            frame->data[3][frame->linesize[3] * y + x] =
>>>>> do_chromakey_pixel(ctx, u, v);
>>>>
>>>> You might want to check if saving a bunch of dereferencing in the inner
>>>> loop helps performance.
>>>
>>> You mean getting frame->data[3] and frame->linesize[3] before the loop?
>>
>> yes
>>
>>> Shouldn't this be something the compiler optimises for me?
>>
>> It should, but I've observed performance improvements in similar
>> situations.
>> You might want to try.
>
> Did some testing. My most aggressively hand-optimized version was
> significantly slower than just letting gcc optimize the original code.
> Looking at the assembly, it even seems to automatically optimize out the
> multiplications by y, but the assembly is quite complex and I'm not sure
> if I'm reading it right.
> None of the lighter changes made any difference in speed.
>
> I guess the greatest possible speedups could be made by converting the
> actual chromakey algorithm to integer math, which is something I plan on
> doing, but I'd like to get a working version in first.
>

still lgtm
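
For reference, the "get frame->data[3] and frame->linesize[3] before the loop" variant discussed above might look roughly like the sketch below. It is only an illustration, not the code from the patch: SketchFrame, sketch_key_pixel() and sketch_fill_alpha() are made-up names, the key function is a trivial placeholder for do_chromakey_pixel(), and the shift amounts are assumed to be the log2 chroma subsampling factors (1 and 1 for 4:2:0). As noted in the thread, gcc may well produce equivalent code from the original form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the frame fields used in the thread;
 * this is not the real AVFrame definition. */
typedef struct SketchFrame {
    uint8_t *data[4];   /* plane pointers: Y, U, V, A */
    int linesize[4];    /* bytes per row of each plane */
    int width, height;  /* luma/alpha dimensions */
} SketchFrame;

/* Placeholder key function, NOT the patch's do_chromakey_pixel():
 * fully transparent for neutral chroma, opaque otherwise. */
static uint8_t sketch_key_pixel(uint8_t u, uint8_t v)
{
    return (u == 128 && v == 128) ? 0 : 255;
}

/* log2_cw / log2_ch: assumed log2 of the horizontal / vertical
 * chroma subsampling factor. */
static void sketch_fill_alpha(SketchFrame *f, int log2_cw, int log2_ch)
{
    assert(f && f->data[1] && f->data[2] && f->data[3]);

    for (int y = 0; y < f->height; y++) {
        /* Row pointers are computed once per row, so the inner loop
         * no longer dereferences f->data[] or multiplies by y. */
        uint8_t *alpha      = f->data[3] + f->linesize[3] * y;
        const uint8_t *urow = f->data[1] + f->linesize[1] * (y >> log2_ch);
        const uint8_t *vrow = f->data[2] + f->linesize[2] * (y >> log2_ch);

        for (int x = 0; x < f->width; x++) {
            int cx = x >> log2_cw;  /* shift instead of a signed division */
            alpha[x] = sketch_key_pixel(urow[cx], vrow[cx]);
        }
    }
}
```

Whether this is actually faster has to be measured; the testing reported above found no gain from the lighter changes of this kind.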