> Benefit would be to save 2 branching and 2 div (the div can already be
> replaced by a shift here though - which would have a benefit since x & y
> are signed) and keep the code generic.

Changed it to a shift and moved the if() out of the loop. No observable
performance benefit though.

>>>> +            frame->data[3][frame->linesize[3] * y + x] =
>>>> do_chromakey_pixel(ctx, u, v);
>>>
>>> You might want to check if saving a bunch of dereferencing in the inner
>>> loop helps performance.
>>
>> You mean getting frame->data[3] and frame->linesize[3] before the loop?
>
> yes
>
>> Shouldn't this be something the compiler optimises for me?
>
> it should, but i've observe performance enhancement in similar situations.
> You might want to try.

Did some testing. My most aggressively hand-optimized version was
significantly slower than just letting gcc optimize the original code.
Looking at the assembly, gcc even seems to optimize out the multiplications
by y on its own, but the assembly is quite complex and I'm not sure I'm
reading it right. None of the lighter changes made any difference in speed.

I guess the greatest possible speedups would come from converting the actual
chromakey algorithm to integer math, which is something I plan on doing, but
I'd like to get a working version in first.
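
For reference, the two loop shapes discussed above look roughly like this. This is only a sketch and not the code from the patch: `Plane`, `key_pixel()`, `fill_alpha_indexed()` and `fill_alpha_hoisted()` are hypothetical names made up for illustration, `key_pixel()` is a trivial stand-in for do_chromakey_pixel(), and the chroma subsampling is assumed to be a power of two so that it can be expressed as a shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one plane of a frame (NOT the AVFrame API). */
typedef struct {
    uint8_t *data;
    int linesize;      /* bytes per row */
    int width, height; /* in pixels */
} Plane;

/* Hypothetical placeholder for do_chromakey_pixel(): fully transparent
 * only on an exact match with the key colour (128, 128). */
static uint8_t key_pixel(uint8_t u, uint8_t v)
{
    return (u == 128 && v == 128) ? 0 : 255;
}

/* Shape of the original loop: the full index is recomputed and the plane
 * fields are dereferenced for every pixel. The chroma coordinates use a
 * shift instead of a division; x and y are non-negative, so both give the
 * same result, and the shift avoids the rounding fix-up that a signed
 * division needs. */
static void fill_alpha_indexed(Plane *a, const Plane *u, const Plane *v,
                               int hsub_log2, int vsub_log2)
{
    for (int y = 0; y < a->height; y++) {
        for (int x = 0; x < a->width; x++) {
            int cx = x >> hsub_log2;
            int cy = y >> vsub_log2;
            a->data[a->linesize * y + x] =
                key_pixel(u->data[u->linesize * cy + cx],
                          v->data[v->linesize * cy + cx]);
        }
    }
}

/* Hand-hoisted variant: row pointers are computed once per row, so the
 * inner loop contains no multiplication by y and no plane dereferences. */
static void fill_alpha_hoisted(Plane *a, const Plane *u, const Plane *v,
                               int hsub_log2, int vsub_log2)
{
    for (int y = 0; y < a->height; y++) {
        uint8_t *dst = a->data + (ptrdiff_t)a->linesize * y;
        const uint8_t *urow = u->data + (ptrdiff_t)u->linesize * (y >> vsub_log2);
        const uint8_t *vrow = v->data + (ptrdiff_t)v->linesize * (y >> vsub_log2);

        for (int x = 0; x < a->width; x++)
            dst[x] = key_pixel(urow[x >> hsub_log2], vrow[x >> hsub_log2]);
    }
}
```

Both functions write the same output. Whether the second one is any faster depends on the compiler, since an optimizing build may already turn the first form into something close to the second, which would be consistent with the measurements above.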