On 10/1/15, Henrik Gramner <hen...@gramner.com> wrote:
> On Thu, Oct 1, 2015 at 8:42 PM, Paul B Mahol <one...@gmail.com> wrote:
>> diff --git a/libavfilter/vf_maskedmerge.c b/libavfilter/vf_maskedmerge.c
>
>>      if (desc->comp[0].depth == 8)
>>          s->maskedmerge = maskedmerge8;
>>      else
>>          s->maskedmerge = maskedmerge16;
>>
>> +    if (ARCH_X86)
>> +        ff_maskedmerge_init_x86(s);
>> +
>
> Create a new function ff_maskedmerge_init() and move the above code
> there, that will make it easier to add a unit test.

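For reference, the suggested refactor might look roughly like the sketch below. Only the function-pointer selection and the ARCH_X86 hook come from the quoted hunk; the reduced context struct, the simplified kernel signature, the empty stub bodies, the `depth` parameter and the fallback definition of ARCH_X86 are assumptions made so the fragment compiles on its own.

```c
#include <stdint.h>

/* Assumption: in FFmpeg this macro comes from config.h. */
#ifndef ARCH_X86
#define ARCH_X86 0
#endif

/* Hypothetical, reduced context: the real struct has more fields and the
 * real kernels take more arguments (linesizes, height, ...). */
typedef struct MaskedMergeContext {
    void (*maskedmerge)(const uint8_t *bsrc, const uint8_t *osrc,
                        const uint8_t *msrc, uint8_t *dst, int w);
} MaskedMergeContext;

/* Empty stand-ins for the C kernels named in the patch. */
static void maskedmerge8(const uint8_t *bsrc, const uint8_t *osrc,
                         const uint8_t *msrc, uint8_t *dst, int w)
{
    (void)bsrc; (void)osrc; (void)msrc; (void)dst; (void)w;
}

static void maskedmerge16(const uint8_t *bsrc, const uint8_t *osrc,
                          const uint8_t *msrc, uint8_t *dst, int w)
{
    (void)bsrc; (void)osrc; (void)msrc; (void)dst; (void)w;
}

/* Stand-in for the x86 hook; the real one would override s->maskedmerge
 * according to the CPU flags. */
static void ff_maskedmerge_init_x86(MaskedMergeContext *s)
{
    (void)s;
}

/* One entry point that picks the kernel. Both the filter and a unit test
 * can call this without going through the filter's config code. */
static void ff_maskedmerge_init(MaskedMergeContext *s, int depth)
{
    if (depth == 8)
        s->maskedmerge = maskedmerge8;
    else
        s->maskedmerge = maskedmerge16;

    if (ARCH_X86)
        ff_maskedmerge_init_x86(s);
}
```

The filter would then call ff_maskedmerge_init(s, desc->comp[0].depth) where the quoted hunk currently sits, and a test could call it directly with a chosen depth.
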
Maybe when I or someone else adds a test; right now I'm just at the
stage of learning asm.

>
>> diff --git a/libavfilter/x86/vf_maskedmerge.asm
>> b/libavfilter/x86/vf_maskedmerge.asm
>
>> +    mova m5, [pw_128]
>> +    mova m2, [pw_256]
>> +    pxor m6, m6
>
> Nit: Reorganize your registers so you get those constants in m4, m5,
> m6. It will make the code easier to follow IMO.

Changed locally.

>
>> +    mov r10q, 0
>
> Xor a register with itself instead of using mov to zero a register.
> There's also no need to use the q suffix for plain register names, r10
> is enough.

Changed locally.

>
>> +    movh m0, [bsrcq + x]
>> +    movh m1, [osrcq + x]
>> +    movh m3, [msrcq + x]
> [...]
>> +    punpcklbw m0, m6
>> +    punpcklbw m1, m6
>> +    punpcklbw m3, m6
>
> You could also make an SSE4 version that uses pmovzxbw.
>
>> +    paddw m1, m5
>> +    psrlw m1, 8
>
> I believe you could also make an SSSE3 version that uses pmulhrsw
> instead of add + shift.
>
>> +    add r10q, mmsize / 2
>> +    cmp r10q, wq
>> +    jl .loop
>
> There's a trick you could do here that might be faster:
> 1) Add w to bsrc, osrc, msrc and dst and then negate w in the
> beginning of the function.
> 2) Initialize r10 to w instead of 0 at the beginning of each .nextrow
> iteration
> 3) You can now drop the cmp, the add will be enough to set the right
> flags for the branch

Will experiment.

>
> I also encourage you to write a checkasm unit test, that will make it
> easier to both benchmark and verify the correctness of your code.

Maybe later.

> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
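
To make the loop-counter trick from the review concrete, here is a scalar C sketch of the same idea: move the pointers to the end of the row and run the index from -w up to 0, so the increment alone decides when the loop ends. The function names and the per-pixel blend_px() are stand-ins for illustration and are not taken from the patch. The sketch steps one pixel at a time, where the asm steps by mmsize / 2.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in per-pixel operation (assumption, not copied from the patch). */
static uint8_t blend_px(uint8_t b, uint8_t o, uint8_t m)
{
    return (uint8_t)((b * (256 - m) + o * m + 128) >> 8);
}

/* Baseline: the index counts up from 0 and is compared against w on every
 * iteration, like the add + cmp + jl sequence in the patch. */
static void merge_row_cmp(const uint8_t *bsrc, const uint8_t *osrc,
                          const uint8_t *msrc, uint8_t *dst, ptrdiff_t w)
{
    for (ptrdiff_t x = 0; x < w; x++)
        dst[x] = blend_px(bsrc[x], osrc[x], msrc[x]);
}

/* Trick: the pointers are advanced to the end of the row once, then the
 * index runs from -w towards 0. The loop ends when the index stops being
 * negative, so no separate comparison against w is needed. */
static void merge_row_neg(const uint8_t *bsrc, const uint8_t *osrc,
                          const uint8_t *msrc, uint8_t *dst, ptrdiff_t w)
{
    if (w <= 0)
        return;

    bsrc += w;
    osrc += w;
    msrc += w;
    dst  += w;

    ptrdiff_t x = -w;
    do {
        dst[x] = blend_px(bsrc[x], osrc[x], msrc[x]);
    } while (++x < 0);
}
```

In asm, the add that advances the index already sets the flags that jl tests, which is why the cmp can go. In C the compiler chooses the loop form itself, so this only illustrates the indexing and makes no claim about speed.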