Quoting Henrik Gramner (2016-09-05 15:15:14)
> On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov <[email protected]> wrote:
> > +cglobal vector_clipf, 3, 3, 6, dst, src, len, min, max
> > +%if ARCH_X86_32
> > +    VBROADCASTSS m0, minm
> > +    VBROADCASTSS m1, maxm
> > +%else
> > +    VBROADCASTSS m0, m0
> > +    VBROADCASTSS m1, m1
> > +%endif
>
> This will fail on WIN64, to deal with the somewhat silly calling
> conventions on that platform you need to do something like
> VBROADCASTSS m0, m3
> VBROADCASTSS m1, maxm
> (not tested, I don't have access to a Windows machine at the moment).
>
> > +    movsxdifnidn lenq, lend
> > +    shl lenq, 2
> > +
> > +.loop
> > +    sub lenq, 4 * mmsize
>
> Move the subtraction just before the branch (jg) to allow macro-op
> fusion on modern Intel CPUs.
>
> > +
> > +    mova m2, [srcq + lenq + 0 * mmsize]
> > +    mova m3, [srcq + lenq + 1 * mmsize]
> > +    mova m4, [srcq + lenq + 2 * mmsize]
> > +    mova m5, [srcq + lenq + 3 * mmsize]
> > +
> > +    maxps m2, m0
> > +    maxps m3, m0
> > +    maxps m4, m0
> > +    maxps m5, m0
>
> Use 3-arg maxps instead of mova.

Isn't that AVX-only?

>
> > +    minps m2, m1
> > +    minps m3, m1
> > +    minps m4, m1
> > +    minps m5, m1
> > +
> > +    mova [dstq + lenq + 0 * mmsize], m2
> > +    mova [dstq + lenq + 1 * mmsize], m3
> > +    mova [dstq + lenq + 2 * mmsize], m4
> > +    mova [dstq + lenq + 3 * mmsize], m5
> > +
> > +    jg .loop
> > +
> > +    RET
>
> Otherwise LGTM, you could make an AVX version using ymm registers as
> well in a separate patch if you want to, just need to make sure the
> buffers are aligned.

This function is only used in two rather obscure places, so probably not
worth it

-- 
Anton Khirnov
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
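
For readers following the review without the rest of the patch, here is a rough scalar C sketch of what the quoted loop appears to compute: each input float is raised to at least min (the maxps step) and then lowered to at most max (the minps step). The name vector_clipf_ref, the size_t length and the min <= max precondition are assumptions made for illustration and are not taken from the patch. The sketch also ignores NaN handling and the alignment and block-size requirements of the SIMD version, which consumes 4 * mmsize bytes per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, not part of the patch under review:
 * clamp each of len floats from src into [min, max], storing into dst. */
static void vector_clipf_ref(float *dst, const float *src, size_t len,
                             float min, float max)
{
    assert(min <= max); /* assumed precondition, not stated in the thread */

    for (size_t i = 0; i < len; i++) {
        float v = src[i];
        if (v < min)    /* plays the role of maxps with the broadcast min */
            v = min;
        if (v > max)    /* plays the role of minps with the broadcast max */
            v = max;
        dst[i] = v;
    }
}
```

A reference like this is mostly useful for checking an assembly version against known inputs, for example clamping {-2.0, 0.25, 3.0} to [-1.0, 1.0] and comparing the outputs element by element.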
