On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov <[email protected]> wrote:
> +cglobal vector_clipf, 3, 3, 6, dst, src, len, min, max
> +%if ARCH_X86_32
> + VBROADCASTSS m0, minm
> + VBROADCASTSS m1, maxm
> +%else
> + VBROADCASTSS m0, m0
> + VBROADCASTSS m1, m1
> +%endif
This will fail on WIN64. Arguments there are assigned to registers by
position, so min arrives in xmm3 and max is passed on the stack. To deal
with the somewhat silly calling convention on that platform, you need to
do something like
VBROADCASTSS m0, m3
VBROADCASTSS m1, maxm
(untested; I don't have access to a Windows machine at the moment).
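For reference, here is a scalar C sketch of what the routine computes. It assumes the prototype void vector_clipf(float *dst, const float *src, int len, float min, float max), inferred from the cglobal argument names. The name vector_clipf_c is hypothetical, and the patch's actual C prototype may differ. The comment lists where each argument arrives under the two x86-64 calling conventions, which is why min ends up in xmm3 on WIN64:

```c
/* Scalar sketch of the SIMD routine under review. The prototype is assumed
 * from the cglobal line, not taken from the patch.
 *
 * Argument locations on x86-64:
 *   arg    SysV (Linux/macOS)   WIN64 (assigned by position)
 *   dst    rdi                  rcx
 *   src    rsi                  rdx
 *   len    edx                  r8d
 *   min    xmm0                 xmm3
 *   max    xmm1                 stack (5th argument)
 */
static void vector_clipf_c(float *dst, const float *src, int len,
                           float min, float max)
{
    for (int i = 0; i < len; i++) {
        float v = src[i];
        /* Same form as "maxps m2, m0": result = dst > src ? dst : src */
        v = v > min ? v : min;
        /* Same form as "minps m2, m1": result = dst < src ? dst : src */
        v = v < max ? v : max;
        dst[i] = v;
    }
}
```

Calling it with src = { -2, -0.5, 0.5, 2 }, min = -1 and max = 1 yields { -1, -0.5, 0.5, 1 }.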
> + movsxdifnidn lenq, lend
> + shl lenq, 2
> +
> +.loop
> + sub lenq, 4 * mmsize
Move the subtraction just before the branch (jg) to allow macro-op
fusion on modern Intel CPUs.
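The subtraction can't simply be moved by itself. The loads and stores use the already-decremented offset, so the first iteration would then access memory past the end of the buffers. The counter has to start one block lower instead. For example, add an extra sub lenq, 4 * mmsize before .loop and end the loop with sub lenq, 4 * mmsize / jge .loop. The C sketch below shows that loop shape. It assumes SSE (mmsize = 16, so 16 floats per iteration) and a len that is a positive multiple of 16. The names vector_clipf_blocks and FLOATS_PER_ITER are hypothetical:

```c
#include <stddef.h>

/* 4 * mmsize / sizeof(float) with mmsize = 16 (SSE) */
#define FLOATS_PER_ITER 16

/* Mirrors the suggested asm loop shape: the offset starts at the last full
 * block and is decremented at the bottom of the loop, right before the
 * branch. Assumes len is a positive multiple of FLOATS_PER_ITER. */
static void vector_clipf_blocks(float *dst, const float *src, int len,
                                float min, float max)
{
    ptrdiff_t i = (ptrdiff_t)len - FLOATS_PER_ITER;  /* sub before .loop */
    do {
        /* stands in for the four mova / maxps / minps / mova groups */
        for (int j = 0; j < FLOATS_PER_ITER; j++) {
            float v = src[i + j];
            v = v > min ? v : min;
            v = v < max ? v : max;
            dst[i + j] = v;
        }
        i -= FLOATS_PER_ITER;                        /* sub lenq, 4 * mmsize */
    } while (i >= 0);                                /* jge .loop */
}
```

The last block handled is the one at offset 0. After it, the offset goes negative and the branch falls through, so the sub immediately followed by jge is the only loop control.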
> +
> + mova m2, [srcq + lenq + 0 * mmsize]
> + mova m3, [srcq + lenq + 1 * mmsize]
> + mova m4, [srcq + lenq + 2 * mmsize]
> + mova m5, [srcq + lenq + 3 * mmsize]
> +
> + maxps m2, m0
> + maxps m3, m0
> + maxps m4, m0
> + maxps m5, m0
Use the 3-arg maxps with the load folded in (e.g. maxps m2, m0,
[srcq + lenq + 0 * mmsize]) instead of the separate mova loads.
> + minps m2, m1
> + minps m3, m1
> + minps m4, m1
> + minps m5, m1
> +
> + mova [dstq + lenq + 0 * mmsize], m2
> + mova [dstq + lenq + 1 * mmsize], m3
> + mova [dstq + lenq + 2 * mmsize], m4
> + mova [dstq + lenq + 3 * mmsize], m5
> +
> + jg .loop
> +
> + RET
Otherwise LGTM. If you want, you could also add an AVX version using ymm
registers in a separate patch. You just need to make sure the buffers
are 32-byte aligned.
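With ymm registers, mmsize becomes 32. Each iteration of the 4x-unrolled loop then covers 4 * 32 bytes, or 32 floats, and the aligned mova loads and stores require 32-byte-aligned buffers. Below is a hypothetical C helper that a caller or test could use to check those preconditions. Its name and the multiple-of-32 length requirement are assumptions derived from the loop above, not something stated in the patch:

```c
#include <stdint.h>

/* Hypothetical precondition check for an AVX (ymm) build of the loop:
 * mmsize = 32 bytes, 4 registers per iteration -> 32 floats per iteration. */
#define AVX_ALIGN       32
#define AVX_FLOATS_ITER (4 * 32 / (int)sizeof(float))

static int clipf_avx_args_ok(const float *dst, const float *src, int len)
{
    return ((uintptr_t)dst % AVX_ALIGN) == 0 &&
           ((uintptr_t)src % AVX_ALIGN) == 0 &&
           len > 0 && len % AVX_FLOATS_ITER == 0;
}
```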
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel