Quoting Henrik Gramner (2016-09-05 15:15:14)
> On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote:
> > +cglobal vector_clipf, 3, 3, 6, dst, src, len, min, max
> > +%if ARCH_X86_32
> > +    VBROADCASTSS m0, minm
> > +    VBROADCASTSS m1, maxm
> > +%else
> > +    VBROADCASTSS m0, m0
> > +    VBROADCASTSS m1, m1
> > +%endif
> 
> This will fail on WIN64. To deal with the somewhat silly calling
> conventions on that platform, you need to do something like
>     VBROADCASTSS m0, m3
>     VBROADCASTSS m1, maxm
> (Not tested; I don't have access to a Windows machine at the moment.)
> 
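A toy C model may make the WIN64 problem clearer. It is illustrative only and is not libav code. The helper names are made up, and it assumes the C prototype uses the same argument order as the cglobal line: dst, src, len, min, max. Under SysV, float arguments are counted separately from integer ones, so min and max arrive in xmm0 and xmm1, which is what `VBROADCASTSS m0, m0` / `m1, m1` rely on. Win64 assigns registers by argument position instead. That puts `min`, the fourth argument, in xmm3 and passes `max`, the fifth, on the stack, hence `m3` and `maxm` in the suggestion above.

```c
/* Toy model, for illustration only: where the scalar arguments of a
 * hypothetical vector_clipf(float *dst, const float *src, int len,
 *                           float min, float max)
 * land under the two x86-64 calling conventions. Structs, varargs and the
 * exact stack layout (e.g. Win64 shadow space) are ignored. */

enum { NARGS = 5 };

/* 0 = integer/pointer argument, 1 = float argument (assumed order). */
static const int arg_is_float[NARGS] = { 0, 0, 0, 1, 1 };

/* Win64: each of the first four arguments owns a slot by position.
 * Slot k uses RCX/RDX/R8/R9 for integers or XMMk for floats; any
 * further arguments go on the stack. */
static const char *win64_loc(int pos, int is_float)
{
    static const char *const gpr[] = { "rcx", "rdx", "r8", "r9" };
    static const char *const xmm[] = { "xmm0", "xmm1", "xmm2", "xmm3" };
    if (pos >= 4)
        return "stack";
    return is_float ? xmm[pos] : gpr[pos];
}

/* SysV (Linux, macOS): integer and float arguments are counted
 * independently, using up to six GPRs and eight XMM registers. */
static const char *sysv_loc(int gpr_idx, int xmm_idx, int is_float)
{
    static const char *const gpr[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const xmm[] = { "xmm0", "xmm1", "xmm2", "xmm3",
                                       "xmm4", "xmm5", "xmm6", "xmm7" };
    if (is_float)
        return xmm_idx < 8 ? xmm[xmm_idx] : "stack";
    return gpr_idx < 6 ? gpr[gpr_idx] : "stack";
}

/* Fill in the location of every argument for both conventions. */
static void locate_args(const char *win[NARGS], const char *sysv[NARGS])
{
    int ngpr = 0, nxmm = 0;
    for (int i = 0; i < NARGS; i++) {
        win[i]  = win64_loc(i, arg_is_float[i]);
        sysv[i] = sysv_loc(ngpr, nxmm, arg_is_float[i]);
        if (arg_is_float[i])
            nxmm++;
        else
            ngpr++;
    }
}
```

With this model, locate_args() reports rcx, rdx, r8, xmm3, stack for Win64 and rdi, rsi, rdx, xmm0, xmm1 for SysV.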
> > +    movsxdifnidn lenq, lend
> > +    shl lenq, 2
> > +
> > +.loop
> > +    sub lenq, 4 * mmsize
> 
> Move the subtraction just before the branch (jg) to allow macro-op
> fusion on modern Intel CPUs.
> 
> > +
> > +    mova m2, [srcq + lenq + 0 * mmsize]
> > +    mova m3, [srcq + lenq + 1 * mmsize]
> > +    mova m4, [srcq + lenq + 2 * mmsize]
> > +    mova m5, [srcq + lenq + 3 * mmsize]
> > +
> > +    maxps m2, m0
> > +    maxps m3, m0
> > +    maxps m4, m0
> > +    maxps m5, m0
> 
> Use 3-arg maxps instead of mova.

Isn't that AVX-only?

> 
> > +    minps m2, m1
> > +    minps m3, m1
> > +    minps m4, m1
> > +    minps m5, m1
> > +
> > +    mova [dstq + lenq + 0 * mmsize], m2
> > +    mova [dstq + lenq + 1 * mmsize], m3
> > +    mova [dstq + lenq + 2 * mmsize], m4
> > +    mova [dstq + lenq + 3 * mmsize], m5
> > +
> > +    jg .loop
> > +
> > +    RET
> 
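For reference, here is a scalar C model of what the quoted loop computes. It is an illustrative sketch, not the libav C implementation, and vector_clipf_model, MMSIZE and BLOCK_BYTES are made-up names. It walks the buffers backwards in the same 64-byte steps as the lenq countdown and applies maxps then minps with the same operand order. It also makes explicit that the asm implicitly requires len to be a positive multiple of 16 floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the SSE loop above (mmsize = 16 bytes).
 * Like the asm, it assumes len is a positive multiple of 16 floats:
 * each iteration handles 4 registers x 4 floats and there is no tail. */
#define MMSIZE      16
#define BLOCK_BYTES (4 * MMSIZE)

static void vector_clipf_model(float *dst, const float *src, int len,
                               float min, float max)
{
    assert(len > 0 && len % (BLOCK_BYTES / (int)sizeof(float)) == 0);

    /* movsxd + shl lenq, 2: element count -> byte offset of the end. */
    ptrdiff_t off = (ptrdiff_t)len * (ptrdiff_t)sizeof(float);

    do {
        /* sub lenq, 4 * mmsize: step back by one 64-byte block. */
        off -= BLOCK_BYTES;
        const float *s = (const float *)((const char *)src + off);
        float *d = (float *)((char *)dst + off);

        for (size_t i = 0; i < BLOCK_BYTES / sizeof(float); i++) {
            /* maxps x, m0 keeps x only if x > min, so a NaN becomes min. */
            float v = s[i] > min ? s[i] : min;
            /* minps x, m1 keeps x only if x < max. */
            d[i] = v < max ? v : max;
        }
    } while (off > 0); /* mirrors jg .loop testing the result of the sub */
}
```

The SSE instructions between the sub and the jg do not modify EFLAGS, which is why the branch still tests the result of the subtraction. If the sub is moved down next to the jg as suggested, the memory operands have to be rebased by -4 * mmsize (or lenq adjusted before the loop) so that they still address the same blocks.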
> Otherwise LGTM. You could also make an AVX version using ymm
> registers in a separate patch if you want to; you'd just need to
> make sure the buffers are 32-byte aligned.
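A hypothetical sketch can make the alignment caveat concrete. The names is_aligned, avx_clip_args_ok and YMM_BYTES are made up and are not libav API. The sketch spells out the preconditions that a ymm port of the same loop would impose, assuming it keeps the 4x unroll and has no tail loop. First, aligned 32-byte loads and stores (e.g. vmovaps on ymm registers) fault unless the address is a multiple of 32. Second, each iteration would consume 32 floats.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not libav API. Assumes a ymm version keeps the
 * same 4x unroll and, like the SSE version, has no tail loop. */
#define YMM_BYTES 32

/* True if p is a multiple of align bytes. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

static int avx_clip_args_ok(const float *dst, const float *src, int len)
{
    /* 4 ymm registers x 8 floats = 32 floats per iteration. */
    const int floats_per_iter = 4 * YMM_BYTES / (int)sizeof(float);
    return is_aligned(dst, YMM_BYTES) && is_aligned(src, YMM_BYTES) &&
           len > 0 && len % floats_per_iter == 0;
}
```

Whether such conditions are checked at run time or only documented as part of the function's contract is a separate design choice. The sketch only spells them out.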

This function is only used in two rather obscure places, so it's
probably not worth it.

-- 
Anton Khirnov
_______________________________________________
libav-devel mailing list
https://lists.libav.org/mailman/listinfo/libav-devel
