Re: [libav-devel] [PATCH] H.264: Add x86 assembly for 10-bit H.264 predict functions

Daniel Kang Thu, 07 Jul 2011 14:24:59 -0700

On Wed, Jul 6, 2011 at 12:40 PM, Ronald S. Bultje <[email protected]> wrote:


> Hi,
>
> On Mon, Jul 4, 2011 at 10:09 AM, Daniel Kang <[email protected]>
> wrote:
> > +INIT_MMX
> > +PRED8x8_DC mmxext
> > +PRED8x8_DC sse2
>
> You seem to do this "keep in mmx regs for sse2" for all DCs. Why? It
> seems you can save several instructions, plus the overhead of moving
> from mmx to xmm, by doing everything in xmm registers...
>

Fixed.


> > +    lea         r2, [r1+r1*2]
>
> If you do this one a little more up (where you do the vertical DC),
> you can use it instead of having to do lea r0, [r0+r1*2].
>

I'm not seeing it. If I do that, I need another register.

> > +cglobal pred8x8_dc_10_%1, 2,4
> [..]
> > +    mov         r0, r4
>
> That seems wrong, you're declaring to use 4 registers, but use r4
> also. I think on x86-64, you can use r10/r11, and on x86-32, you can
> prevent the mov and just restore the value from r0m, if you want.
>

Fixed.

> > +INIT_MMX
> > +PRED8x8_TOP_DC mmxext
> > +PRED8x8_TOP_DC sse2
>
> Same here as above (mmx->xmm moves).
>

Fixed.

>
> > +;-----------------------------------------------------------------------------
> > +;void pred8x8l_dc(pixel *src, int has_topleft, int has_topright, int stride)
> > +;-----------------------------------------------------------------------------
> > +%macro PRED8x8L_DC 1
> > +cglobal pred8x8l_dc_10_%1, 4,5,8
> > +    sub         r0, r3
> > +    lea         r4, [r0+r3*2]
> > +    mova        m0, [r0+r3*1-16]
> > +    punpckhwd   m0, [r0+r3*0-16]
>
> When I measured, SIMD-vertical-DC was never faster than doing this
> part in scalar, as you do in horizontal. Did you measure this and
> compare it to a scalar implementation for vertical-DC?
>

Left as a TODO.


> > +%if mmsize==16
> > +    mova  m0, [r0+ 0]
> > +    mova  m1, [r0+16]
> > +%else
> > +    movq  m0, [r0+ 0]
> > +    movq  m1, [r0+ 8]
> > +    movq  m2, [r0+16]
> > +    movq  m3, [r0+24]
> > +%endif
>
> mova m0, [r0+0]
> mova m1, [r0+mmsize]
> %if mmsize==8
> mova m2, [r0+16]
> mova m3, [r0+24]
> %endif
>

Fixed.

>
> > +;-----------------------------------------------------------------------------
> > +; void pred16x16_horizontal(pixel *src, int stride)
> > +;-----------------------------------------------------------------------------
> > +%macro PRED16x16_HORIZONTAL 1
> > +cglobal pred16x16_horizontal_10_%1, 2,3
> > +    sub    r0, r1
> [..]
> > +    movd   m0, [r0+r1*1-4]
> > +    movd   m1, [r0+r1*2-4]
>
> Why the sub?
>

Habit; fixed.
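
For reference, a scalar sketch of what pred16x16_horizontal computes on 16-bit pixels, with each row addressed directly from src so no initial "sub" is needed. This is only an illustration, not the code in the patch: the name pred16x16_horizontal_ref is hypothetical, and the stride is assumed to be in pixels here, whereas the assembly takes it in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the patch's implementation.
 * Assumption: stride is counted in pixels (uint16_t units). */
static void pred16x16_horizontal_ref(uint16_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 16; y++) {
        /* Row y starts at src + y*stride; src is never pre-decremented. */
        uint16_t *row  = src + y * stride;
        uint16_t  left = row[-1];        /* pixel just left of the block */

        for (int x = 0; x < 16; x++)
            row[x] = left;               /* replicate it across the row */
    }
}
```

With the "sub r0, r1" in place, row y would instead sit at r0 + r1*(y+1), which is why the quoted loads use r1*1 and r1*2 for the first two rows.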

patch.diff
Description: Binary data

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
