On Wed, Jul 6, 2011 at 12:40 PM, Ronald S. Bultje <[email protected]>wrote:
> Hi, > > On Mon, Jul 4, 2011 at 10:09 AM, Daniel Kang <[email protected]> > wrote: > > +INIT_MMX > > +PRED8x8_DC mmxext > > +PRED8x8_DC sse2 > > You seem to do this "keep in mmx regs for sse2" for all DCs. Why? It > seems you can save several instructions, plus the overhead of moving > from mmx to xmm, by doing everything in xmm registers... > Fixed. > > + lea r2, [r1+r1*2] > > If you do this one a little more up (where you do the vertical DC), > you can use it instead of having to do lea r0, [r0+r1*2]. > I'm not seeing it. If I do that, I need another register. +cglobal pred8x8_dc_10_%1, 2,4 > [..] > + mov r0, r4 > > That seems wrong, you're declaring to use 4 registers, but use r4 > also. I think on x86-64, you can use r10/r11, and on x86-32, you can > prevent the mov and just restore the value from r0m, if you want. > Fixed. > +INIT_MMX > > +PRED8x8_TOP_DC mmxext > > +PRED8x8_TOP_DC sse2 > > Same here as above (mmx->xmm moves). > Fixed. > > +;----------------------------------------------------------------------------- > > +;void pred8x8l_dc(pixel *src, int has_topleft, int has_topright, int > stride) > > > +;----------------------------------------------------------------------------- > > +%macro PRED8x8L_DC 1 > > +cglobal pred8x8l_dc_10_%1, 4,5,8 > > + sub r0, r3 > > + lea r4, [r0+r3*2] > > + mova m0, [r0+r3*1-16] > > + punpckhwd m0, [r0+r3*0-16] > > When I measured, SIMD-vertical-DC was never faster than doing this > part in scalar, as you do in horizontal. Did you measure this and > compare it to a scalar implementation for vertical-DC? > Left as a TODO. > > +%if mmsize==16 > > + mova m0, [r0+ 0] > > + mova m1, [r0+16] > > +%else > > + movq m0, [r0+ 0] > > + movq m1, [r0+ 8] > > + movq m2, [r0+16] > > + movq m3, [r0+24] > > +%endif > > mova m0, [r0+0] > mova m1, [r0+mmsize] > %if mmsize==8 > mova m2, [r0+16] > mova m3, [r0+24] > %endif > Fixed. > > +;----------------------------------------------------------------------------- > > +; void pred16x16_horizontal(pixel *src, int stride) > > > +;----------------------------------------------------------------------------- > > +%macro PRED16x16_HORIZONTAL 1 > > +cglobal pred16x16_horizontal_10_%1, 2,3 > > + sub r0, r1 > [..] > > + movd m0, [r0+r1*1-4] > > + movd m1, [r0+r1*2-4] > > Why the sub? > Habit; fixed.
patch.diff
Description: Binary data
_______________________________________________ libav-devel mailing list [email protected] https://lists.libav.org/mailman/listinfo/libav-devel
