On Wed, Oct 12, 2011 at 03:46:08PM +0200, Janne Grunau wrote:
> On Wed, Oct 05, 2011 at 07:03:52AM +0200, Kostya Shishkov wrote:
> > On Tue, Oct 04, 2011 at 10:32:16PM +0200, Janne Grunau wrote:
> > > +void ff_rv30dsp_init_neon(RV34DSPContext *c, DSPContext* dsp)
> > > +{
> > > +    c->rv34_inv_transform_tab[0] = ff_rv34_inv_transform_neon;
> > > +    c->rv34_inv_transform_tab[1] = ff_rv34_inv_transform_noround_neon;
> > > +
> > > +    return;
> >
> > this return is silly
> 
> removed
> 
> > > diff --git a/libavcodec/arm/rv34dsp_neon.S b/libavcodec/arm/rv34dsp_neon.S
> > > new file mode 100644
> > > index 0000000..9414db2
> > > --- /dev/null
> > > +++ b/libavcodec/arm/rv34dsp_neon.S
> >
> > Also wouldn't it be faster to use the oldest trick in the book for some
> > multiplications (e.g. X*7 = (X << 3) - X, etc.)?
> 
> It is for some. Converting the multiplications by 7 and 17 to shift+add
> made the functions more than 10% faster. The final multiplication in the
> noround version got slower for reasons I don't understand.
> 
> Janne
> ---8<--
> 4.3 times faster, more than 5% overall speedup on bourne.rvmb
> 
> 3336 dezicycles in rv34_inv_transform,         8387851 runs, 757 skips
>  767 dezicycles in ff_rv34_inv_transform_neon, 8388447 runs, 161 skips
> 
> 3871 dezicycles in rv34_inv_transform_noround,         261884 runs, 260 skips
>  889 dezicycles in ff_rv34_inv_transform_noround_neon, 262135 runs,   9 skips
> ---
>  libavcodec/arm/Makefile            |    4 +
>  libavcodec/arm/rv34dsp_init_neon.c |   34 +++++++++++
>  libavcodec/arm/rv34dsp_neon.S      |  116 ++++++++++++++++++++++++++++++++++++
>  libavcodec/rv34dsp.c               |    3 +
>  libavcodec/rv34dsp.h               |    2 +
>  5 files changed, 159 insertions(+), 0 deletions(-)
>  create mode 100644 libavcodec/arm/rv34dsp_init_neon.c
>  create mode 100644 libavcodec/arm/rv34dsp_neon.S
>
[...] 
> diff --git a/libavcodec/arm/rv34dsp_neon.S b/libavcodec/arm/rv34dsp_neon.S
> new file mode 100644
> index 0000000..6f5c17d
> --- /dev/null
> +++ b/libavcodec/arm/rv34dsp_neon.S
> @@ -0,0 +1,116 @@
> +/*
[...]
> +        vst4.16         {d0[1], d1[1], d2[1], d3[1]}, [r2], r1
> +        vst4.16         {d0[2], d1[2], d2[2], d3[2]}, [r2], r1
> +        vst4.16         {d0[3], d1[3], d2[3], d3[3]}, [r2], r1
> +        bx              lr
> +endfunc
> \ No newline at end of file

nit: maybe add a newline at the end of the file?

The patch looks OK though.
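
For anyone reading along, the shift+add rewrite discussed above
(X*7 = (X << 3) - X, and likewise X*17 = (X << 4) + X) can be sketched in
plain C. This is only an illustration of the identity, not the NEON code
from the patch; the helper names mul7_shift and mul17_shift are made up
for this example, and the 32-bit operand width is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helpers illustrating the shift+add identities.
 * The shift is done on an unsigned copy because left-shifting a
 * negative signed value is undefined behaviour in C; converting the
 * result back to int32_t assumes the usual two's-complement wrap.
 */

/* 7*x computed as (x << 3) - x, i.e. 8*x - x */
static inline int32_t mul7_shift(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return (int32_t)((u << 3) - u);
}

/* 17*x computed as (x << 4) + x, i.e. 16*x + x */
static inline int32_t mul17_shift(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return (int32_t)((u << 4) + u);
}
```

Whether this beats a real multiply depends on the target and the
surrounding instruction mix, which matches the mixed results reported
above for the noround version.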
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
