On Wed, Oct 12, 2011 at 03:46:08PM +0200, Janne Grunau wrote:
> On Wed, Oct 05, 2011 at 07:03:52AM +0200, Kostya Shishkov wrote:
> > On Tue, Oct 04, 2011 at 10:32:16PM +0200, Janne Grunau wrote:
> > > +void ff_rv30dsp_init_neon(RV34DSPContext *c, DSPContext* dsp)
> > > +{
> > > + c->rv34_inv_transform_tab[0] = ff_rv34_inv_transform_neon;
> > > + c->rv34_inv_transform_tab[1] = ff_rv34_inv_transform_noround_neon;
> > > +
> > > + return;
> >
> > this return is silly
>
> removed
>
> > > diff --git a/libavcodec/arm/rv34dsp_neon.S b/libavcodec/arm/rv34dsp_neon.S
> > > new file mode 100644
> > > index 0000000..9414db2
> > > --- /dev/null
> > > +++ b/libavcodec/arm/rv34dsp_neon.S
> >
> > Also wouldn't it be faster to use the oldest trick in the book for some
> > multiplications (e.g. X*7 = (X << 3) - X, etc.)?
>
> It is for some. Converting the multiplications by 7 and 17 to shift+add
> made the functions more than 10% faster. The final multiplication in the
> noround version got slower for reasons I don't understand.
>
> Janne
> ---8<--
> 4.3 times faster, more than 5% overall speedup on bourne.rvmb
>
> 3336 dezicycles in rv34_inv_transform, 8387851 runs, 757 skips
> 767 dezicycles in ff_rv34_inv_transform_neon, 8388447 runs, 161 skips
>
> 3871 dezicycles in rv34_inv_transform_noround, 261884 runs, 260 skips
> 889 dezicycles in ff_rv34_inv_transform_noround_neon, 262135 runs, 9 skips
> ---
> libavcodec/arm/Makefile | 4 +
> libavcodec/arm/rv34dsp_init_neon.c | 34 +++++++++++
> libavcodec/arm/rv34dsp_neon.S | 116 ++++++++++++++++++++++++++++++++++++
> libavcodec/rv34dsp.c | 3 +
> libavcodec/rv34dsp.h | 2 +
> 5 files changed, 159 insertions(+), 0 deletions(-)
> create mode 100644 libavcodec/arm/rv34dsp_init_neon.c
> create mode 100644 libavcodec/arm/rv34dsp_neon.S
>
[...]
> diff --git a/libavcodec/arm/rv34dsp_neon.S b/libavcodec/arm/rv34dsp_neon.S
> new file mode 100644
> index 0000000..6f5c17d
> --- /dev/null
> +++ b/libavcodec/arm/rv34dsp_neon.S
> @@ -0,0 +1,116 @@
> +/*
[...]
> + vst4.16 {d0[1], d1[1], d2[1], d3[1]}, [r2], r1
> + vst4.16 {d0[2], d1[2], d2[2], d3[2]}, [r2], r1
> + vst4.16 {d0[3], d1[3], d2[3], d3[3]}, [r2], r1
> + bx lr
> +endfunc
> \ No newline at end of file
nit: the file is missing a newline at the end, maybe add one?
The patch looks OK though.
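
For reference, the shift+add trick discussed above (X*7 = (X << 3) - X, and likewise X*17 = (X << 4) + X) can be sketched in plain C. This is only an illustration of the identity, not the code from the patch: the helper names mul7_shift and mul17_shift are made up here, and the NEON version would do the same thing on vector registers.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only; they do not appear
 * in the patch. */

/* x * 7 == (x << 3) - x */
static inline int32_t mul7_shift(int32_t x)
{
    /* Shift in unsigned arithmetic: left-shifting a negative signed
     * value is undefined behaviour in C. The cast back to int32_t
     * assumes the usual two's-complement conversion. */
    return (int32_t)(((uint32_t)x << 3) - (uint32_t)x);
}

/* x * 17 == (x << 4) + x */
static inline int32_t mul17_shift(int32_t x)
{
    return (int32_t)(((uint32_t)x << 4) + (uint32_t)x);
}
```

Whether this beats a real multiply depends on the core and on how the surrounding instructions schedule, which fits the observation above that one of the multiplications got slower after the conversion.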