> > +function ff_g722_apply_qmf_neon, export=1, align=4
> > + movrel r3, qmf_coeffs
> > + vld1.s16 {d2,d3,d4}, [r0]! /* load prev_samples */
>
> The input is not guaranteed to be aligned?
>
> > + vld1.s16 {d16,d17,d18}, [r3,:64]! /* load qmf_coeffs */
>
> it looks a little bit odd to load 3 64-bit registers twice. If you
> were to load 3 times 2 64-bit registers (or first 4 64-bit registers and
> then 2) you could use the 16-byte alignment of the constants.
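If I follow, the second variant would look roughly like this (untested sketch; assumes qmf_coeffs is declared 16-byte aligned):

    vld1.s16 {d16,d17,d18,d19}, [r3,:128]! /* first 4 of 6 coeff registers */
    vld1.s16 {d20,d21},         [r3,:128]  /* remaining 2 */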
>
> > + vmull.s16 q0, d2, d16
> > + vmlal.s16 q0, d3, d17
>
> it might be faster to accumulate in two registers and add the results at
> the end.
will try your suggestions, thanks!
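Something like this, I guess (untested sketch of the two-accumulator variant, register allocation as in the patch):

    vmull.s16 q0, d2, d16
    vmull.s16 q1, d3, d17
    vmlal.s16 q0, d4, d18
    vmlal.s16 q1, d5, d19
    vmlal.s16 q0, d6, d20
    vmlal.s16 q1, d7, d21
    vadd.s32  q0, q0, q1  /* combine the two accumulators */
    vadd.s32  d0, d0, d1  /* final pairwise reduction */

That breaks the serial dependency on q0, at the cost of one extra vadd.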
> > + vmlal.s16 q0, d4, d18
> > +
> > + vld1.s16 {d5,d6,d7}, [r0]! /* load prev_samples */
> > + vld1.s16 {d19,d20,d21}, [r3,:64]! /* load qmf_coeffs */
> > + vmlal.s16 q0, d5, d19
> > + vmlal.s16 q0, d6, d20
> > + vmlal.s16 q0, d7, d21
> > +
> > + vadd.s32 d0, d1, d0
> > + vrev64.32 d0, d0
> > + vst1.s32 {d0}, [r1]
>
> no alignment? It might then be faster to avoid the vrev64 and store
> each s32 individually.
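Storing the two lanes separately would look something like this (untested; replaces the vrev64 + vst1 pair above):

    vadd.s32 d0, d1, d0
    vst1.32  {d0[1]}, [r1]! /* lane 1 first, matching the vrev64 ordering */
    vst1.32  {d0[0]}, [r1]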
--
Peter Meerwald
+43-664-2444418 (mobile)
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel