> > +function ff_g722_apply_qmf_neon, export=1, align=4
> > + movrel r3, qmf_coeffs
> > + vld1.s16 {d2,d3,d4}, [r0]! /* load prev_samples */
>
> The input is not guaranteed to be aligned?
>
> > + vld1.s16 {d16,d17,d18}, [r3,:64]! /* load qmf_coeffs */
>
> it looks a little bit odd to load 3 64-bit registers twice. If you
> were to load 3 times 2 64-bit registers (or first 4 64-bit registers and
> then 2) you could use the 16-byte alignment of the constants.
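If I follow, the second variant would look roughly like this (untested sketch; assumes qmf_coeffs is declared 16-byte aligned):

    vld1.s16 {d16,d17,d18,d19}, [r3,:128]! /* first 4 of 6 coeff registers */
    vld1.s16 {d20,d21},         [r3,:128]  /* remaining 2 */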
>
> > + vmull.s16 q0, d2, d16
> > + vmlal.s16 q0, d3, d17
>
> it might be faster to accumulate in two registers and add the results at
> the end.
will try your suggestions, thanks!
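Something like this, I guess (untested sketch of the two-accumulator variant, register allocation as in the patch):

    vmull.s16 q0, d2, d16
    vmull.s16 q1, d3, d17
    vmlal.s16 q0, d4, d18
    vmlal.s16 q1, d5, d19
    vmlal.s16 q0, d6, d20
    vmlal.s16 q1, d7, d21
    vadd.s32  q0, q0, q1  /* combine the two accumulators */
    vadd.s32  d0, d0, d1  /* final pairwise reduction */

That breaks the serial dependency on q0, at the cost of one extra vadd.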
> > + vmlal.s16 q0, d4, d18
> > +
> > + vld1.s16 {d5,d6,d7}, [r0]! /* load prev_samples */
> > + vld1.s16 {d19,d20,d21}, [r3,:64]! /* load qmf_coeffs */
> > + vmlal.s16 q0, d5, d19
> > + vmlal.s16 q0, d6, d20
> > + vmlal.s16 q0, d7, d21
> > +
> > + vadd.s32 d0, d1, d0
> > + vrev64.32 d0, d0
> > + vst1.s32 {d0}, [r1]
>
> no alignment? It might then be faster to avoid the vrev64 and store
> each s32 individually.
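Storing the two lanes separately would look something like this (untested; replaces the vrev64 + vst1 pair above):

    vadd.s32 d0, d1, d0
    vst1.32  {d0[1]}, [r1]! /* lane 1 first, matching the vrev64 ordering */
    vst1.32  {d0[0]}, [r1]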
--
Peter Meerwald
+43-664-2444418 (mobile)
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel