On 24 February 2018 at 12:05, Aurelien Jacobs <au...@gnuage.org> wrote:
> On Thu, Feb 22, 2018 at 05:21:57PM +0000, Rostislav Pehlivanov wrote: > > On 21 February 2018 at 22:37, Aurelien Jacobs <au...@gnuage.org> wrote: > > [...] > > > +;******************************************************************* > > > +;void ff_sbc_analyze_4(const int16_t *in, int32_t *out, const int16_t > > > *consts); > > > +;******************************************************************* > > > +INIT_MMX mmx > > > +cglobal sbc_analyze_4, 3, 3, 4, in, out, consts > > > + movq m0, [inq] > > > + movq m1, [inq+8] > > > + pmaddwd m0, [constsq] > > > + pmaddwd m1, [constsq+8] > > > + paddd m0, [scale_mask] > > > + paddd m1, [scale_mask] > > > + > > > + movq m2, [inq+16] > > > + movq m3, [inq+24] > > > + pmaddwd m2, [constsq+16] > > > + pmaddwd m3, [constsq+24] > > > + paddd m0, m2 > > > + paddd m1, m3 > > > + > > > + movq m2, [inq+32] > > > + movq m3, [inq+40] > > > + pmaddwd m2, [constsq+32] > > > + pmaddwd m3, [constsq+40] > > > + paddd m0, m2 > > > + paddd m1, m3 > > > + > > > + movq m2, [inq+48] > > > + movq m3, [inq+56] > > > + pmaddwd m2, [constsq+48] > > > + pmaddwd m3, [constsq+56] > > > + paddd m0, m2 > > > + paddd m1, m3 > > > + > > > + movq m2, [inq+64] > > > + movq m3, [inq+72] > > > + pmaddwd m2, [constsq+64] > > > + pmaddwd m3, [constsq+72] > > > + paddd m0, m2 > > > + paddd m1, m3 > > > > > > > You can macro the top 3 blocks > > > > [...] > > > +;******************************************************************* > > > +;void ff_sbc_analyze_8(const int16_t *in, int32_t *out, const int16_t > > > *consts); > > > +;******************************************************************* > > > +INIT_MMX mmx > > > +cglobal sbc_analyze_8, 3, 3, 4, in, out, consts > > > + movq m0, [inq] > > > + movq m1, [inq+8] > > > + movq m2, [inq+16] > > > + movq m3, [inq+24] > > > + pmaddwd m0, [constsq] > > > + pmaddwd m1, [constsq+8] > > > + pmaddwd m2, [constsq+16] > > > + pmaddwd m3, [constsq+24] > > > + paddd m0, [scale_mask] > > > + paddd m1, [scale_mask] > > > + paddd m2, [scale_mask] > > > + paddd m3, [scale_mask] > > > + > > > + movq m4, [inq+32] > > > + movq m5, [inq+40] > > > + movq m6, [inq+48] > > > + movq m7, [inq+56] > > > + pmaddwd m4, [constsq+32] > > > + pmaddwd m5, [constsq+40] > > > + pmaddwd m6, [constsq+48] > > > + pmaddwd m7, [constsq+56] > > > + paddd m0, m4 > > > + paddd m1, m5 > > > + paddd m2, m6 > > > + paddd m3, m7 > > > + > > > + movq m4, [inq+64] > > > + movq m5, [inq+72] > > > + movq m6, [inq+80] > > > + movq m7, [inq+88] > > > + pmaddwd m4, [constsq+64] > > > + pmaddwd m5, [constsq+72] > > > + pmaddwd m6, [constsq+80] > > > + pmaddwd m7, [constsq+88] > > > + paddd m0, m4 > > > + paddd m1, m5 > > > + paddd m2, m6 > > > + paddd m3, m7 > > > + > > > + movq m4, [inq+96] > > > + movq m5, [inq+104] > > > + movq m6, [inq+112] > > > + movq m7, [inq+120] > > > + pmaddwd m4, [constsq+96] > > > + pmaddwd m5, [constsq+104] > > > + pmaddwd m6, [constsq+112] > > > + pmaddwd m7, [constsq+120] > > > + paddd m0, m4 > > > + paddd m1, m5 > > > + paddd m2, m6 > > > + paddd m3, m7 > > > + > > > + movq m4, [inq+128] > > > + movq m5, [inq+136] > > > + movq m6, [inq+144] > > > + movq m7, [inq+152] > > > + pmaddwd m4, [constsq+128] > > > + pmaddwd m5, [constsq+136] > > > + pmaddwd m6, [constsq+144] > > > + pmaddwd m7, [constsq+152] > > > + paddd m0, m4 > > > + paddd m1, m5 > > > + paddd m2, m6 > > > + paddd m3, m7 > > > > > > > And those 5 blocks > > > > > > > + > > > + psrad m0, 16 ; SBC_PROTO_FIXED_SCALE > > > + psrad m1, 16 ; SBC_PROTO_FIXED_SCALE > > > + psrad m2, 16 ; SBC_PROTO_FIXED_SCALE > > > + psrad m3, 16 ; SBC_PROTO_FIXED_SCALE > > > + > > > + packssdw m0, m0 > > > + packssdw m1, m1 > > > + packssdw m2, m2 > > > + packssdw m3, m3 > > > + > > > + movq m4, m0 > > > + movq m5, m0 > > > + pmaddwd m4, [constsq+160] > > > + pmaddwd m5, [constsq+168] > > > + > > > + movq m6, m1 > > > + movq m7, m1 > > > + pmaddwd m6, [constsq+192] > > > + pmaddwd m7, [constsq+200] > > > + paddd m4, m6 > > > + paddd m5, m7 > > > + > > > + movq m6, m2 > > > + movq m7, m2 > > > + pmaddwd m6, [constsq+224] > > > + pmaddwd m7, [constsq+232] > > > + paddd m4, m6 > > > + paddd m5, m7 > > > + > > > + movq m6, m3 > > > + movq m7, m3 > > > + pmaddwd m6, [constsq+256] > > > + pmaddwd m7, [constsq+264] > > > + paddd m4, m6 > > > + paddd m5, m7 > > > > > > > Reuse the first macro here > > > > Should save quite a bit of code > > OK, here is a "macroified" version of the code. > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > Looks fine to me, but I'd like to get someone else's opinion on this. jamrial / nevcairiel / gramner? _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel