On Sat, Dec 23, 2017 at 03:35:28PM -0300, James Almer wrote:
> On 12/23/2017 3:01 PM, Aurelien Jacobs wrote:
> > This was originally based on libsbc, and was fully integrated into ffmpeg.
> >
> > Rough speed test:
> > C version:    speed= 592x
> > MMX version:  speed= 785x
> > ---
> >  libavcodec/sbcdsp.c          |   3 +
> >  libavcodec/sbcdsp.h          |   2 +
> >  libavcodec/x86/Makefile      |   2 +
> >  libavcodec/x86/sbcdsp.asm    | 284 +++++++++++++++++++++++++++++++++++++++++++
> >  libavcodec/x86/sbcdsp_init.c |  51 ++++++++
> >  5 files changed, 342 insertions(+)
> >  create mode 100644 libavcodec/x86/sbcdsp.asm
> >  create mode 100644 libavcodec/x86/sbcdsp_init.c
>
> [...]
>
> > +;*******************************************************************
> > +;void ff_sbc_calc_scalefactors(int32_t sb_sample_f[16][2][8],
> > +;                              uint32_t scale_factor[2][8],
> > +;                              int blocks, int channels, int subbands)
> > +;*******************************************************************
> > +INIT_MMX mmx
> > +cglobal sbc_calc_scalefactors, 5, 7, 3, sb_sample_f, scale_factor, blocks, channels, subbands, ptr, blk
> > +    ; subbands = 4 * subbands * channels
> > +    shl             subbandsd, 2
> > +    cmp             channelsd, 2
> > +    jl              .loop_1
> > +    shl             subbandsd, 1
> > +
> > +.loop_1:
> > +    sub             subbandsq, 8
> > +    lea             ptrq, [sb_sample_fq + subbandsq]
> > +
> > +    ; blk = (blocks - 1) * 64;
> > +    lea             blkq, [blocksq - 1]
> > +    shl             blkd, 6
> > +
> > +    movq            m0, [scale_mask]
>
> I insist, this can be easily loaded outside the loop. You have enough
> spare regs to store a copy.
Oh, I forgot to reply to this.
There are no registers left available on x86_32, which is why I kept
those loads inside the loop.