On 2017-11-27 00:17, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley <james.darn...@gmail.com> wrote:
>> @@ -152,13 +152,13 @@ RET
>> %macro FUNCTION_BODY_32 0
>>
>> %if ARCH_X86_64
>> -    cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
>> +    cglobal flac_enc_lpc_32, 5, 7, 8, mmsize*4, res, smp, len, order, coefs
>>
>
> Why x4, shouldn't this be x2?

I write three more mm registers to the stack, so there are four
mmsize-sized slots in total. The first slot holds the sign extension for
my hacked qword arithmetic shift, added in the first 32-bit patch. The
three new slots store the "odd" values created in the first inner loop.

I admit that this is a rather ugly construction for a small speed gain,
but I think I've seen other ugly things since writing this.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel