Jan 13, 2021, 00:55 by d...@lynne.ee: > Jan 12, 2021, 22:24 by andreas.rheinha...@gmail.com: > >> Lynne: >> >>> The AC3 encoder used to be a separate library called "Aften", which >>> got merged into libavcodec (literally, SVN commits and all). >>> The merge preserved as much features from the library as possible. >>> >>> The code had two versions - a fixed point version and a floating >>> point version. FFmpeg had floating point DSP code used by other >>> codecs, the AC3 decoder including, so the floating-point DSP was >>> simply replaced with FFmpeg's own functions. >>> However, FFmpeg had no fixed-point audio code at that point. So >>> the encoder brought along its own fixed-point DSP functions, >>> including a fixed-point MDCT. >>> >>> The fixed-point MDCT itself is trivially just a float MDCT with a >>> different type and each multiply being a fixed-point multiply. >>> So over time, it got refactored, and the FFT used for all other codecs >>> was templated. >>> >>> Due to design decisions at the time, the fixed-point version of the >>> encoder operates at 16-bits of precision. Although convenient, this, >>> even at the time, was inadequate and inefficient. The encoder is noisy, >>> does not produce output comparable to the float encoder, and even >>> rings at higher frequencies due to the badly approximated winow function. >>> >>> Enter MIPS (owned by Imagination Technologies at the time). They wanted >>> quick fixed-point decoding on their FPUless cores. So they contributed >>> patches to template the AC3 decoder so it had both a fixed-point >>> and a floating-point version. They also did the same for the AAC decoder. >>> They however, used 32-bit samples. Not 16-bits. And we did not have >>> 32-bit fixed-point DSP functions, including an MDCT. But instead of >>> templating our MDCT to output 3 versions (float, 32-bit fixed and 16-bit >>> fixed), >>> they simply copy-pasted their own MDCT into ours, and completely >>> ifdeffed our own MDCT code out if a 32-bit fixed point MDCT was selected. >>> >>> This is also the status quo nowadays - 2 separate MDCTs, one which >>> produces floating point and 16-bit fixed point versions, and one >>> sort-of integrated which produces 32-bit MDCT. >>> >>> MIPS weren't all that interested in encoding, so they left the encoder >>> as-is, and they didn't care much about the ifdeffery, mess or quality - it's >>> not their problem. >>> >>> So the MDCT/FFT code has always been a thorn in anyone looking to clean up >>> code's eye. >>> >>> Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients. >>> So for the floating point version, the encoder simply runs the float MDCT, >>> and converts the resulting coefficients to 25-bit fixed-point, as AC3 is >>> inherently >>> a fixed-point codec. For the fixed-point version, the input is 16-bit >>> samples, >>> so to maximize precision the frame samples are analyzed and the highest set >>> bit is detected via ac3_max_msb_abs_int16(), and the coefficients are then >>> scaled up via ac3_lshift_int16(), so the input for the FFT is always at >>> least 14 bits, >>> computed in normalize_samples(). After FFT, the coefficients are scaled up >>> to 25 bits. >>> >>> This patch simply changes the encoder to accept 32-bit samples, reusing >>> the already well-optimized 32-bit MDCT code, allowing us to clean up and >>> drop >>> a large part of a very messy code of ours, as well as prepare for the >>> future lavu/tx >>> conversion. The coefficients are simply scaled down to 25 bits during >>> windowing, >>> skipping 2 separate scalings, as the hacks to extend precision are simply >>> no longer >>> necessary. There's no point in running the MDCT always at 32 bits when >>> you're >>> going to drop 6 bits off anyway, the headroom is plenty, and the MDCT rounds >>> properly. >>> >>> This also makes the encoder even slightly more accurate over the float >>> version, >>> as there's no coefficient conversion step necessary. >>> >>> SIZE SAVINGS: >>> ARM32: >>> HARDCODED TABLES: >>> BASE - 10709590 >>> DROP DSP - 10702872 - diff: -6.56KiB >>> DROP MDCT - 10667932 - diff: -34.12KiB - both: -40.68KiB >>> DROP FFT - 10336652 - diff: -323.52KiB - all: -364.20KiB >>> SOFTCODED TABLES: >>> BASE - 9685096 >>> DROP DSP - 9678378 - diff: -6.56KiB >>> DROP MDCT - 9643466 - diff: -34.09KiB - both: -40.65KiB >>> DROP FFT - 9573918 - diff: -67.92KiB - all: -108.57KiB >>> >>> ARM64: >>> HARDCODED TABLES: >>> BASE - 14641112 >>> DROP DSP - 14633806 - diff: -7.13KiB >>> DROP MDCT - 14604812 - diff: -28.31KiB - both: -35.45KiB >>> DROP FFT - 14286826 - diff: -310.53KiB - all: -345.98KiB >>> SOFTCODED TABLES: >>> BASE - 13636238 >>> DROP DSP - 13628932 - diff: -7.13KiB >>> DROP MDCT - 13599866 - diff: -28.38KiB - both: -35.52KiB >>> DROP FFT - 13542080 - diff: -56.43KiB - all: -91.95KiB >>> >>> x86: >>> HARDCODED TABLES: >>> BASE - 12367336 >>> DROP DSP - 12354698 - diff: -12.34KiB >>> DROP MDCT - 12331024 - diff: -23.12KiB - both: -35.46KiB >>> DROP FFT - 12029788 - diff: -294.18KiB - all: -329.64KiB >>> SOFTCODED TABLES: >>> BASE - 11358094 >>> DROP DSP - 11345456 - diff: -12.34KiB >>> DROP MDCT - 11321742 - diff: -23.16KiB - both: -35.50KiB >>> DROP FFT - 11276946 - diff: -43.75KiB - all: -79.25KiB >>> >>> PERFORMANCE (10min random s32le): >>> ARM32 - before - 39.9x - 0m15.046s >>> ARM32 - after - 28.2x - 0m21.525s >>> Speed: -30% >>> >>> ARM64 - before - 36.1x - 0m16.637s >>> ARM64 - after - 36.0x - 0m16.727s >>> Speed: -0.5% >>> >>> x86 - before - 184x - 0m3.277s >>> x86 - after - 190x - 0m3.187s >>> Speed: +3% >>> >>> New patch attached. >>> >> FATE still fails with patches 1 and 2 on patchwork. >> > > Fixed. Forgot a line. New patches attached. >
Pushed. Thanks to all who commented. Next up will likely be the mdct15 removal, which will be around 80KiB on all platforms. Fun fact: commit 4/5 removed MMX code. Not something you see every day. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".