On Wed, 16 Jul 2014, Ben Avison wrote:

The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.

The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.

I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:

             Before          After
             Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode  2245.5 53.1     1599.6 43.8    100.0%      +40.4%
FFT routines  940.6  22.0     348.1  20.8    100.0%      +170.2%
---
libavcodec/arm/fft_init_arm.c |    8 +-
libavcodec/arm/fft_vfp.S      |  278 +++++++++++++++++++++++++++++++++++++----
2 files changed, 261 insertions(+), 25 deletions(-)

Thanks, this patch seems to work in all my weird build configurations. The patch also looks good enough to me otherwise, with or without the .L prefix removed (Ben, which way do you prefer it?).

// Martin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel

Reply via email to