On Fri, 11 Jul 2014, Ben Avison wrote:

The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

                 Before          After
                 Mean   StdDev   Mean   StdDev  Confidence  Change
aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%
---
libavcodec/arm/mdct_vfp.S |  146 ++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 144 insertions(+), 2 deletions(-)

Seems ok to me as well (builds well etc) although I don't know the code in detail.

// Martin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel

Reply via email to