On Thu, 20 Mar 2014, Ben Avison wrote:
> Profiling results for overall audio decode and the
> mlp_filter_channel(_arm) function in particular are as follows:
>
>               Before          After
>               Mean   StdDev   Mean   StdDev  Confidence  Change
> 6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
> 6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
> 8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
> 8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
> 6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
> 6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
> 8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
> 8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%
>
> Experiments with adding preload instructions to this function yielded no
> useful benefit, so these have not been included.
>
> The assembly version has also been tested with a fuzz tester to ensure that
> any combinations of inputs not exercised by my available test streams still
> generate mathematically identical results to the C version.
> ---
>  libavcodec/arm/Makefile          |    2 +
>  libavcodec/arm/mlpdsp_arm.S      |  435 ++++++++++++++++++++++++++++++++++++++
>  libavcodec/arm/mlpdsp_init_arm.c |   36 +++
>  libavcodec/mlpdsp.c              |    2 +
>  libavcodec/mlpdsp.h              |    1 +
>  5 files changed, 476 insertions(+), 0 deletions(-)
>  create mode 100644 libavcodec/arm/mlpdsp_arm.S
>  create mode 100644 libavcodec/arm/mlpdsp_init_arm.c
>
> diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
> index 8bdccbd..c6cc96e 100644
> --- a/libavcodec/arm/Makefile
> +++ b/libavcodec/arm/Makefile
> @@ -21,6 +21,8 @@ OBJS-$(CONFIG_H264PRED) += arm/h264pred_init_arm.o
>  OBJS-$(CONFIG_H264QPEL) += arm/h264qpel_init_arm.o
>  OBJS-$(CONFIG_HPELDSP) += arm/hpeldsp_init_arm.o \
>  arm/hpeldsp_arm.o
> +OBJS-$(CONFIG_MLP_DECODER) += arm/mlpdsp_init_arm.o \
> + arm/mlpdsp_arm.o
>  OBJS-$(CONFIG_MPEGAUDIODSP) += arm/mpegaudiodsp_init_arm.o
>  OBJS-$(CONFIG_MPEGVIDEO) += arm/mpegvideo_arm.o
>  OBJS-$(CONFIG_NEON_CLOBBER_TEST) += arm/neontest.o
> diff --git a/libavcodec/arm/mlpdsp_arm.S b/libavcodec/arm/mlpdsp_arm.S
> new file mode 100644
> index 0000000..9e0bf57
> --- /dev/null
> +++ b/libavcodec/arm/mlpdsp_arm.S
> @@ -0,0 +1,435 @@
> +/*
> + * Copyright (c) 2014 RISC OS Open Ltd
> + * Author: Ben Avison <[email protected]>
> + *
> + * This file is part of Libav.
> + *
> + * Libav is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * Libav is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with Libav; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/arm/asm.S"
> +
> +// This code uses too many ARM-only tricks to easily assemble as Thumb
> +.arm

Just to be clear, the tricks that don't work in Thumb mode are the non-constant shifts and the jump tables using "ldr pc, [pc, ...]", right?
Forcing ARM mode like this isn't OK in all configurations. When building for WinRT/Windows Phone 8, for example, you really have to build everything in Thumb mode; the linker there doesn't handle everything needed for mixing the two modes.
Would it be acceptable to build and run this code only if CONFIG_THUMB is disabled? That's the case for most Raspberry Pi builds at least, although I guess it would mean the code isn't used at all on other builds (e.g. ARMv7 on Linux) where it could still have been beneficial?
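
To make the suggestion concrete, here is a rough sketch of what such a guard could look like on the init side. It is not taken from the patch: the DemoMLPDSPContext type and the demo_* functions are hypothetical stand-ins, and CONFIG_THUMB is assumed to be a 0/1 macro coming from the build's config header (it is defaulted to 0 here only so the sketch compiles on its own). The ARM routine is replaced by a plain C placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: CONFIG_THUMB is a 0/1 macro supplied by the build's config
 * header; it is defaulted here only so the sketch compiles standalone. */
#ifndef CONFIG_THUMB
#define CONFIG_THUMB 0
#endif

/* Hypothetical, cut-down stand-in for the context declared in mlpdsp.h. */
typedef struct DemoMLPDSPContext {
    int32_t (*filter_sample)(int32_t sample);
} DemoMLPDSPContext;

/* Portable reference version, always available. */
static int32_t demo_filter_c(int32_t sample)
{
    return sample / 2;
}

#if !CONFIG_THUMB
/* Placeholder for the ARM-mode assembly routine. In a real build this
 * would be an external symbol assembled from the .S file, and the .S file
 * itself would only be built when CONFIG_THUMB is 0. */
static int32_t demo_filter_arm(int32_t sample)
{
    return sample / 2;
}
#endif

/* Install the C version first, then override it only in ARM-mode builds. */
void demo_mlpdsp_init(DemoMLPDSPContext *c)
{
    assert(c != NULL);
    c->filter_sample = demo_filter_c;
#if !CONFIG_THUMB
    c->filter_sample = demo_filter_arm;
#endif
}

/* Helper for checking which version ended up installed. */
int demo_uses_arm_version(const DemoMLPDSPContext *c)
{
#if !CONFIG_THUMB
    return c->filter_sample == demo_filter_arm;
#else
    (void)c;
    return 0;
#endif
}
```

With a guard like this, a Thumb build silently falls back to the C version, which is exactly the trade-off mentioned above: it stays buildable everywhere, but Thumb-configured ARMv7 builds get no speedup.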
// Martin
