This increased the frames/sec performance by 120% when decoding the 720p Wildlife.wmv (from Windows 7) on a Snapdragon 600.

This is based on code in the libavcodec/arm directory by Rob Clark and Mans Rullgard. This also includes David Conrad's old proposed patch for chroma:
http://ffmpeg.org/pipermail/ffmpeg-devel/2009-April/059877.html

I did not have the resources to run FATE, but I used my own test harness comparing the results of the NEON functions to the C functions, and I also used framemd5 on some VC1 files that I had. The result was that I made the inv_trans code resilient to a little bit of overflow, while still doing 16-bit math. Why the overflow? Probably just out-of-spec files produced by old, buggy encoders.

I did not implement avg versions of the mspel functions because I did not have files to test them with. I did not attempt the overlap or loop filter code.

Mason Carter (2):
  VC1 DSP: ARM NEON assembly
  VC1 DSP: NEON no_rnd chroma MC

 libavcodec/arm/Makefile           |    3 +
 libavcodec/arm/h264cmc_neon.S     |   13 +
 libavcodec/arm/vc1dsp.h           |   26 +
 libavcodec/arm/vc1dsp_init_arm.c  |   32 +
 libavcodec/arm/vc1dsp_init_neon.c |   94 +++
 libavcodec/arm/vc1dsp_neon.S      | 1170 +++++++++++++++++++++++++++++++++++++
 libavcodec/vc1dsp.c               |    2 +
 libavcodec/vc1dsp.h               |    1 +
 8 files changed, 1341 insertions(+)
 create mode 100644 libavcodec/arm/vc1dsp.h
 create mode 100644 libavcodec/arm/vc1dsp_init_arm.c
 create mode 100644 libavcodec/arm/vc1dsp_init_neon.c
 create mode 100644 libavcodec/arm/vc1dsp_neon.S

-- 
1.8.3.msysgit.0
