Overall, the compilers do a better job of optimizing away branches, bringing
performance on 32-bit platforms close to that of get_vlc2().
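For reference, av_always_inline is Libav's portability wrapper around the
compiler's forced-inlining attribute. A simplified sketch of its definition
(the real one in libavutil/attributes.h is guarded on the GCC version rather
than on a plain __GNUC__ check):

    /* Sketch: force inlining where the compiler supports it. */
    #ifdef __GNUC__
    #    define av_always_inline __attribute__((always_inline)) inline
    #else
    #    define av_always_inline inline
    #endif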
The vorbis testcase shows improvements across the board, except for clang-4.0
on arm, possibly because that platform is not as well supported as it is by
gcc-6.3.
---
Data: https://gist.github.com/lu-zero/171c854498ba934cdb7bae385f045e5b

 libavcodec/bitstream.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/libavcodec/bitstream.h b/libavcodec/bitstream.h
index 894a13e..3ad777b 100644
--- a/libavcodec/bitstream.h
+++ b/libavcodec/bitstream.h
@@ -288,8 +288,9 @@ static inline int bitstream_read_xbits(BitstreamContext *bc, unsigned length)
 }
 
 /* Return the LUT element for the given bitstream configuration. */
-static inline int set_idx(BitstreamContext *bc, int code, int *n, int *nb_bits,
-                          VLC_TYPE (*table)[2])
+static av_always_inline
+int set_idx(BitstreamContext *bc, int code, int *n, int *nb_bits,
+            VLC_TYPE (*table)[2])
 {
     unsigned idx;
 
@@ -310,8 +311,9 @@ static inline int set_idx(BitstreamContext *bc, int code, int *n, int *nb_bits,
  * If the VLC code is invalid and max_depth = 1, then no bits will be removed.
  * If the VLC code is invalid and max_depth > 1, then the number of bits removed
  * is undefined. */
-static inline int bitstream_read_vlc(BitstreamContext *bc, VLC_TYPE (*table)[2],
-                                     int bits, int max_depth)
+static av_always_inline
+int bitstream_read_vlc(BitstreamContext *bc, VLC_TYPE (*table)[2],
+                       int bits, int max_depth)
 {
     int nb_bits;
     unsigned idx = bitstream_peek(bc, bits);
--
2.9.2
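For illustration, a caller-side sketch of how the always-inlined reader is
meant to be used; decode_run is a hypothetical helper, not part of this patch,
and it assumes the VLC table maps invalid codes to a negative value, as
Libav's VLC tables conventionally do:

    #include "libavutil/error.h"
    #include "bitstream.h"

    /* Hypothetical helper: read 'count' VLC-coded symbols in a row. */
    static int decode_run(BitstreamContext *bc, VLC_TYPE (*table)[2],
                          int bits, int *out, int count)
    {
        for (int i = 0; i < count; i++) {
            /* max_depth = 2: permit one second-level table lookup */
            int sym = bitstream_read_vlc(bc, table, bits, 2);
            if (sym < 0)
                return AVERROR_INVALIDDATA;
            out[i] = sym;
        }
        return 0;
    }

With max_depth fixed at a small constant and the function forced inline, the
compiler can specialize the lookup at each call site, which is presumably
where the branch elimination described above comes from.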
