On 25/02/15 9:41 AM, Ronald S. Bultje wrote: > Hi, > > On Tue, Feb 24, 2015 at 8:05 PM, James Almer <jamr...@gmail.com> wrote: >> >> +#if HAVE_FAST_POPCNT >> +#if AV_GCC_VERSION_AT_LEAST(4,5) >> +#ifndef av_popcount >> + #define av_popcount __builtin_popcount >> +#endif /* av_popcount */ >> +#if HAVE_FAST_64BIT >> +#ifndef av_popcount64 >> + #define av_popcount64 __builtin_popcountll >> +#endif /* av_popcount64 */ >> +#endif /* HAVE_FAST_64BIT */ >> +#endif /* AV_GCC_VERSION_AT_LEAST(4,5) */ >> +#endif /* HAVE_FAST_POPCNT */ >> > > Is this just to get the sse4 popcnt instruction if we compile with > -mcpu=sse4? The slightly odd thing is that we're using a built-in, yet > configure still does an arch/cpu check. I'd expect the built-in/compiler to > do that for us based on -mcpu, and we could always unconditionally use this > (as long as gcc >= 4.5); alternatively, you could use inline asm and then > have the configure check (HAVE_FAST_POPCNT). But doing both seems a little > odd. I have no objection to it, patch is still fine, just odd. > > Ronald
I purposely made the checks for gcc 4.5 and in configure for cpus that support popcnt because otherwise __builtin_popcount (at least gcc's) is slower than our generic av_popcount_c function from lavu/common.h. When the CPU supports popcnt the builtin becomes a single inlined instruction. I tried the __asm__ approach, but the code generated by the builtin seemed better. And I agree it looks odd and maybe way too specific, which is why i said i can add this to a new header in the x86/ folder instead. Patch attached. I don't have clang so i can't test it, nor i know how to check for a version that supports the builtin.
>From 056a0ec11618bc02ea52eff269dc7bc016dc9897 Mon Sep 17 00:00:00 2001 From: James Almer <jamr...@gmail.com> Date: Wed, 25 Feb 2015 14:21:02 -0300 Subject: [PATCH] libavutil: add x86 optimized av_popcount Signed-off-by: James Almer <jamr...@gmail.com> --- libavutil/intmath.h | 3 +++ libavutil/x86/intmath.h | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+) create mode 100644 libavutil/x86/intmath.h diff --git a/libavutil/intmath.h b/libavutil/intmath.h index fa549c8..f5ecc77 100644 --- a/libavutil/intmath.h +++ b/libavutil/intmath.h @@ -29,6 +29,9 @@ #if ARCH_ARM # include "arm/intmath.h" #endif +#if ARCH_X86 +# include "x86/intmath.h" +#endif /** * @addtogroup lavu_internal diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h new file mode 100644 index 0000000..2b71cec --- /dev/null +++ b/libavutil/x86/intmath.h @@ -0,0 +1,38 @@ +/* + * Copyright (c) 2015 James Almer + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVUTIL_X86_INTMATH_H +#define AVUTIL_X86_INTMATH_H + +#include <stdint.h> +#include "config.h" + +/* Our generic version of av_popcount is faster than GCC's built-in on + * CPUs that don't support the popcnt instruction. + */ +#if defined(__GNUC__) && defined(__POPCNT__) + #define av_popcount __builtin_popcount +#if ARCH_X86_64 + #define av_popcount64 __builtin_popcountll +#endif + +#endif /* defined(__GNUC__) && defined(__POPCNT__) */ + +#endif /* AVUTIL_X86_INTMATH_H */ -- 2.3.0
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel