On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
> Inline, use another well-known algorithm for 64-bit builds, and use
> builtins when they are known at compile time to be fast.  A 32-bit
> version of the alternate algorithm is slower than the existing
> implementation, so the old one is used for 32-bit builds.  Inline
> assembler would be a bit faster on a 32-bit i7 build, but we use the
> GCC builtin for portability.
> 
> It should be stressed that builds for specific CPUs do not work on
> other CPUs, and that neither the OVS build system nor the runtime
> currently supports CPU detection.
> 
> Speed improvement vs. the existing implementation / GCC 4.7
> __builtin_popcountll():
> 
> i386:         64%  (inlining)                         / 380%
> i386 on i7:   240% (inlining + builtin)               / 820%
> x86_64:       59%  (inlining + different algorithm)   / 190%
> x86_64 on i7: 370% (inlining + builtin)               / 0%
> 
> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
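
For readers unfamiliar with the 64-bit approach, here is a minimal sketch of one well-known branch-free algorithm, the SWAR ("parallel bit count") popcount. The patch description does not name its algorithm, so treating it as this one is an assumption. popcount64_swar() and popcount_selfcheck() are hypothetical names, and the self-check assumes a GCC-compatible compiler for __builtin_popcountll().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: SWAR popcount of a 64-bit word using only
 * shifts, masks, adds and a single multiply. */
static inline unsigned int
popcount64_swar(uint64_t x)
{
    /* Each 2-bit field now holds the number of set bits in it (0..2). */
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    /* Each 4-bit field now holds the count for its 4 bits (0..4). */
    x = (x & UINT64_C(0x3333333333333333))
        + ((x >> 2) & UINT64_C(0x3333333333333333));
    /* Each byte now holds the count for its 8 bits (0..8). */
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    /* Multiplying by 0x0101...01 sums all eight bytes into the top byte. */
    return (unsigned int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Hypothetical cross-check against the GCC builtin used in the
 * benchmark comparison above. */
static void
popcount_selfcheck(void)
{
    static const uint64_t samples[] = {
        0, 1, UINT64_C(0xff), UINT64_C(0x8000000000000001),
        UINT64_C(0x0123456789abcdef), UINT64_MAX,
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(popcount64_swar(samples[i])
               == (unsigned int) __builtin_popcountll(samples[i]));
    }
}
```

On a 32-bit target the 64-bit shifts, adds and multiply each expand into several 32-bit instructions, which plausibly explains why a variant of this approach loses to the existing code on i386.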

Instead of defined(__corei7), I would test defined(__POPCNT__), a macro
that GCC defines whenever support for the POPCNT instruction is enabled.
I don't think that __corei7 is a good test because it is too specific:
successors to Core i7 will almost certainly also have POPCNT.
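
A sketch of the suggested feature test, assuming a GCC-compatible compiler: __POPCNT__ is defined under -mpopcnt or any -march that implies POPCNT (corei7 and later), so the guard tracks the instruction rather than one CPU model. count_1bits_64() and popcount64_fallback() are hypothetical names, and the fallback here is Kernighan's clear-lowest-bit loop, standing in for whatever portable code the patch actually keeps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable fallback: repeatedly clear the lowest set bit
 * and count how many times that takes (Kernighan's method). */
static inline unsigned int
popcount64_fallback(uint64_t x)
{
    unsigned int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical wrapper: use the builtin only when the compiler has been
 * told that the target CPU has POPCNT, so the builtin becomes a single
 * instruction instead of a library call. */
static inline unsigned int
count_1bits_64(uint64_t x)
{
#if defined(__POPCNT__)
    return (unsigned int) __builtin_popcountll(x);
#else
    return popcount64_fallback(x);
#endif
}
```

Built with plain -O2 the fallback path is used; adding -mpopcnt (or -march=corei7) switches to the builtin without any change to the source.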

Acked-by: Ben Pfaff <b...@nicira.com>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev