http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
--- Comment #11 from Cristian Rodríguez <crrodriguez at opensuse dot org> ---
Not to be annoying, but compiling the test case attached to this bug report with clang 3.3 produces code in which

inline u32 popcount64_1(u64 x) { return __builtin_popcountll(x); }

is over 3 times faster than the code GCC 4.8.1 produces on x86_64. I think GCC could "just" generate IFUNCs for generic targets: on x86_64, one function with __attribute__((target("popcnt"))) and the other a call to libgcc. That would at least match the clang performance.
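
To make the suggestion concrete, here is a rough sketch of the kind of runtime dispatch described above. It is not what GCC emits and it does not use a real IFUNC resolver: it approximates one with a lazily initialized function pointer, so that it also builds on targets without IFUNC support. The names popcount64_generic, popcount64_popcnt, select_popcount64 and popcount64_dispatch are made up for illustration, and the bit-clearing loop merely stands in for the libgcc fallback.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Portable fallback (hypothetical stand-in for the libgcc routine). */
static u32 popcount64_generic(u64 x)
{
    u32 n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Compiled with POPCNT enabled for this function only, so the
 * builtin can be lowered to the hardware instruction. */
__attribute__((target("popcnt")))
static u32 popcount64_popcnt(u64 x)
{
    return (u32)__builtin_popcountll(x);
}
#endif

typedef u32 (*popcount_fn)(u64);

/* Plays the role an IFUNC resolver would play: pick an
 * implementation based on what the running CPU supports. */
static popcount_fn select_popcount64(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return popcount64_popcnt;
#endif
    return popcount64_generic;
}

/* Public entry point. The first call selects the implementation;
 * this simple version is not synchronized across threads. */
u32 popcount64_dispatch(u64 x)
{
    static popcount_fn impl;
    if (!impl)
        impl = select_popcount64();
    return impl(x);
}
```

With a real IFUNC the selection would happen once at symbol resolution time, so callers would not pay for the pointer check on every call as they do in this sketch.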