https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
--- Comment #7 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
To wrap this up, confirming that GCC 11 does well on my benchmark:

BM_Chain20    54529 iterations    18781 ns/iter    GCC 10, asm bsfq
BM_Chain20    44584 iterations    22509 ns/iter    GCC 10, ffsll()
BM_Chain20    49753 iterations    20216 ns/iter    GCC 11, asm bsfq
BM_Chain20    53346 iterations    18816 ns/iter    GCC 11, ffsll()
BM_Chain20    64926 iterations    15747 ns/iter    Clang 12, asm bsfq
BM_Chain20    71208 iterations    14374 ns/iter    Clang 12, ffsll()

So basically for 11+, the ffsll() statement does better than the bsfq
statement, whereas it used to do markedly worse. Clang does even better,
but I can live with that. :-)
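
For anyone reading this comment in isolation, the two variants being compared presumably look roughly like the sketch below. This is not the benchmark's actual code: the function names lowest_bit_asm and lowest_bit_ffs are hypothetical, __builtin_ffsll stands in for the libc ffsll() call, and the non-x86-64 fallback is only there so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1 ("asm bsfq"): hand-written inline asm, x86-64 only.
 * Hypothetical helper; assumes x != 0, since bsf leaves its
 * destination undefined when the source is zero. */
static inline int lowest_bit_asm(uint64_t x)
{
    assert(x != 0);
#if defined(__x86_64__)
    uint64_t idx;
    /* AT&T syntax: bsfq source, destination */
    __asm__("bsfq %1, %0" : "=r"(idx) : "rm"(x) : "cc");
    return (int)idx;
#else
    /* Fallback for other targets, not part of the comparison. */
    return __builtin_ctzll(x);
#endif
}

/* Variant 2 ("ffsll()"): ffsll returns the 1-based position of the
 * lowest set bit, or 0 for a zero argument, so subtract 1 to get a
 * 0-based index. Hypothetical helper; assumes x != 0. */
static inline int lowest_bit_ffs(uint64_t x)
{
    assert(x != 0);
    return __builtin_ffsll((long long)x) - 1;
}
```

The precondition matters: ffsll(0) is defined to return 0, whereas bsf leaves its destination undefined for a zero source, so the ffsll() form carries a zero case that the asm form simply ignores.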