| Issue | 74101 |
|---|---|
| Summary | misoptimizations around `__builtin_clz` |
| Labels | new issue |
| Assignees | |
| Reporter | amotzop |

You can use `__builtin_clz` to calculate the integer part of `log2(num)` for a nonzero 32-bit integer as `ilog(num) = 31 - __builtin_clz(num)`, which compiles to a single `bsr` instruction on x86-64.
However, if you add 1 to the result, the compiler emits `xor` and `add` instructions after the `bsr` instead of a single `add` or `inc` instruction ([godbolt](https://godbolt.org/z/xavKh5n9T) example, [quick-bench](https://quick-bench.com/q/NduZQs_oSySAldBvjlfhsmjZqII) comparison).
(This is just one example; the same seems to be true for other arithmetic operations.)
I think this happens because the IR represents `__builtin_clz` as a single "atomic" operation, whereas in reality it is expanded into two instructions.
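
For readers who don't want to open the links, here is a minimal sketch of the pattern described above. The function names `ilog2_u32` and `ilog2_plus_one_u32` are made up for illustration and may not match the code behind the godbolt link, which remains the actual reproducer. The sketch assumes a GCC/Clang-compatible compiler with a 32-bit `unsigned int`.

```c
#include <stdint.h>

/* Integer part of log2(num) for a nonzero 32-bit value.
 * __builtin_clz(0) is undefined, so callers must pass num != 0. */
static inline unsigned ilog2_u32(uint32_t num) {
    return 31u - (unsigned)__builtin_clz(num);
}

/* The variant from the report: the same computation with 1 added.
 * Ideally this would cost one extra add/inc after the bsr. */
static inline unsigned ilog2_plus_one_u32(uint32_t num) {
    return ilog2_u32(num) + 1u;
}
```

Comparing the assembly of the two functions at `-O2` on x86-64 should show the difference reported here. The exact instruction sequence may vary with compiler version and target flags.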