On Wednesday, 26 February 2020 at 22:07:30 UTC, Johan wrote:
On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
[...]
Hi Basile,
I recently saw this presentation:
https://www.youtube.com/watch?v=Czr5dBfs72U
It has some ideas that may help you make sure your measurements
are good and may give you ideas to find the performance
bottleneck or where to optimize.
llvm-mca is featured on godbolt.org:
https://mca.godbolt.org/z/YWp3yv
cheers,
Johan
Yes, llvm-mca looks excellent, although I don't know if it is worth
continuing... You see, this function is certainly not a
bottleneck; it's just that I wanted to try to do better than the naive
implementation.
Fundamentally, the problem is that
1. the original is smaller and faster to decode
2. the alternatives (esp. the 3rd) are conceptually better, but the
cost of the jump table + lzcnt cancels out the gain.
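
To make the trade-off concrete, here is a rough sketch in C. The function under discussion is not shown in this post, so both functions below are hypothetical stand-ins (a decimal digit count for a 32-bit value), not the actual code. `digits_naive` is a plain compare chain; `digits_lzcnt` uses the GCC/Clang builtin `__builtin_clz` to pick a case from the highest set bit. Whether the compiler really emits `lzcnt` and a jump table for the `switch` depends on the target and optimization flags, so treat that as an assumption too.

```c
#include <stdint.h>

/* Hypothetical baseline: short compare chain, small code, no table. */
static unsigned digits_naive(uint32_t v)
{
    if (v < 10u) return 1;
    if (v < 100u) return 2;
    if (v < 1000u) return 3;
    if (v < 10000u) return 4;
    if (v < 100000u) return 5;
    if (v < 1000000u) return 6;
    if (v < 10000000u) return 7;
    if (v < 100000000u) return 8;
    if (v < 1000000000u) return 9;
    return 10;
}

/* Hypothetical alternative: the index of the highest set bit selects a
 * case; only bit positions that straddle a power of ten need a compare. */
static unsigned digits_lzcnt(uint32_t v)
{
    if (v == 0) return 1;               /* __builtin_clz(0) is undefined */
    switch (31 - __builtin_clz(v)) {    /* highest set bit, 0..31 */
    case 0: case 1: case 2:    return 1;
    case 3:                    return v < 10u ? 1 : 2;
    case 4: case 5:            return 2;
    case 6:                    return v < 100u ? 2 : 3;
    case 7: case 8:            return 3;
    case 9:                    return v < 1000u ? 3 : 4;
    case 10: case 11: case 12: return 4;
    case 13:                   return v < 10000u ? 4 : 5;
    case 14: case 15:          return 5;
    case 16:                   return v < 100000u ? 5 : 6;
    case 17: case 18:          return 6;
    case 19:                   return v < 1000000u ? 6 : 7;
    case 20: case 21: case 22: return 7;
    case 23:                   return v < 10000000u ? 7 : 8;
    case 24: case 25:          return 8;
    case 26:                   return v < 100000000u ? 8 : 9;
    case 27: case 28:          return 9;
    case 29:                   return v < 1000000000u ? 9 : 10;
    default:                   return 10; /* bits 30 and 31 */
    }
}
```

The second version does at most one compare after the dispatch, which is the "conceptually better" part. The price is a much larger body plus an indirect branch through a table, and that is the kind of overhead that can eat the gain when the naive chain is already short and cheap to decode.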