On Wednesday, 26 February 2020 at 22:07:30 UTC, Johan wrote:
On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
[...]

Hi Basile,
I recently saw this presentation: https://www.youtube.com/watch?v=Czr5dBfs72U It has some ideas that may help you make sure your measurements are good and may give you ideas to find the performance bottleneck or where to optimize. llvm-mca is featured on godbolt.org: https://mca.godbolt.org/z/YWp3yv

cheers,
  Johan

Yes, llvm-mca looks excellent, although I don't know if it's worth continuing... You see, this function is certainly not a bottleneck; it's just that I wanted to try to do better than the naive implementation.

Fundamentally, the problem is that:
1. the original is smaller and faster to decode;
2. the alternatives (esp. the 3rd) are conceptually better, but the cost of the jump table + lzcnt cancels out that advantage.
