On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
So after reading the translation of RYU I was interested to see if the decimalLength() function could be written to be faster, as it cascades through up to 8 CMP instructions.

...

Then a bad surprise. Even with ldmd (so ldc2, basically) fed with the args from the script line, the fdecimalLength9 version is maybe slightly faster. Only *slightly*. Good news, I've wasted my time. So I tried an alternative version that uses a table of delegates instead of a switch (ffdecimalLength9) and surprise, "tada", it is like **100x** slower than the two others.

How is that possible?

Hi Basile,
I recently saw this presentation: https://www.youtube.com/watch?v=Czr5dBfs72U
It has some ideas that may help you make sure your measurements are good, and it may help you find the performance bottleneck or where to optimize. llvm-mca is featured on godbolt.org: https://mca.godbolt.org/z/YWp3yv
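As a rough illustration of what "making sure the measurements are good" can involve, here is a minimal sketch of a timing harness. Note that `decimalLengthCascade` is a hypothetical stand-in written for this example, not the function from the Ryu translation or any of the variants in your post, and the input generator and iteration count are arbitrary assumptions. The two points it tries to show are feeding the function inputs that are not compile-time constants, and consuming the results so the optimizer cannot remove the calls.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical stand-in for a cascade of up to 8 comparisons.
// Assumes v < 1_000_000_000 (at most 9 decimal digits).
ubyte decimalLengthCascade(uint v) pure nothrow @nogc @safe
{
    if (v >= 100_000_000) return 9;
    if (v >= 10_000_000) return 8;
    if (v >= 1_000_000) return 7;
    if (v >= 100_000) return 6;
    if (v >= 10_000) return 5;
    if (v >= 1_000) return 4;
    if (v >= 100) return 3;
    if (v >= 10) return 2;
    return 1;
}

void main()
{
    // The sum is printed at the end, so the calls have an observable
    // effect and cannot be discarded as dead code.
    ulong sink;
    uint seed = 12_345;

    void run()
    {
        // Small xorshift generator: inputs vary from call to call and
        // are not known at compile time.
        seed ^= seed << 13;
        seed ^= seed >> 17;
        seed ^= seed << 5;
        sink += decimalLengthCascade(seed % 1_000_000_000);
    }

    // Arbitrary iteration count chosen for this sketch.
    auto results = benchmark!run(10_000_000);
    writefln("cascade: %s (checksum %s)", results[0], sink);
}
```

The time reported here includes the cost of the generator itself, so it is only useful for comparing variants run through the same harness, not as an absolute figure.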

cheers,
  Johan
