On Wednesday, 26 February 2020 at 23:09:34 UTC, Basile B. wrote:
On Wednesday, 26 February 2020 at 20:44:31 UTC, Bruce Carneal
wrote:
After shuffling the input, branchless wins by 2.4X (240%).
I've replaced the input by the front of a rndGen (that pops for
count times and starting with a custom seed) and I never see
the decimalLength9_3 (which seems to be the closest to the
original in term of performances) doing better.
The uniform random uint distribution yields a highly skewed
digits distribution. Note, for example, that only 10 of the
2**32 possible outputs from a uint PRNG can be represented by a
single decimal digit.
Your current winner will be very hard to beat if the inputs are
uniform random. That implementation's high-to-low cascade of
early exit ifs aligns with PRNG output.
If you aren't sure what the input distribution is, or want
guaranteed good worst case behavior, then branchless may be a
better way to go.
Maybe you talked about another implementation of decimalLength9
?
Yes. It's one I wrote after I saw your post. Psuedo-code here:
auto d9_branchless(uint v) { return 1 + (v >= 10) + (v >= 100)
... }
Using ldc to target an x86 with the above yields a series of
cmpl, seta instruction pairs in the function body followed by a
summing and a return. No branching.
Let me know if the above is unclear or insufficient.