On Wednesday, 26 February 2020 at 23:09:34 UTC, Basile B. wrote:
On Wednesday, 26 February 2020 at 20:44:31 UTC, Bruce Carneal wrote:

After shuffling the input, branchless wins by 2.4X (240%).

I've replaced the input by the front of a rndGen (that pops for count times and starting with a custom seed) and I never see the decimalLength9_3 (which seems to be the closest to the original in term of performances) doing better.

The uniform random uint distribution yields a highly skewed digits distribution. Note, for example, that only 10 of the 2**32 possible outputs from a uint PRNG can be represented by a single decimal digit.

Your current winner will be very hard to beat if the inputs are uniform random. That implementation's high-to-low cascade of early exit ifs aligns with PRNG output.

If you aren't sure what the input distribution is, or want guaranteed good worst case behavior, then branchless may be a better way to go.


Maybe you talked about another implementation of decimalLength9 ?

Yes.  It's one I wrote after I saw your post. Psuedo-code here:

auto d9_branchless(uint v) { return 1 + (v >= 10) + (v >= 100) ... }

Using ldc to target an x86 with the above yields a series of cmpl, seta instruction pairs in the function body followed by a summing and a return. No branching.

Let me know if the above is unclear or insufficient.





















Reply via email to