On Sat, May 14, 2022 at 05:03:17PM -0700, Philip Guenther wrote: > On Sun, 15 May 2022, Steffen Nurpmeso wrote: > > Stuart Henderson wrote in > ... > > |what's the perceived problem you're wanting to solve? and does that > > |problem actually exist in the first place? > > > > The problem is that if have a low upper bound then modulo will "remove a > > lot of randomization". For example if you have a program which > > generates Lotto numbers (1..49), then using _uniform() as it is will > > generate many duplicates. > > Wut. The *WHOLE POINT* of arc4random_uniform() is that it has uniform > distribution. Says right so in the manpage. If an implementation of that > API fails to do that, it's a broken implementation. > > The OpenBSD implementation achieves that. NetBSD's implementation has the > same core logic. Here's my quick lotto pick demo program, doing 4900000 > picks, so they should all be somewhere near 100000: > > #include <stdlib.h> > #include <stdio.h> > #define LIMIT 49 > int > main(void) > { > unsigned counts[LIMIT] = { 0 }; > int i; > for (i = 0; i < 100000 * LIMIT; i++) > counts[arc4random_uniform(LIMIT)]++; > for (i = 0; i < LIMIT; i++) > printf("%2d\t%7u\n", i+1, counts[i]); > return 0; > } > > And a sample run: > > : bleys; ./a.out > > 1 100100 > 2 100639 > 3 99965 > 4 99729 > 5 99641 > 6 99650 > 7 100299 > 8 100164 > 9 99791 > 10 100101 > 11 100657 > 12 100042 > 13 99661 > 14 99927 > 15 99426 > 16 99491 > 17 99646 > 18 100133 > 19 100013 > 20 99942 > 21 99873 > 22 99924 > 23 99567 > 24 100152 > 25 100688 > 26 100011 > 27 100481 > 28 99980 > 29 100406 > 30 99726 > 31 99808 > 32 99929 > 33 100050 > 34 99983 > 35 100048 > 36 99771 > 37 99906 > 38 100215 > 39 100261 > 40 100426 > 41 99847 > 42 99533 > 43 100368 > 44 99695 > 45 100041 > 46 100465 > 47 99875 > 48 100034 > 49 99920 > : bleys; > > Looks pretty good to me, with repeated runs showing the values bouncing > around. > > > ... > > I was a bit interested when i saw Luke's message, but i am no > > mathematician and, like i said, i in fact never needed the > > functionality. And the question i would have is how small > > upper_bound has to be for the modulo problem to really kick in. > > If the implementation isn't broken, it's not a problem, period. > > > > And even if, whether it matters. For example, if you have > > a bitset of 1024 "in-flight things", and the small upper bound > > hits a used slot, you could simply search forward until you find > > a free one. I mean, with 1024 possible values the number of > > possibilities is anyhow restricted. > > Well hey, let's give Luke's a try and see how uniform it is: > > > #include <stdlib.h> > #include <stdio.h> > #define LIMIT 49 > int > main(void) > { > unsigned counts[LIMIT] = { 0 }; > unsigned counts2[LIMIT] = { 0 }; > int i; > for (i = 0; i < 100000 * LIMIT; i++) { > counts[arc4random_uniform(LIMIT)]++; > counts2[arc4random_uniform_fast_simple(LIMIT)]++; > } > for (i = 0; i < LIMIT; i++) > printf("%2d\t%7u\t%7u\n", i+1, counts[i], counts2[i]); > return 0; > } > > : bleys; ./a.out > 1 100188 76502 > 2 99983 76602 > 3 100521 76522 > 4 100416 76682 > 5 100171 76387 > 6 100163 76759 > 7 100024 76336 > 8 100009 76703 > 9 99769 76237 > 10 99892 76532 > 11 100197 76730 > 12 100483 76398 > 13 99769 76310 > 14 100075 76474 > 15 99781 76599 > 16 99846 76439 > 17 99814 76430 > 18 100313 76648 > 19 100259 76813 > 20 99885 77068 > 21 100302 76546 > 22 99998 76698 > 23 99491 76678 > 24 100340 76324 > 25 99763 115263 > 26 99872 153008 > 27 100022 152979 > 28 99481 153793 > 29 100018 210714 > 30 99617 229286 > 31 100167 297003 > 32 100270 449664 > 33 100468 76790 > 34 99115 76452 > 35 99921 76392 > 36 99862 76140 > 37 100485 76607 > 38 100029 75885 > 39 99577 76498 > 40 99479 76727 > 41 100139 76746 > 42 100883 76698 > 43 100102 76474 > 44 99801 76592 > 45 100117 76124 > 46 99678 76417 > 47 99770 76639 > 48 99524 77034 > 49 100151 76658 > : bleys; > > Wow, that last column is *bad*. Repeated runs show 25-32 are consistently > high, 32 being *outrageously* off, and the others are low. > > Luke's implementation does not correctly implement the API. Doesn't > matter if it's a million times faster when it doesn't deliver the goal. > > > Philip Guenther >
In my first reply I already said: > I see a lot of code with no demonstration, explanation or proof why it > would be correct (i.e. produces uniform results). You only talk about speed. As I suspected it would be non-uniform. Thanks for demonstrating that. -Otto