Replying inline, so I can target specific points better. -----
Isiah Meadows [email protected] Looking for web consulting? Or a new website? Send me an email and we can get started. www.isiahmeadows.com On Sun, Apr 29, 2018 at 7:31 PM, J Decker <[email protected]> wrote: > I can see that's certainly something that can be gotten wrong; the in-place > sort is kind-of nice; but you end up hitting most numbers twice, and with a > longer array (say a set of 75 bingo balls) you can move the same number > multiple times. I specifically mentioned this (the easy to screw up) in the link I created and posted here. > > which gives numbers at the end a higher chance of being at the start when > all is done; and almost guaranteed to not be at the end. I use sha2 as a > stream of bits; and when it runs out of bits, invokes a callback to get more > salt ( that is optional, although adding Date.now() is sufficient to > increase the overall entropy slightly ). This can also be used as a > generator procedural content generation, since you can specify the initial > salt and subsequent salts, or even copy the state, and fork two streams > using slightly different progressive entropy. > https://github.com/d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js > > And then to shuffle, for each number in the array, assign a random number; > hang in a binary tree from least to most; then gather the tree back out. > requires additional memory approximately the 4xsize of the original array > tough temporarily. (don't seem to have a JS version, but the C version > ports pretty easily) > https://github.com/d3x0r/SACK/blob/master/src/utils/cardodds/shuffle.c For an in-place shuffle, that's still not especially efficient. It's still taking O(n log n) stack space + global space + time, even though it doesn't require real heap allocation. > > This allows any number to result in any position equally. > The SHA2 hash generates very good random number characteristics; evaluating > with diehard tests; chi-square etc. > Allowing more salt to be introduced periodically destroys any overall period > (even if it is in the realm of 256 bits). Basing it on a cryptographic primitive is what I was referring to for larger arrays. You need one that powerful for large arrays if you hope to maintain any sort of statistical soundness. However, for small arrays (like a "deck" of cards), it is likely overkill. > > if you were shuffling a deck of cards (52) you only need random values that > are 6-7 bits at a time... > > It is slower than say http://www.pcg-random.org/using-pcg.html or mersenne, > both of which I found to be very poor. pcg generates zeros like 70% of the > time. If it's generating zeros that frequently, don't use it. The website itself looks a bit on the sketchy side to me, but that's just me.\* A Mersenne Twister with a large state shouldn't give you too much trouble for simple stuff, and if you take the SFMT variant [1], you shouldn't have speed issues too much. Also, note that I did specify a higher-quality iteration on that, WELL [2], which is similar, but improves on both the statistical properties and speed of the traditional Mersenne Twister. \* If you're not cryptographically secure, don't compare yourself to ones that are, especially trying to make yourself look better than them. Also, it missed a few other high quality generators (like the xorshift128 family and xorshiro128+, two of the most efficient, effective fast PRNGs out there) in its online comparison. [1]: https://en.wikipedia.org/wiki/Mersenne_Twister#SFMT [2]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear > > While a good shuffle should be able to start from the initial state and > generate a good shuffled result, it is slightly better to progressively > shuffle the previous result array into new positions than to start from an > initial state and compute a 1-off. Evidence? (BTW, the Fisher-Yates shuffle is theoretically as biased as the RNG that powers it.) > > On Sun, Apr 29, 2018 at 3:27 PM, Isiah Meadows <[email protected]> > wrote: >> >> BTW, I added this to my list of various proposed array additions (as a >> weak one) [1]. >> >> I did do a little reading up and found that in general, there's a >> major glitch that makes shuffling very hard to get right (issues even >> Underscore's `_.shuffle` doesn't address), specifically that of the >> size of the random number generator vs how many permutations it can >> hit. Specifically, if you want any properly unbiased shuffles covering >> all permutations of arrays larger than 34 entries (and that's not a >> lot), you can't use any major engines' `Math.random` (which typically >> have a max seed+period of 128 bits). You have to roll your own >> implementation, and you need at least something like xorshift1024* [2] >> (1024-bit max seed/period size) for up to 170 entries, MT19937 >> [3]/WELL19937 [4] (seed/period up to 2^19937-1) for up to 2080 >> entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the >> minimum seed/period size in bits grows roughly `ceil(log2(fact(N)))` >> where `N` is the length of the array [5], and the only way you can >> guarantee you can even generate all possible permutations of all >> arrays (ignoring potential bias) is through a true hardware generator >> (with its potentially infinite number of possibilities). Also, another >> concern is that the loop's bottleneck is specifically the random >> number generation, so you can't get too slow without resulting in a >> *very* slow shuffle. >> >> If you want anything minimally biased and much larger than that, >> you'll need a cryptographically secure pseudorandom number generator >> just to cover the possible states. (This is about where the >> intersection meets between basic statistics and cryptography, since >> cipher blocks are frequently that long.) But I'm leaving that as out >> of scope of that proposal. >> >> [1]: >> https://github.com/isiahmeadows/array-additions-proposal#arrayprototypeshuffle >> [2]: https://en.wikipedia.org/wiki/Xorshift#xorshift* >> [3]: https://en.wikipedia.org/wiki/Mersenne_Twister >> [4]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear >> [5]: >> http://www.wolframalpha.com/input/?i=plot+ceiling(log2(x!))+where+0+%3C+x+%3C+10000 >> >> ----- >> >> Isiah Meadows >> [email protected] >> >> Looking for web consulting? Or a new website? >> Send me an email and we can get started. >> www.isiahmeadows.com >> >> >> On Sun, Apr 29, 2018 at 4:01 PM, Alexander Lichter <[email protected]> wrote: >> > On 29.04.2018 18:34, Naveen Chawla wrote: >> >> >> >> I don't think there's such a thing as "real random" in digital algos, >> >> just >> >> "pseudo random". >> > >> > You are right. I meant 'unbiased' pseudo randomness here. >> > >> >> Apart from card games, what's the use case? >> > >> > There are a lot of potential use cases. The best that comes into my mind >> > is >> > sampling test data. >> > >> > >> > On 29.04.2018 19:01, Isiah Meadows wrote: >> >> >> >> I think this would be better suited for a library function rather than >> >> a >> >> language feature. I could see this also being useful also for >> >> randomized displays, but that's about it. And I'm not sure what an >> >> engine could provide here that a library couldn't - you can't really >> >> get much faster than what's in the language (minus bounds checking, >> >> but the likely frequent cache misses will eclipse that greatly), and >> >> it's not unlocking any real new possibilities. >> > >> > As Tab Atkins Jr. already pointed out it's not about performance >> > benefits. >> > It's about how error-prone custom shuffle implementations are/can be. >> > >> > _______________________________________________ >> > es-discuss mailing list >> > [email protected] >> > https://mail.mozilla.org/listinfo/es-discuss >> _______________________________________________ >> es-discuss mailing list >> [email protected] >> https://mail.mozilla.org/listinfo/es-discuss > > _______________________________________________ es-discuss mailing list [email protected] https://mail.mozilla.org/listinfo/es-discuss

