#3575: mkStdGen and split conspire to make some programs predictable
-----------------------------+----------------------------------------------
Reporter:  rwbarton          |          Owner:                  
    Type:  bug               |         Status:  new             
Priority:  normal            |      Component:  libraries/random
 Version:  6.10.1            |       Severity:  normal          
Keywords:                    |       Testcase:                  
      Os:  Unknown/Multiple  |   Architecture:  Unknown/Multiple
-----------------------------+----------------------------------------------
 The following program `random.hs` is intended to produce a line containing
 30 random 0s or 1s.  It is not an example of the best way to use
 System.Random, but it looks innocuous enough.

 {{{
 import Control.Monad
 import System.Random

 print0or1 = do
   g <- newStdGen
   putStr . show . fst $ randomR (0, 1 :: Int) g

 main = replicateM_ 30 print0or1 >> putStrLn ""
 }}}

 Let's try running it a thousand times:

 {{{
 rwbar...@functor:/tmp$ ghc-6.10.1 -O2 --make random
 [1 of 1] Compiling Main             ( random.hs, random.o )
 Linking random ...
 rwbar...@functor:/tmp$ for i in `seq 1 1000` ; do ./random >> random.out ;
 done
 rwbar...@functor:/tmp$ sort random.out | uniq | wc -l
 60
 }}}

 That's odd... there are 2^30^ possible output lines, but when I tried to
 generate 1000 random ones, I only got 60 distinct outputs.  Why did that
 happen?

 One might think this is due to poor initial seeding of the random number
 generator (due to the time not changing very much during the test), but
 this is not the case.  Attached is a fancier version of the program which
 reads an initial seed from `/dev/urandom`; it exhibits the same behavior.

 This phenomenon is not too hard to explain.  It is ultimately due to a
 poor interaction between `mkStdGen` and `split`.  First, we need to know a
 bit about the design of System.Random (some statements simplified slightly
 for this discussion).

  * The state of the RNG consists of two `Int32`s, `s1` and `s2`.

  * The initial state produced by mkStdGen almost always has `s2` equal to
 1.  (Extremely rarely, it might have `s2` equal to 2.  We'll ignore this
 as it doesn't affect the argument.)

  * To generate a random 0 or 1, we first generate a new state using some
 simple functions `s1' = next1(s1)`, `s2' = next2(s2)`.  (Note that `s1`
 and `s2` "evolve" independently.)  The random value returned is the lowest
 bit of `s1'` minus `s2'`.

  * Splitting the generator `(s1, s2)` yields the two generators `(s1+1,
 next2(s2))` and `(next1(s1), s2-1)`.

 Our program functions as follows.

  * Initialize the generator stored in `theStdGen` (`s1` is some varying
 value `a`, `s2` is 1).

  * Repeatedly split the generator, replacing it with the first output, and
 use the second output to generate a 0 or 1.

 If we watch `theStdGen` while our program runs, we will see that `s1` is
 incremented by 1 at each step, while `s2` follows the fixed sequence `1`,
 `next2(1)`, `next2(next2(1))`, etc.  The 0 or 1 we output at the `k`th
 step is thus the lowest bit of `next1(next1(a+k-1))` minus `b`,,k,,, where
 `b`,,k,, is some fixed sequence.  And as `k` varies, `next1(next1(a+k-1))`
 turns out to be just an arithmetic sequence with fixed difference modulo a
 fixed prime so its lowest bits are extremely predictable even without
 knowing `a`.

 This issue can be fixed to some extent, without breaking backwards
 compatibility, by adding another method (besides `mkStdGen`) to create a
 generator, which does not have predictable `s2`, and using it to
 initialize the system RNG.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3575>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to