[ 
https://issues.apache.org/jira/browse/RNG-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976638#comment-16976638
 ] 

Sebastiano Vigna commented on RNG-123:
--------------------------------------

Since you're striving for precision, I'd say "the second 64 bits, with the 
exception of the most significant bit, which is discarded, choose...".

There's a fairly long discussion here:

[https://www.sciencedirect.com/science/article/pii/S0378475416300829]

I don't think using the same generator on different data has any guarantee of 
independence though.

One thing you can actually prove is that in some sense different streams 
generated from different starting states of a linear generator are "pairwise 
empirically independent" if they do not overlap. If you consider any states X 
and Y, and M is the transition matrix, after n steps you get M^nX and M^nY. But 
by linearity M^nX ⨁ M^nY = M^n(X ⨁ Y), so the xor between the streams is 
another empirically random stream. If the xor between two streams looks random, 
we might agree that the streams are independent. "Pairwise" here is crucial 
because as you increase the number of streams you get more and more dependent 
streams. But pairwise is a good start.

The argument does not work always with LCGs because you get that if d is the 
difference between X and Y, after n step is a^nd mod m. If, for example, the 
modulus m is a power of 2 and 2^c | d for a large c, this will be true forever. 
This is the case I was highlighting above—if you change just the upper bits of 
state you get strongly correlated streams.

In other words: linear generators have pairwise "independent" substreams—you 
cannot, even adversarially, find bad pairs. Power-of-two LCGs do not.

LCGs with prime moduli are a completely different thing (see the 
multiple-stream facility of MRG32k3a).

 

 

> PCG generators may exhibit massive stream correlation
> -----------------------------------------------------
>
>                 Key: RNG-123
>                 URL: https://issues.apache.org/jira/browse/RNG-123
>             Project: Commons RNG
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.3
>            Reporter: Sebastiano Vigna
>            Priority: Major
>
> [This is based on an issue I posted on the Rust development mailing list.]
> The documentation of the PCG generators does not state explicit that 
> different seeds generate independent sequences, but the existence of a 
> 128-bit seed implies somehow the idea that the whole seed is meaningful.
> The user should be made aware that the second parameter (the constant of the 
> underlying LCG) is almost useless from a mathematical and statistical 
> viewpoint.
> Changing constant to an LCG with power-of-two modulus and a constant like the 
> one used in PCG simply adds a constant to the sequence (more precisely, there 
> are two equivalence classes of constants, and in each equivalence class the 
> sequences are identical, except for an additive constant).
> The minimal scrambling done by the generator usually cannot cancel this fact, 
> and as a result changing the constant is equivalent to changing the initial 
> state (modulo an additive constant). You can try to run this program:
>  
> {noformat}
> import org.apache.commons.rng.core.source32.PcgXshRr32;
> import com.google.common.primitives.Ints;
> public class TestPCG {
>     public static void main(final String[] args) {
>         final long state = Long.parseLong(args[0]);
>         final long c = Long.parseLong(args[1]);
>         final long d = Long.parseLong(args[2]);
>         if (c % 2 != d % 2) throw new IllegalArgumentException();
>         final long C = c << 1 | 1;
>         final long D = d << 1 | 1;
>         final long r = 1314878037273365987L * ((d - c) >>> 1);
>         final PcgXshRr32 rng0 = new PcgXshRr32(new long[] { state, c });
>         final PcgXshRr32 rng1 = new PcgXshRr32(new long[] {
>             0xc097ef87329e28a5L  *(6364136223846793005L * (state + C) + C - r 
> - D) - D, d });
>         for(;;) {
>             final int a = rng0.nextInt();
>             System.out.write(Ints.toByteArray(a), 0, 4);
>             final int b = rng1.nextInt();
>             System.out.write(Ints.toByteArray(b), 0, 4);
>         }
>     }
> }{noformat}
> You can pass any state as first argument, and any two constants as the 
> following two arguments, as long as they are either both even or both odd . 
> The program will set up a second initial state so that the sequences 
> generated by the PRNGs using the two constants as seed are based on almost 
> identical underlying LCG sequences, in spite of having arbitrary, different 
> constants and different initial states. The two streams should be 
> independent, but if you pipe the output in PractRand you'll get immediately
> {noformat}
> rng=RNG_stdin32, seed=unknown
> length= 4 megabytes (2^22 bytes), time= 2.1 seconds
>   Test Name                         Raw       Processed     Evaluation
>   BCFN(0+0,13-5,T)                  R=+263.2  p =  3.4e-103   FAIL !!!!!
>   BCFN(0+1,13-5,T)                  R=+128.6  p =  1.5e-50    FAIL !!!!
>   BCFN(0+2,13-6,T)                  R= +65.2  p =  9.2e-23    FAIL !!
>   BCFN(0+3,13-6,T)                  R= +15.4  p =  1.0e-5   mildly suspicious
>   DC6-9x1Bytes-1                    R= +59.1  p =  4.2e-33    FAIL !!!
>   DC6-6x2Bytes-1                    R= +34.1  p =  9.0e-19    FAIL !
>   DC6-5x4Bytes-1                    R= +15.2  p =  7.7e-8   very suspicious
>   [Low4/16]BCFN(0+1,13-6,T)         R= +12.0  p =  1.5e-4   unusual
>   [Low4/16]FPF-14+6/64:(4,14-8)     R=  +9.2  p =  1.1e-6   unusual
>   [...]
>   [Low8/32]FPF-14+6/4:(9,14-9)      R= +27.4  p =  1.6e-17    FAIL
>   [Low8/32]FPF-14+6/4:(10,14-10)    R= +16.4  p =  2.5e-9   suspicious
>   [Low8/32]FPF-14+6/4:all           R=+283.4  p =  8.4e-255   FAIL !!!!!!
>   [Low8/32]Gap-16:A                 R=+414.8  p =  2.4e-336   FAIL !!!!!!!
>   [Low8/32]Gap-16:B                 R= +1736  p =  5e-1320    FAIL 
> !!!!!!!!{noformat}
> You can also offset one of generator by hundred of iterations, but the 
> sequences are so correlated that the result won't change. If you peek at the 
> state of the two generators you'll see that their difference is constant.
> I think the reader should be made aware of the danger. If you start several 
> generators of this kind the state is too small to guarantee that there will 
> be no overlap. Once you get overlap, since there are in practice just two 
> sequences, you will get a lot of unwanted correlation.
> There's a reason why nobody in the last decades ever considered creating 
> "streams" using the constant part of an LCG, and it's that people realized 
> very early that it doesn't work (see, e.g., 
> [https://ieeexplore.ieee.org/document/718715]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to