[ 
https://issues.apache.org/jira/browse/RNG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814611#comment-16814611
 ] 

Alex D Herbert commented on RNG-85:
-----------------------------------

I've put together an implementation of this generator without using the 
original code to do the seeding.

To check it against a reference implementation I currently have the RNG 
constructor accept a single {{long}}. This is used to create the increment for 
the Weyl sequence. The only condition for this input is that is an odd number. 
However it helps if this increment has good randomness in the bits (more on 
this later). So the input seed is checked for complexity using a method similar 
to {{SplittableRandom}} (which is also based on a Weyl sequence). The check 
counts the number of transitions in the seed from 0 to 1 or vice versa.
{code:java}
  static int countTransitions(long value) {
    // Sign-extending shift properly counts bits at the ends
    return Long.bitCount(value ^ (value >> 1));
  }
{code}
If it passes a critical value then the value goes through untouched. So it can 
be seeded using any of the example seeds in the original paper or source code 
which have good complexity.

If the seed is low complexity then the complexity is boosted by {{xor}} ing the 
seed with an alternating bit mask {{0xaaaaaaaa}} for the upper and lower 
32-bits separately. The SplittableRandom defines complexity as less than 24 
transitions in the 64-bit seed. I have modelled the possible transitions as a 
Binomial distribution (BD) with n the number of changes and p set to 0.5. 
Dividing the seed into upper and lower bits this is then modelled using a 
BD(n=31,p=0.5). This has a mean of np which is 15.5. The cumulative probabilty 
is readily computed. The cut-off of 24 used in the SplittableRandom corresponds 
to approximately 1.6% of possible values. (Note: This value is computed 
empirically from a distribution of random numbers as the SplittableRandom uses 
an unsigned bit shift when computing transitions leading to overcounting by 
approximately 0.5. As such it cannot be modelled using a binomial 
distribution.) I have set the threshold as 0.5% of possible values in the upper 
and lower 32-bit separately. This requires at least 9 transitions. The result 
is approximately 1% of seeds are not good enough and must be adjusted.

This allows it to pass the tests in Core and Simple modules and a quick run 
using 8 trials passes SmallCrush with no failures. So this seeding approach is 
working.

JMH results verses SplitMix:
||randomSourceName||Method||Score||Error||Median||Error/Score||
|BASELINE|nextDouble|2.657|0.339|2.587|0.128|
|MSWS|nextDouble|5.312|0.006|5.311|0.001|
|SPLIT_MIX_64|nextDouble|4.539|0.013|4.537|0.003|
| |baselineDouble|2.349|0.305|2.287|0.130|
| |baselineVoid|0.269|0.006|0.267|0.022|
|BASELINE|nextInt|2.363|0.002|2.363|0.001|
|MSWS|nextInt|3.209|0.003|3.208|0.001|
|SPLIT_MIX_64|nextInt|3.581|0.044|3.572|0.012|
| |baselineInt|2.045|0.007|2.043|0.003|
| |baselineVoid|0.266|0.000|0.266|0.000|

Note using these results you can subtract the baseline from each score since it 
measures the overhead within JMH for testing methods that produce a primitive 
value.

So not as fast for double generation but much faster for int generation.

Regarding the seeding.

The MSWS paper states that:
{noformat}
"The constant s should be an irregular bit pattern with roughly half of the
bits set to one and half of the bits set to zero. The best statistics have been
obtained with this type of pattern. Unduly sparse or dense values do not
produce good results."
{noformat}
So we could just seed the generator using any random number, as long as the 
number of bit transitions is not too low. Defining the level for low is vague 
and currently the low value is set at about 1% of all possible random numbers.

The paper suggests seeding using random hexadecimal digits:
{noformat}
"The digits are chosen so that the upper 8 are different and the lower 8 are
different and then 1 is ored into the result to make the constant odd."
{noformat}
The example source code provides a method to do this. The source code is GPL3 
so cannot be copied unless we get permission. But the method is described in 
the paper so could be re-implemented. The seeding method uses an internal 
random generator to create the seed for the Weyl sequence. So seeding is slow 
relative to just checking for complexity in the provided seed and boosting it 
if necessary.

Note that the original code provides a set of 25000 example seeds using their 
method. This can be analysed for bit transitions.

The histogram looks like this:
{noformat}
Transitions  Count
19           2
20           0
21           18
22           0
23           129
24           0
25           632
26           0
27           1918
28           0
29           4083
30           0
31           5812
32           0
33           5770
34           0
35           3971
36           0
37           1951
38           0
39           573
40           0
41           113
42           0
43           25
44           0
45           3
{noformat}
Note that the method creates no seeds with an odd number of state transitions 
and no seeds with less than 19 transitions. So using a cut-off for upper and 
lower 32-bits of 9 transitions is roughly equivalent.

Since the seeding routine ensures all 8 hex digits in the upper and lower 
32-bits are unique the output seed will be assured of altering each 4 bit 
section of the Weyl sequence with each iteration (with the exception of the use 
of the 0x0 hex digit).

The author's have also verified that their seeding routine is suitable for 
parallel computation as the first 3 billion input values starting from 0 
produce unique seeds.

Ideally I would like to avoid a complex seeding routine inside the core 
generator implementation. Using the simple method of boosting the number of bit 
transitions is working.

If the generator is to be used for parallel computations then a user can create 
n generators and just check that each input seed is unique. They can even be 
referred to the original paper to produce seeds using that code. Currently my 
routine would modify 30 of those 25000 example seeds (0.12%). The lowest 
transition count found in the 32-bits is 6 which happens because the final bit 
is forced to 1 (note the double 0xf at the end).
{noformat}
0xc  0x3  0x8  0x0  0x6  0x7  0xf  0xf
1100 0011 1000 0000 0110 0111 1111 1111
{noformat}
To avoid the modified input seed creating an non unique generator the 
unmodified seed can be used to set the initial state of the generator. Thus all 
possible random seeds should create a unique generator, although the sequences 
may overlap if the input seed has been modified.

Since we do not currently support construction of parallel generators that do 
not overlap via the factory method in {{RandomSource.create}} I do not think it 
a priority to exactly implement the original source seeding routine.

 

> Middle Square Weyl Sequence generator
> -------------------------------------
>
>                 Key: RNG-85
>                 URL: https://issues.apache.org/jira/browse/RNG-85
>             Project: Commons RNG
>          Issue Type: Sub-task
>          Components: core
>            Reporter: Gilles
>            Priority: Minor
>              Labels: gsoc2019
>
> https://en.wikipedia.org/wiki/Middle_square_method#Middle_Square_Weyl_Sequence_PRNG



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to