[
https://issues.apache.org/jira/browse/RNG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814611#comment-16814611
]
Alex D Herbert commented on RNG-85:
-----------------------------------
I've put together an implementation of this generator without using the
original code to do the seeding.
To check it against a reference implementation I currently have the RNG
constructor accept a single {{long}}. This is used to create the increment for
the Weyl sequence. The only condition for this input is that is an odd number.
However it helps if this increment has good randomness in the bits (more on
this later). So the input seed is checked for complexity using a method similar
to {{SplittableRandom}} (which is also based on a Weyl sequence). The check
counts the number of transitions in the seed from 0 to 1 or vice versa.
{code:java}
static int countTransitions(long value) {
// Sign-extending shift properly counts bits at the ends
return Long.bitCount(value ^ (value >> 1));
}
{code}
If it passes a critical value then the value goes through untouched. So it can
be seeded using any of the example seeds in the original paper or source code
which have good complexity.
If the seed is low complexity then the complexity is boosted by {{xor}} ing the
seed with an alternating bit mask {{0xaaaaaaaa}} for the upper and lower
32-bits separately. The SplittableRandom defines complexity as less than 24
transitions in the 64-bit seed. I have modelled the possible transitions as a
Binomial distribution (BD) with n the number of changes and p set to 0.5.
Dividing the seed into upper and lower bits this is then modelled using a
BD(n=31,p=0.5). This has a mean of np which is 15.5. The cumulative probabilty
is readily computed. The cut-off of 24 used in the SplittableRandom corresponds
to approximately 1.6% of possible values. (Note: This value is computed
empirically from a distribution of random numbers as the SplittableRandom uses
an unsigned bit shift when computing transitions leading to overcounting by
approximately 0.5. As such it cannot be modelled using a binomial
distribution.) I have set the threshold as 0.5% of possible values in the upper
and lower 32-bit separately. This requires at least 9 transitions. The result
is approximately 1% of seeds are not good enough and must be adjusted.
This allows it to pass the tests in Core and Simple modules and a quick run
using 8 trials passes SmallCrush with no failures. So this seeding approach is
working.
JMH results verses SplitMix:
||randomSourceName||Method||Score||Error||Median||Error/Score||
|BASELINE|nextDouble|2.657|0.339|2.587|0.128|
|MSWS|nextDouble|5.312|0.006|5.311|0.001|
|SPLIT_MIX_64|nextDouble|4.539|0.013|4.537|0.003|
| |baselineDouble|2.349|0.305|2.287|0.130|
| |baselineVoid|0.269|0.006|0.267|0.022|
|BASELINE|nextInt|2.363|0.002|2.363|0.001|
|MSWS|nextInt|3.209|0.003|3.208|0.001|
|SPLIT_MIX_64|nextInt|3.581|0.044|3.572|0.012|
| |baselineInt|2.045|0.007|2.043|0.003|
| |baselineVoid|0.266|0.000|0.266|0.000|
Note using these results you can subtract the baseline from each score since it
measures the overhead within JMH for testing methods that produce a primitive
value.
So not as fast for double generation but much faster for int generation.
Regarding the seeding.
The MSWS paper states that:
{noformat}
"The constant s should be an irregular bit pattern with roughly half of the
bits set to one and half of the bits set to zero. The best statistics have been
obtained with this type of pattern. Unduly sparse or dense values do not
produce good results."
{noformat}
So we could just seed the generator using any random number, as long as the
number of bit transitions is not too low. Defining the level for low is vague
and currently the low value is set at about 1% of all possible random numbers.
The paper suggests seeding using random hexadecimal digits:
{noformat}
"The digits are chosen so that the upper 8 are different and the lower 8 are
different and then 1 is ored into the result to make the constant odd."
{noformat}
The example source code provides a method to do this. The source code is GPL3
so cannot be copied unless we get permission. But the method is described in
the paper so could be re-implemented. The seeding method uses an internal
random generator to create the seed for the Weyl sequence. So seeding is slow
relative to just checking for complexity in the provided seed and boosting it
if necessary.
Note that the original code provides a set of 25000 example seeds using their
method. This can be analysed for bit transitions.
The histogram looks like this:
{noformat}
Transitions Count
19 2
20 0
21 18
22 0
23 129
24 0
25 632
26 0
27 1918
28 0
29 4083
30 0
31 5812
32 0
33 5770
34 0
35 3971
36 0
37 1951
38 0
39 573
40 0
41 113
42 0
43 25
44 0
45 3
{noformat}
Note that the method creates no seeds with an odd number of state transitions
and no seeds with less than 19 transitions. So using a cut-off for upper and
lower 32-bits of 9 transitions is roughly equivalent.
Since the seeding routine ensures all 8 hex digits in the upper and lower
32-bits are unique the output seed will be assured of altering each 4 bit
section of the Weyl sequence with each iteration (with the exception of the use
of the 0x0 hex digit).
The author's have also verified that their seeding routine is suitable for
parallel computation as the first 3 billion input values starting from 0
produce unique seeds.
Ideally I would like to avoid a complex seeding routine inside the core
generator implementation. Using the simple method of boosting the number of bit
transitions is working.
If the generator is to be used for parallel computations then a user can create
n generators and just check that each input seed is unique. They can even be
referred to the original paper to produce seeds using that code. Currently my
routine would modify 30 of those 25000 example seeds (0.12%). The lowest
transition count found in the 32-bits is 6 which happens because the final bit
is forced to 1 (note the double 0xf at the end).
{noformat}
0xc 0x3 0x8 0x0 0x6 0x7 0xf 0xf
1100 0011 1000 0000 0110 0111 1111 1111
{noformat}
To avoid the modified input seed creating an non unique generator the
unmodified seed can be used to set the initial state of the generator. Thus all
possible random seeds should create a unique generator, although the sequences
may overlap if the input seed has been modified.
Since we do not currently support construction of parallel generators that do
not overlap via the factory method in {{RandomSource.create}} I do not think it
a priority to exactly implement the original source seeding routine.
> Middle Square Weyl Sequence generator
> -------------------------------------
>
> Key: RNG-85
> URL: https://issues.apache.org/jira/browse/RNG-85
> Project: Commons RNG
> Issue Type: Sub-task
> Components: core
> Reporter: Gilles
> Priority: Minor
> Labels: gsoc2019
>
> https://en.wikipedia.org/wiki/Middle_square_method#Middle_Square_Weyl_Sequence_PRNG
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)