[ 
https://issues.apache.org/jira/browse/RNG-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517467#comment-17517467
 ] 

Alex Herbert commented on RNG-174:
----------------------------------

I have updated the o.a.c.rng.simple.internal package to support checking a 
range of the seed is not all zero. This updates RandomSourceInternal and 
SeedFactory.

Only one new public method has been added, and one existing public method has 
changed from abstract to a concrete implementation (it just calls the new 
method). Here is the JApiCmp report:
{noformat}
Comparing source compatibility of commons-rng-simple-1.5-SNAPSHOT.jar against commons-rng-simple-1.4.jar
**** MODIFIED ENUM: PUBLIC ABSTRACT org.apache.commons.rng.simple.internal.NativeSeedType  (compatible)
        ===  CLASS FILE FORMAT VERSION: 52.0 <- 52.0
        ***  MODIFIED METHOD: PUBLIC NON_ABSTRACT (<- ABSTRACT) java.lang.Object createSeed(int)
        +++* NEW METHOD: PUBLIC(+) ABSTRACT(+) java.lang.Object createSeed(int, int, int)
{noformat}
The existing method createSeed in NativeSeedType is public so adding this new 
method as public is consistent. It could be changed to package-private. This 
class is only used internally and could be entirely package-private. That may 
not have been the case when it was created for version 1.3 but I cannot 
remember and have not checked the commit history.
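
The shape of the API change in the report can be sketched as below. This is an 
illustration only, not the library source: the enum name SeedTypeSketch, the 
use of SplittableRandom, and the default range [0, min(size, 1)) passed by the 
one-argument method are assumptions, chosen to mirror the earlier behaviour of 
ensuring the first position is non-zero.

```java
import java.util.SplittableRandom;

/** Hypothetical stand-in for NativeSeedType; names and bodies are illustrative. */
enum SeedTypeSketch {
    INT_ARRAY {
        @Override
        public Object createSeed(int size, int from, int to) {
            SplittableRandom rng = new SplittableRandom();
            int[] seed = new int[size];
            for (int i = 0; i < size; i++) {
                seed[i] = rng.nextInt();
            }
            // Check whether every position in [from, to) is zero.
            boolean allZero = true;
            for (int i = from; i < to; i++) {
                if (seed[i] != 0) {
                    allZero = false;
                    break;
                }
            }
            // An empty range (from == to) requests no check at all.
            if (allZero && from < to) {
                seed[from] = rng.nextInt() | 1; // low bit set, so never zero
            }
            return seed;
        }
    };

    /** New method: abstract, implemented once per constant. */
    public abstract Object createSeed(int size, int from, int to);

    /** Previously abstract; now a concrete method that delegates. */
    public Object createSeed(int size) {
        // Assumption: keep the old "position 0 is non-zero" behaviour.
        return createSeed(size, 0, Math.min(size, 1));
    }
}
```

Existing callers of createSeed(int) compile and behave as before, which is why 
the report flags the change as compatible.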

The new internal seeding routines accept a sub-range of the seed that cannot be 
all zero. Work was previously done to avoid creating all-zero seeds as some 
generators are sensitive to them. This used the simple approach of ensuring the 
first position in array seeds is non-zero. I have used the previous tests to 
identify the generators that require non-zero seeds. See
 * o.a.c.rng.simple.ProvidersCommonParametricTest.testZeroIntArraySeed
 * o.a.c.rng.core.RandomAssert.assertNextIntZeroOutput
 * o.a.c.rng.core.RandomAssert.assertNextLongZeroOutput
 * o.a.c.rng.core.RandomAssert.assertIntArrayConstructorWithSingleBitInPoolIsFunctional

Any generator identified from these tests requires a non-zero seed. In most 
cases this was set as the full seed length, or one less for generators that do 
not use all the bits of the seed array (WELL_19937_x, WELL_44497_x).
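
A minimal sketch of the range check follows. The class and method names are 
hypothetical and not taken from the library; the lengths and ranges used in the 
usage note come from the table in the Changes section.

```java
/** Hypothetical helper; not the library implementation. */
final class SeedRangeSketch {
    private SeedRangeSketch() {}

    /**
     * Returns true if the seed must be fixed, i.e. the range [from, to) is
     * not empty and every position in it is zero.
     */
    static boolean needsFix(int[] seed, int from, int to) {
        for (int i = from; i < to; i++) {
            if (seed[i] != 0) {
                return false;
            }
        }
        // An empty range means the generator is not sensitive to zero seeds.
        return from < to;
    }
}
```

For a seed of length 624 checked over [0, 623), as for WELL_19937_x, a seed that 
is non-zero only in position 623 still needs fixing, because not all bits of 
that final position are used. A range of [0, 0), as for MT, never requests a fix.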

Notable exceptions:

The KISS generator is reduced to a simple LCG when positions [0, 3) are all 
zero. I added a test to demonstrate this. With such a seed the KISS generator 
passes testZeroIntArraySeed, but its output is only that of a 32-bit LCG. To 
avoid a poor generator the seed will be checked to be non-zero in the range 
[0, 3). This prevents the KISS generator reducing to an LCG. It is consistent 
with checking range [0, 1) in previous versions of the library.
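
The reduction can be reproduced with a small sketch of the KISS update. This is 
written from Marsaglia's published description rather than copied from the 
library, so the class name, the xorshift shift constants and the seed ordering 
(two multiply-with-carry values, the xorshift state, then the LCG state) should 
be read as assumptions. The ordering is consistent with the last seed position 
being the LCG state.

```java
/** Illustrative KISS sketch; not the library implementation. */
final class KissSketch {
    private int z;
    private int w;
    private int jsr;
    private int jcong;

    KissSketch(int[] seed) {
        z = seed[0];
        w = seed[1];
        jsr = seed[2];
        jcong = seed[3];
    }

    int next() {
        // Two multiply-with-carry generators; both stay at zero from a zero state.
        z = 36969 * (z & 65535) + (z >>> 16);
        w = 18000 * (w & 65535) + (w >>> 16);
        int mwc = (z << 16) + w;
        // Xorshift; a zero state stays zero whatever the shift constants are.
        jsr ^= jsr << 13;
        jsr ^= jsr >>> 17;
        jsr ^= jsr << 5;
        // Linear congruential generator; functional from any state.
        jcong = 69069 * jcong + 1234567;
        return (mwc ^ jcong) + jsr;
    }
}
```

With seed {0, 0, 0, s} the first two components contribute nothing, so each 
output equals the LCG state 69069 * jcong + 1234567 computed in 32-bit 
arithmetic.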

The MSWS generator is sensitive to the initial state. I added a test to show 
that a zero seed creates zero output. Updating RandomAssert to add an 
assertLongArrayConstructorWithSingleBitInPoolIsFunctional test shows the MSWS 
fails with single bit seeds. This generator is the most sensitive in the 
library to poor seeding. It has a seed length of 3. The final position must be 
a good increment for a Weyl sequence. It should definitely not be zero. The 
second position is the initial state of the Weyl sequence. This could be zero. 
The first position is generator state. If not very random, and the Weyl 
increment is poor, then this state can take a long time to attain randomness 
for the output. The behaviour from v1.3 would be to set the first position as 
non-zero. However randomness can best be achieved through a good Weyl 
increment. It makes more sense to ensure the final position (index 2) is 
non-zero.

However it is still possible to create a generator that will output zeros for a 
large number of cycles. So despite the native seed type being a long[] of 
length 3, it would be recommended to create this generator with a single long 
value and have RandomSource.MSWS create an appropriately seeded generator. An 
alternative is to provide a source of randomness to create a byte[] seed. This 
can use more entropy than the 64-bits of a long to create more possible seeds. 
The method was fixed in RNG-175 to be robust to bad sources of randomness.
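
The sensitivity to the Weyl increment can be seen in a minimal sketch of the 
middle-square Weyl sequence update. The class name MswsSketch, the helper 
countLeadingZeroOutputs and the example increment are assumptions for 
illustration. The seed layout (state, Weyl state, Weyl increment) follows the 
description above, and the 32-bit output is assumed to be the low half of the 
state after the halves are swapped.

```java
/** Illustrative middle-square Weyl sequence sketch; not the library implementation. */
final class MswsSketch {
    private long x;       // generator state, seed[0]
    private long w;       // Weyl sequence state, seed[1]
    private final long s; // Weyl increment, seed[2]

    MswsSketch(long[] seed) {
        x = seed[0];
        w = seed[1];
        s = seed[2];
    }

    int next() {
        x *= x;                     // square the state
        w += s;                     // advance the Weyl sequence
        x += w;                     // add it to the state
        x = (x >>> 32) | (x << 32); // swap the upper and lower halves
        return (int) x;
    }

    /** Counts how many of the first {@code limit} outputs are zero before a non-zero one. */
    static int countLeadingZeroOutputs(long[] seed, int limit) {
        MswsSketch rng = new MswsSketch(seed);
        int count = 0;
        while (count < limit && rng.next() == 0) {
            count++;
        }
        return count;
    }
}
```

With seed {0, 0, 1} the increment is non-zero, so the seed passes the range 
check, yet the 32-bit output stays at zero until the Weyl state grows past 32 
bits. With an odd increment that has many bits set, for example 
0xb5ad4eceda1ce2a9L (an assumed example value), the first output is already 
non-zero.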
h2. Changes

This change is summarised below for all sources that ensured a seed was 
non-zero in position 0 in their native array seed.
||RandomSource||Type||Length||From (inclusive)||To (exclusive)||Notes||
|WELL_512_A|int[]|16|0|16| |
|WELL_1024_A|int[]|32|0|32| |
|WELL_19937_A|int[]|624|0|623|Does not use all bits from the final seed position|
|WELL_19937_C|int[]|624|0|623|Does not use all bits from the final seed position|
|WELL_44497_A|int[]|1391|0|1390|Does not use all bits from the final seed position|
|WELL_44497_B|int[]|1391|0|1390|Does not use all bits from the final seed position|
|MT|int[]|624|0|0|Not sensitive to all-zero seeds|
|ISAAC|int[]|256|0|0|Not sensitive to all-zero seeds|
|XOR_SHIFT_1024_S|long[]|16|0|16| |
|MT_64|long[]|312|0|0|Not sensitive to all-zero seeds|
|MWC_256|int[]|257|0|257| |
|KISS|int[]|4|0|3|Last position is an LCG state and can be zero.|
|XOR_SHIFT_1024_S_PHI|long[]|16|0|16| |
|XO_RO_SHI_RO_64_S|int[]|2|0|2| |
|XO_RO_SHI_RO_64_SS|int[]|2|0|2| |
|XO_SHI_RO_128_PLUS|int[]|4|0|4| |
|XO_SHI_RO_128_SS|int[]|4|0|4| |
|XO_RO_SHI_RO_128_PLUS|long[]|2|0|2| |
|XO_RO_SHI_RO_128_SS|long[]|2|0|2| |
|XO_SHI_RO_256_PLUS|long[]|4|0|4| |
|XO_SHI_RO_256_SS|long[]|4|0|4| |
|XO_SHI_RO_512_PLUS|long[]|8|0|8| |
|XO_SHI_RO_512_SS|long[]|8|0|8| |
|PCG_XSH_RR_32|long[]|2|0|0|Not sensitive to all-zero seeds|
|PCG_XSH_RS_32|long[]|2|0|0|Not sensitive to all-zero seeds|
|PCG_RXS_M_XS_64|long[]|2|0|0|Not sensitive to all-zero seeds|
|MSWS|long[]|3|2|3|Changed to target the Weyl increment as non-zero|
|SFC_32|int[]|3|0|0|Not sensitive to all-zero seeds|
|SFC_64|long[]|3|0|0|Not sensitive to all-zero seeds|
|XO_SHI_RO_128_PP|int[]|4|0|4| |
|XO_RO_SHI_RO_128_PP|long[]|2|0|2| |
|XO_SHI_RO_256_PP|long[]|4|0|4| |
|XO_SHI_RO_512_PP|long[]|8|0|8| |
|XO_RO_SHI_RO_1024_PP|long[]|16|0|16| |
|XO_RO_SHI_RO_1024_S|long[]|16|0|16| |
|XO_RO_SHI_RO_1024_SS|long[]|16|0|16| |
h2. Functional Changes

In most use cases the change will introduce no functional incompatibility. 
Default seeding uses an internal source of randomness. The change modifies how 
this internal source is applied to create a generator. The generator created by 
RandomSource should have an initial random state and produce quality output 
(i.e. not all zeros).

However this change introduces functionally breaking changes to the method in 
RandomSource that accepts an input source of randomness:
{code:java}
byte[] createSeed(UniformRandomProvider);{code}
Previously the method would generate the native array seed, ensure it was 
non-zero in position 0, and convert the seed to bytes. If the seed was zero in 
position 0 then the _input provider was ignored_ and a value was generated from 
the _default source of randomness_ in the SeedFactory. This made the method 
non-reproducible.

The method has been updated to create the native seed as before, then check the 
sub-range in the table above is non-zero. If all zero in the sub-range then the 
sub-range is filled using a robust RNG seeded from the provided input 
UniformRandomProvider. The fill will ensure not all bits are zero in the 
sub-range. The default source of randomness is not used. The method is now 
reproducible. The same UniformRandomProvider will create the same seed, even if 
the initial seed has a sub-range that is all zero.
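
The updated procedure can be sketched as follows. This is a simplified 
illustration under stated assumptions, not the library code: LongSupplier 
stands in for UniformRandomProvider, SplittableRandom stands in for the robust 
RNG, the final fallback constant is arbitrary, and the conversion of the native 
seed to byte[] is omitted.

```java
import java.util.SplittableRandom;
import java.util.function.LongSupplier;

/** Illustrative sketch of reproducible seed creation; names are hypothetical. */
final class ReproducibleSeedSketch {
    private ReproducibleSeedSketch() {}

    static boolean isAllZero(long[] seed, int from, int to) {
        for (int i = from; i < to; i++) {
            if (seed[i] != 0) {
                return false;
            }
        }
        return true;
    }

    static long[] createSeed(LongSupplier source, int size, int from, int to) {
        // Create the native seed from the provided source, as before.
        long[] seed = new long[size];
        for (int i = 0; i < size; i++) {
            seed[i] = source.getAsLong();
        }
        if (from < to && isAllZero(seed, from, to)) {
            // Fill the sub-range from an RNG seeded by the same source, so the
            // result depends only on the source and is reproducible.
            SplittableRandom filler = new SplittableRandom(source.getAsLong());
            for (int i = from; i < to; i++) {
                seed[i] = filler.nextLong();
            }
            // Guarantee a non-zero bit even in the unlikely all-zero fill.
            if (isAllZero(seed, from, to)) {
                seed[from] = 0x9e3779b97f4a7c15L; // arbitrary non-zero constant
            }
        }
        return seed;
    }
}
```

A source that only outputs zeros yields the same non-zero seed on every call, 
and a source that already produces a non-zero sub-range passes through 
unchanged.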

A test has been added to show that the seed created from a source of randomness 
that outputs all zeros will create a functional generator for all RandomSource 
values; and that the seed created is reproducible.

Functional changes summary:
 * In the common use case for the method, the source of randomness to the 
method will be random. The output will be functionally identical: a new random 
seed is produced.
 * In the uncommon use case for the method, the source of randomness is fixed 
and happens to avoid a native seed with a zero in the first position. This 
behaviour is unchanged except for the MSWS where the seed may be different if 
it had a zero in position [2] of the long[] seed.
 * In the very uncommon use case for the method, the source of randomness is 
fixed and happens to create a native seed with a zero in the first position. 
This occurs with a frequency of 1 in 2^32 or 1 in 2^64 (for int[] and long[] 
seeds respectively). These cases will have a functionally breaking change in 
the byte[] seed created by the method. Previous behaviour would generate a 
different random seed each call. New behaviour will generate a (possibly 
different) random seed; the seed will be identical for each call.

Given that previous behaviour for edge cases of zero seeds would have generated 
random seeds, this change should not affect users.

Seeding for the MSWS has been improved to be more robust. Users generating 
fixed seeds for this generator should check the seed is suitable, and 
regenerate it with the new routines if required.

 

> Improve support for non-zero seeds
> ----------------------------------
>
>                 Key: RNG-174
>                 URL: https://issues.apache.org/jira/browse/RNG-174
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: simple
>    Affects Versions: 1.4
>            Reporter: Alex Herbert
>            Assignee: Alex Herbert
>            Priority: Minor
>             Fix For: 1.5
>
>
> The default seed arrays created by RandomSource are ensured to be non-zero in 
> the first position. This is to support xor-based generators which are 
> non-functional when seeded with all zeros.
> All xor-based generators in the library fill their state from position 0 in 
> the input seed array. So this has worked for all current implementations.
> The new LXM family of generators have a composite seed of the state of a 
> linear congruential generator (LCG) and the state of a xor-based generator 
> (XBG). Ideal seeding for these generators places the LCG state first. This is 
> due to the behaviour of the LXM family where the seeding of the LCG can 
> create independent streams of RNG output, specifically when using a different 
> LCG add parameter (which must be odd). Thus seeding with values 1, 3, 5, 7, 
> which are then expanded into a full array, will create non-overlapping RNG 
> sequences.
> The requirement to place the LCG state first in the seed shifts the seed for 
> the XBG state. It is possible that a generated seed would be all zero in the 
> XBG state. The current seed generator is 16-equidistributed and can thus 
> output consecutive zeros. The RandomSource seeding behaviour should be 
> updated with the option to create a seed which is non-zero in a specified 
> range of the seed array.
> The public API in RandomSource is:
> {code:java}
> byte[] createSeed();
> byte[] createSeed(UniformRandomProvider rng);
> static int[] createIntArray(int n);
> static long[] createLongArray(int n);{code}
> The createSeed methods are specific to each RandomSource instance. This is 
> delegated to an internal package which creates a native seed of the correct 
> length and converts it to bytes. No changes to the public API should be 
> required to support non-zero seeds in a range.
> Note that the seed generation method is also used by:
> {code:java}
> RestorableUniformRandomProvider create(); {code}
> So any LXM generator created by the RandomSource enum with no explicit seed 
> will also obtain this functionality.
> For the array generation methods, these have no documentation on the non-zero 
> behaviour. Either these methods can be left alone, or updated to add a range:
> {code:java}
> int[] RandomSource.createIntArray(int n, int from, int to);
> long[] RandomSource.createLongArray(int n, int from, int to);
> {code}
> In the interest of simplicity, and given that createSeed() is the preferred 
> method for a known RandomSource, additional overloads of these methods can be 
> omitted.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
