[ 
https://issues.apache.org/jira/browse/RNG-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Herbert resolved RNG-181.
------------------------------
    Resolution: Implemented

Added in commit:

0021eb92a56dd5dc61e63256fb2a30c316c7d8af

> LXM generators to implement SplittableUniformRandomProvider
> -----------------------------------------------------------
>
>                 Key: RNG-181
>                 URL: https://issues.apache.org/jira/browse/RNG-181
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.5
>            Reporter: Alex Herbert
>            Assignee: Alex Herbert
>            Priority: Major
>             Fix For: 1.5
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The LXM family of generators use a linear congruential generator (LCG) mixed 
> with a xor-based generator. The paper introducing these generators indicates 
> that these are suitable for splitting as the streams are demonstrated to be 
> independent when the addition parameter for the LCG is different. The authors 
> provide results for outputting the combined (interleaved) sequence from up to 
> 2^24 generators with different LCG add parameters through Test U01 BigCrush 
> with no failures.
> These generators are splittable in the JDK 17 java.util.random package. 
> Splitting functionality should be added to the family in Commons RNG.
> A single split of the generator is made by randomly seeding a new generator 
> using the existing one. This is no different than simply creating a new 
> generator from any source of randomness.
> When the generator is used to make a Stream<SplittableUniformRandomProvider> 
> then each stream instance can be created using a different addition 
> parameter. This can be done by exploiting the fact that the spliterator for 
> the stream maintains the stream position as a count in [0, streamSize). The 
> addition parameter can be created by using this count. To ensure a second 
> stream will have different seeding the count can be combined with random bits 
> that change for each stream instance.
> h2. JDK Implementation
> The exact mechanism used in the JDK is not documented (in the public javadoc) 
> so may be subject to change. The present source code adds the stream position 
> bits to random bits to create the addition. The seed random bits are 
> generated with the some of the lowest bits set to 1. The addition parameter 
> is created by shifting the seed random addition left until the lowest 1-bit 
> is above the highest 1-bit of the position. The two can be combined (with 
> xor) to create an addition which will be unique among the stream. 
> The present implementation in the JDK creates the initial seed as a series of 
> n-bit characters using a seed generation algorithm; the lowest n-bits are set 
> to 1 to act as a marker for the end of the seed. The algorithm to create the 
> initial seed characters uses the overflow from a multiply congruential 
> generator (MCG) with m=15 to create the digits as 4-bits. The source code 
> indicates that the character size could be any supported number in [2, 63]. 
> However the more bits that are set to mark the lower position of the seed 
> impacts the amount of randomness left in the remaining bits.
> The initial seed characters are created using Math.multiplyHigh which is 
> signed. The result is that the digits in the seed have a non-uniform 
> distribution as the overflow character is subject to sign extension 
> correction in the multiply. If the same algorithm uses JDK 18s 
> Math.unsignedMultiplyHigh then the digits have a uniform distribution.
> Since the intrinsic functions Math.multiplyHigh and Math.unsignedMultiplyHigh 
> are not available in JDK 8 then a similar approach using a MCG would be slow.
> Details to take away are that the counter and initial random seed are 
> combined. Overlap of the seed and the counter are avoided by shifting the 
> seed left. Some bits are used as a marker in the seed which reduces the 
> randomness. Ideally the initial seed should have randomly distributed bits; 
> this is not the case in the JDK 17 implementation due to the generation 
> algorithm used.
> h2. Proposed Implementation
> To create a random addition combine the stream position with random bits.
> Notes:
>  # The stream position will be a positive long. However the addition for the 
> LCG must be odd. This is done in the LXM family by setting the lowest 1 bit 
> to 1. Thus the count must be shifted left 1-bit to avoid duplications in the 
> seed sequence. This removes the ability to add extra bits to seed additions 
> when the stream size exceeds 2^63. Such large streams are not expected in 
> common use.
>  # A counter that is enumerated from 0 to 2^n - 1 will be uniformly 
> distributed for each bit in the first n-bits. However when used as an 
> addition to a sum in an LCG all of these possible additions may not trickle 
> up changes very far into higher bits. 1-bits must be present in higher 
> positions of the addition to effect higher bits in the sum. If all the higher 
> bits are the same then the trickle up effect will be similar. So it is an 
> advantage to have the seed counter combined with different random bits.
> A random seed could be xor'd with the counter. However the upper bits for all 
> additions would be the same above the highest 1-bit in the sequence size. 
> This reduces the variation in the addition parameter for use by the LXM 
> generator.
> Thus is makes sense to create a random seed, then shift it left as the 
> counter increases in magnitude. Note that the shift to avoid overlap with the 
> counter could be maintained using a separate shift counter. However if the 
> initial random seed bits have many zeros in the low bits it may be possible 
> to left shift this and have nothing to combine with a large value counter. 
> This may lead to duplications of previous ((seed << shift) | count) values as 
> the counter increases in magnitude. If there is always at least 1 bit to 
> prepend to the counter then no duplications will occur as any larger counts 
> that match a previous ((seed << shift) | count) will also have at least 1 bit 
> prepended. This limits the seed addition to 63-bits of randomness.
> Since the lowest 1-bit is enabled for the LCG add parameter this leaves 
> 62-bits of randomness to seed the addition parameter. A stream with more than 
> 2^62 elements would not be able to ensure a unique addition parameter for 
> each RNG in the stream. Such streams are unlikely to be used. A more 
> realistic parallel stream size of less than 1024 generators would only 
> consume 9 bits for the counter and the seed addition will contain at least 53 
> bits of randomness accounting for the two bits lost to the marker bit and the 
> odd LCG add parameter.
> h2. Code changes
> This functionality can be introduced in a new abstract base class for 
> splittable generators that can split using a seed value. The seed value would 
> be ensured to be unique within a stream (subject to the size limitations 
> above). The current LXM generators extend either IntProvider or LongProvider. 
> The LongProviders all use a 64 or 128-bit LCG and can thus use the entire 
> seed value when splitting. The IntProvider is the L32X64Mix generator which 
> uses a 32-bit LCG. This can only use 31-bits from the seed for the LCG add 
> parameter. It limits the number of unique add parameters within a stream to 
> 2^30 before duplications may be observed. This generator is splittable in the 
> JDK. It is used as the default generator in JDK 17 for 
> RandomGenerator.getDefault(). Supporting splitting in Commons RNG provides 
> parity with the modern JDK random implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to