[
https://issues.apache.org/jira/browse/RNG-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Herbert resolved RNG-181.
------------------------------
Resolution: Implemented
Added in commit:
0021eb92a56dd5dc61e63256fb2a30c316c7d8af
> LXM generators to implement SplittableUniformRandomProvider
> -----------------------------------------------------------
>
> Key: RNG-181
> URL: https://issues.apache.org/jira/browse/RNG-181
> Project: Commons RNG
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.5
> Reporter: Alex Herbert
> Assignee: Alex Herbert
> Priority: Major
> Fix For: 1.5
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The LXM family of generators use a linear congruential generator (LCG) mixed
> with a xor-based generator. The paper introducing these generators indicates
> that these are suitable for splitting as the streams are demonstrated to be
> independent when the addition parameter for the LCG is different. The authors
> provide results for outputting the combined (interleaved) sequence from up to
> 2^24 generators with different LCG add parameters through Test U01 BigCrush
> with no failures.
> These generators are splittable in the JDK 17 java.util.random package.
> Splitting functionality should be added to the family in Commons RNG.
> A single split of the generator is made by randomly seeding a new generator
> using the existing one. This is no different than simply creating a new
> generator from any source of randomness.
> When the generator is used to make a Stream<SplittableUniformRandomProvider>
> then each stream instance can be created using a different addition
> parameter. This can be done by exploiting the fact that the spliterator for
> the stream maintains the stream position as a count in [0, streamSize). The
> addition parameter can be created by using this count. To ensure a second
> stream will have different seeding the count can be combined with random bits
> that change for each stream instance.
> h2. JDK Implementation
> The exact mechanism used in the JDK is not documented (in the public javadoc)
> so may be subject to change. The present source code adds the stream position
> bits to random bits to create the addition. The seed random bits are
> generated with the some of the lowest bits set to 1. The addition parameter
> is created by shifting the seed random addition left until the lowest 1-bit
> is above the highest 1-bit of the position. The two can be combined (with
> xor) to create an addition which will be unique among the stream.
> The present implementation in the JDK creates the initial seed as a series of
> n-bit characters using a seed generation algorithm; the lowest n-bits are set
> to 1 to act as a marker for the end of the seed. The algorithm to create the
> initial seed characters uses the overflow from a multiply congruential
> generator (MCG) with m=15 to create the digits as 4-bits. The source code
> indicates that the character size could be any supported number in [2, 63].
> However the more bits that are set to mark the lower position of the seed
> impacts the amount of randomness left in the remaining bits.
> The initial seed characters are created using Math.multiplyHigh which is
> signed. The result is that the digits in the seed have a non-uniform
> distribution as the overflow character is subject to sign extension
> correction in the multiply. If the same algorithm uses JDK 18s
> Math.unsignedMultiplyHigh then the digits have a uniform distribution.
> Since the intrinsic functions Math.multiplyHigh and Math.unsignedMultiplyHigh
> are not available in JDK 8 then a similar approach using a MCG would be slow.
> Details to take away are that the counter and initial random seed are
> combined. Overlap of the seed and the counter are avoided by shifting the
> seed left. Some bits are used as a marker in the seed which reduces the
> randomness. Ideally the initial seed should have randomly distributed bits;
> this is not the case in the JDK 17 implementation due to the generation
> algorithm used.
> h2. Proposed Implementation
> To create a random addition combine the stream position with random bits.
> Notes:
> # The stream position will be a positive long. However the addition for the
> LCG must be odd. This is done in the LXM family by setting the lowest 1 bit
> to 1. Thus the count must be shifted left 1-bit to avoid duplications in the
> seed sequence. This removes the ability to add extra bits to seed additions
> when the stream size exceeds 2^63. Such large streams are not expected in
> common use.
> # A counter that is enumerated from 0 to 2^n - 1 will be uniformly
> distributed for each bit in the first n-bits. However when used as an
> addition to a sum in an LCG all of these possible additions may not trickle
> up changes very far into higher bits. 1-bits must be present in higher
> positions of the addition to effect higher bits in the sum. If all the higher
> bits are the same then the trickle up effect will be similar. So it is an
> advantage to have the seed counter combined with different random bits.
> A random seed could be xor'd with the counter. However the upper bits for all
> additions would be the same above the highest 1-bit in the sequence size.
> This reduces the variation in the addition parameter for use by the LXM
> generator.
> Thus is makes sense to create a random seed, then shift it left as the
> counter increases in magnitude. Note that the shift to avoid overlap with the
> counter could be maintained using a separate shift counter. However if the
> initial random seed bits have many zeros in the low bits it may be possible
> to left shift this and have nothing to combine with a large value counter.
> This may lead to duplications of previous ((seed << shift) | count) values as
> the counter increases in magnitude. If there is always at least 1 bit to
> prepend to the counter then no duplications will occur as any larger counts
> that match a previous ((seed << shift) | count) will also have at least 1 bit
> prepended. This limits the seed addition to 63-bits of randomness.
> Since the lowest 1-bit is enabled for the LCG add parameter this leaves
> 62-bits of randomness to seed the addition parameter. A stream with more than
> 2^62 elements would not be able to ensure a unique addition parameter for
> each RNG in the stream. Such streams are unlikely to be used. A more
> realistic parallel stream size of less than 1024 generators would only
> consume 9 bits for the counter and the seed addition will contain at least 53
> bits of randomness accounting for the two bits lost to the marker bit and the
> odd LCG add parameter.
> h2. Code changes
> This functionality can be introduced in a new abstract base class for
> splittable generators that can split using a seed value. The seed value would
> be ensured to be unique within a stream (subject to the size limitations
> above). The current LXM generators extend either IntProvider or LongProvider.
> The LongProviders all use a 64 or 128-bit LCG and can thus use the entire
> seed value when splitting. The IntProvider is the L32X64Mix generator which
> uses a 32-bit LCG. This can only use 31-bits from the seed for the LCG add
> parameter. It limits the number of unique add parameters within a stream to
> 2^30 before duplications may be observed. This generator is splittable in the
> JDK. It is used as the default generator in JDK 17 for
> RandomGenerator.getDefault(). Supporting splitting in Commons RNG provides
> parity with the modern JDK random implementation.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)