[ 
https://issues.apache.org/jira/browse/RNG-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643062#comment-16643062
 ] 

Alex D Herbert commented on RNG-57:
-----------------------------------

bq. Did I understand correctly that you propose to

Yes. 

The test suite should satisfy these aims:

1. Demonstrate each implementation is a valid Java port (if it is a port)
2. Demonstrate each implementation is an 'acceptable' source of randomness for 
int/long
3. Demonstrate each implementation can be saved and restored, no matter what 
the initial seed and current state
4. Demonstrate a source of randomness for int/long can be used to produce 
randomness for boolean, float, double and a range of int/long (i.e. the 
UniformRandomProvider interface)
5. Exercise all the code paths and expected edge case behaviour
6. Leave thorough statistical analysis of the randomness of each provider to 
the Big Crush test

IMO the current test suite does this, except that it does not use random seeds 
to create the providers. However it over-tests certain parts of the code due to 
repetition, and due to the nature of the tests this results in a high overall 
failure rate for the entire test suite.

Currently each provider has a unit test that demonstrates the Java port of the 
algorithm is functioning as per the original source. 

Then each algorithm is used to provide either {{int}} or {{long}} as the base 
of all randomness.

So unit test the base implementation of UniformRandomProvider for int/long only 
once. This should be done using a highly random source, such as the strongest 
cryptographically secure algorithm available from SecureRandom.
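To illustrate the idea, here is a minimal sketch of testing the int-to-boolean/double derivation logic once, backed by a cryptographically strong source. The class and method names are illustrative only, and the bit manipulations are the standard derivations, not necessarily the exact ones used internally by Commons RNG:

```java
import java.security.SecureRandom;

/**
 * Sketch: exercise the bit-derivation logic once against a strong source.
 * Names and derivations are illustrative, not the real Commons RNG API.
 */
public class BaseProviderDerivationTest {

    private static final SecureRandom STRONG_SOURCE = new SecureRandom();

    // boolean derived from a single bit of the next int
    static boolean nextBoolean() {
        return (STRONG_SOURCE.nextInt() >>> 31) != 0;
    }

    // double in [0, 1) derived from 53 random bits (two ints)
    static double nextDouble() {
        long hi = STRONG_SOURCE.nextInt() >>> 6;  // 26 bits
        long lo = STRONG_SOURCE.nextInt() >>> 5;  // 27 bits
        return ((hi << 27) | lo) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        boolean sawTrue = false;
        boolean sawFalse = false;
        for (int i = 0; i < 1000; i++) {
            if (nextBoolean()) { sawTrue = true; } else { sawFalse = true; }
            double d = nextDouble();
            if (d < 0 || d >= 1) {
                throw new AssertionError("nextDouble out of range: " + d);
            }
        }
        if (!(sawTrue && sawFalse)) {
            throw new AssertionError("nextBoolean never varied");
        }
        System.out.println("derivation checks passed");
    }
}
```

Because the source is cryptographically strong, the range and variation checks here are effectively deterministic and never fail spuriously.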

Then unit test the construction edge cases and save/restore functionality of 
each provider. This can use random seeds to ensure a robust test suite. These 
tests should never fail.
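The save/restore test shape could look like the following sketch. A toy xorshift64 generator stands in for a provider's state handling (the real Commons RNG state mechanism differs); the point is that the round-trip assertion holds for any random seed, so the test never fails:

```java
/**
 * Sketch: save/restore round-trip with a random seed, using a toy
 * xorshift64 generator as a stand-in for a real provider's state.
 */
public class SaveRestoreRoundTrip {

    private long state;

    SaveRestoreRoundTrip(long seed) {
        // xorshift requires a non-zero state
        this.state = seed == 0 ? 1 : seed;
    }

    long next() {
        state ^= state << 13;
        state ^= state >>> 7;
        state ^= state << 17;
        return state;
    }

    public static void main(String[] args) {
        // Random seed: the assertion must hold for any seed,
        // so this test never fails spuriously.
        SaveRestoreRoundTrip rng = new SaveRestoreRoundTrip(System.nanoTime());
        long saved = rng.state;          // "save"
        long[] first = new long[10];
        for (int i = 0; i < 10; i++) {
            first[i] = rng.next();
        }
        rng.state = saved;               // "restore"
        for (int i = 0; i < 10; i++) {
            if (rng.next() != first[i]) {
                throw new AssertionError("restored sequence diverged at " + i);
            }
        }
        System.out.println("save/restore round-trip passed");
    }
}
```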

Then finally test the nextInt() and nextLong() methods of each provider only 
once. This test can be designed to have a known, acceptable fail rate. E.g. 
use a single Chi-squared test with a known fail rate, e.g. 1%; or do repeated 
Chi-squared tests on small samples and then test that the results follow a 
Binomial distribution with a p of 0.01. For simplicity a single Chi-squared 
test of a long run of the provider sequence would be my choice.

The test for nextInt or nextLong can either compress all the bits into bytes 
for a 256-bin histogram for the Chi-squared test, or use a 1024-bin histogram 
for int (4 * 256 bins, with each set of 256 coming from an 8-bit block of the 
integer) and a 2048-bin histogram for long (8 * 256 bins).
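The 2048-bin statistic for nextLong() could be computed as in this sketch. SplittableRandom stands in for the provider under test, and the sample count (128,000 longs, i.e. 1,024,000 byte samples, 500 per bin) is one possible reading of the figures discussed here:

```java
import java.util.SplittableRandom;

/**
 * Sketch: Chi-squared statistic over 2048 bins for nextLong().
 * Each of the 8 bytes of the long indexes its own block of 256 bins.
 * SplittableRandom is a stand-in for the provider under test.
 */
public class LongChiSquared {

    public static void main(String[] args) {
        final int bins = 2048;                        // 8 bytes * 256 values
        final long longs = 128_000;                   // -> 1,024,000 byte samples
        final double expected = longs * 8.0 / bins;   // 500 per bin
        long[] counts = new long[bins];
        SplittableRandom rng = new SplittableRandom(42);  // stand-in provider
        for (long n = 0; n < longs; n++) {
            long v = rng.nextLong();
            for (int b = 0; b < 8; b++) {
                int byteValue = (int) ((v >>> (b * 8)) & 0xFF);
                counts[b * 256 + byteValue]++;  // each byte has its own 256-bin block
            }
        }
        double chi2 = 0;
        for (long c : counts) {
            double d = c - expected;
            chi2 += d * d / expected;
        }
        // The statistic's mean is roughly its degrees of freedom; a real
        // test would compare against the 1% critical value here.
        System.out.println("chi2 = " + chi2);
        if (Double.isNaN(chi2) || chi2 < 0) {
            throw new AssertionError("bad statistic");
        }
    }
}
```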

Currently the tests for nextInt/Long/Float/Double use 1000 samples repeated 500 
times in 19 individual test runs. That is 9,500,000 samples. This could be 
compressed into a single test. If this was set to 1,024,000 samples a uniform 
distribution over 2048 bins would have 500 samples per bin. This would run 9 
times faster than the current test suite and should fail at a known rate, e.g. 
1%. Allowing each test to be repeated once by JUnit in the event of failure 
will then cause the fail rate to drop to (0.01)^2, or 0.01% of the time. 

With 16 providers the likelihood of any provider failing is low and the 
likelihood of a pass of all 16 is:

{noformat}
(1 - 0.0001)^16 = 0.998
{noformat}

So if failed tests can be repeated just once you will see Travis jobs failing 
0.2% of the time. This assumes each RNG provider is perfect and will fail 
exactly 1% of the time.
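Spelling the arithmetic out (a 1% per-test fail rate, squared by a single retry, across 16 providers):

```java
/**
 * The failure-rate arithmetic above: a 1% designed fail rate,
 * one JUnit retry on failure, 16 providers.
 */
public class SuiteFailRate {

    public static void main(String[] args) {
        double perTestFail = 0.01;                      // designed fail rate
        double afterRetry = perTestFail * perTestFail;  // 0.0001 with one retry
        double allPass = Math.pow(1 - afterRetry, 16);  // all 16 providers pass
        System.out.printf("all-pass probability = %.4f%n", allPass);
        System.out.printf("suite fail rate = %.2f%%%n", (1 - allPass) * 100);
        if (Math.abs(allPass - 0.9984) > 0.0005) {
            throw new AssertionError("arithmetic drifted: " + allPass);
        }
    }
}
```

The suite fail rate comes out at about 0.16%, i.e. the 0.2% figure quoted above.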

I'd then suggest some data collation of failures be performed over a long 
period. Each RNG should fail 1% of the time but this can be examined when the 
suite has been run a few thousand times by Travis. Such data can be collated 
periodically and could form a page for the user guide.





> CachedUniformRandomProvider for nextBoolean() and nextInt()
> -----------------------------------------------------------
>
>                 Key: RNG-57
>                 URL: https://issues.apache.org/jira/browse/RNG-57
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: sampling
>    Affects Versions: 1.2
>            Reporter: Alex D Herbert
>            Priority: Minor
>              Labels: performance
>
> Implement a wrapper around a {{UniformRandomProvider}} that can cache the 
> underlying source of random bytes for use in the methods {{nextBoolean()}} 
> and {{nextInt()}} (in the case of {{LongProvider}}). E.g.
> {code:java}
> LongProvider provider = RandomSource.create(RandomSource.SPLIT_MIX_64);
> CachedLongProvider rng = new CachedLongProvider(provider);
> // One cached nextLong() serves 64 nextBoolean() calls
> rng.nextBoolean();
> // One cached nextLong() serves two nextInt() calls
> rng.nextInt();
> IntProvider provider2 = RandomSource.create(RandomSource.KISS);
> CachedIntProvider rng2 = new CachedIntProvider(provider2);
> // One cached nextInt() serves 32 nextBoolean() calls
> rng2.nextBoolean();
> // This could be wrapped by a factory method:
> UniformRandomProvider rng3 = CachedUniformRandomProviderFactory.wrap(
>         // Any supported source: IntProvider or LongProvider
>         RandomSource.create(RandomSource...));
> {code}
> The implementation should be speed tested to determine the benefit for 
> {{nextBoolean()}} and if {{nextInt()}} can be improved for {{LongProviders}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
