[ 
https://issues.apache.org/jira/browse/RNG-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778167#comment-16778167
 ] 

Alex D Herbert commented on RNG-72:
-----------------------------------

My typical use case for a fast construction would be a shuffle of an array or 
collection, or perhaps a 1 time-sample from a distribution. In this case it 
would be nice that {{RandomSource.create}} is not much slower than calling the 
constructor of a chosen implementing class. So a user can update the code to 
switch to a different RNG by just changing the enum value rather than having to 
rewrite for the constructor. The enum can also be chosen dynamically, e.g. 
using program arguments, if the factory methods are employed.

I think that some improvements can be made without changing the {{Well44497b}} 
generator used in the {{SeedFactory}}. I will put the non-destructive 
improvements into a PR and post the benchmark results.

The {{SeedFactory}} currently uses synchronisation around the generator when 
creating {{int[]}} or {{long[]}} seeds. It also uses 
{{System.identityHashCode(new Object())}} mixed with the output from the 
generator to add randomness. Perhaps this is not necessary for a long period 
generator. I will investigate removing synchronisation, changing the base 
generator and not using the hashcode method. I think a change of {{Well44497b}} 
to a {{XorShift1024Star}} should see a big speed improvement even if the 
synchronisation is left. Dropping to use a SplitMix for seeding arrays should 
see another boost. I propose to try:
||Seed RNG||With Hashcode||Array Seeds||
|Synchronised Well44497b|Y|2^44497 * 2^32|
|Synchronised Well44497b|N|2^44497|
|Synchronised XorShift1024Star|Y|2^1024 * 2^32|
|Synchronised XorShift1024Star|N|2^1024|
|Synchronised XorShift1024Star to create local SplitMix|Y|2^64 * 2^32|
|Synchronised XorShift1024Star to create local SplitMix|N|2^64|

This should indicate where the most performance benefit will occur. The use of 
the hashcode does not add much randomness to the possible seeds of a long 
period generator. But it is useful for a SplitMix. It also adds randomness that 
is unpredictable to a caller, but I do not think that is important as the seeds 
are not intended to be cryptographically secure.
{quote}Also, the "seed factory" is a convenience; users that need fast 
construction would probably be better off choosing the RNG that suits them 
better and use that to generate seed themselves.
{quote}
So there are two use cases:
 * Create a RNG using a good seeding routine (the current implementation)
 * Create a RNG for one time throw away usage where the expected number of 
random generator calls is low. This should be seeded fast.

In the later case I would be happy to add a primitive constructor to SplitMix64 
to avoid autoboxing and recommend this:
{code:java}
    return new SplitMix64(SeedFactory.nextLong());
{code}
But this loses the generic ability to switch implementations by changing the 
enum value.

A compromise may be to use the longish period {{XorShift1024Star}} generator 
with the current synchronisation code.

An alternative is to update the {{RandomSource.create}} method to accept a 
{{UniformRandomProvider}} as the seed object. This can be passed through the 
current synchronisation code to ensure thread safety and the user can choose 
the seed provider.

I have also just noted that if you do want to seed using a fast SplitMix64 
without synchronisation then the following is possible:
{code:java}
RandomSource.create(RandomSource.XOR_SHIFT_1024_S, Seedfactory.nextLong());
{code}
So I should add this to the benchmark before starting to change the code.

 

> Create a RandomSource.create benchmark
> --------------------------------------
>
>                 Key: RNG-72
>                 URL: https://issues.apache.org/jira/browse/RNG-72
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: simple
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Assignee: Alex D Herbert
>            Priority: Minor
>              Labels: performance-benchmark
>             Fix For: 1.3
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The recommended method to construct a {{UniformRandomProvider}} is to use, 
> e.g.:
> {code:java}
> import org.apache.commons.rng.UniformRandomProvider;
> import org.apache.commons.rng.simple.RandomSource;
> UniformRandomProvider rng = RandomSource.create(RandomSource.MWC_256);
> {code}
> The factory method knows the type of seed required for the constructor and 
> generates one as appropriate.
> This factory method could be made more efficient, in particular:
>  * Reducing synchronisation around the single source of random seed data
>  * Adding knowledge of the required seed size for arrays
>  * Changing internal data structures, e.g. {{Map<Class<?>, 
> SeedConverter<?,?>>}} can be changed to {{Map<SeedType, SeedConverter<?,?>>}} 
> using an {{EnumMap}} if a new enum {{SeedType}} was created for all the 
> supported seeds (currently 4 types).
>  * Add a new interface to replace {{SeedConverter<?,?>.convert()}} with a 
> {{.convert(int outputArraySize)}} method to allow conversions to generate 
> appropriately sized arrays. The parameter can be ignored for non-array 
> conversions but could optimise array conversions.
> This ticket is to add a JMH benchmark to compare the speed of construction of 
> all the providers using:
>  * Their native constructor
>  * {{RandomSource}} using the native seed of the correct size (calls a 
> constructor using reflection)
>  * {{RandomSource}} using a non native seed (requires seed conversion)
>  * {{RandomSource}} using no seed (requires seed generation)
> The report will be posted here. It could be added to the user guide for 
> reference.
> This work is motivated by the new {{XorShiRo}} generators in version 1.3 that 
> have a native array seed size of 2, 4, or 8. The current {{RandomSource}} 
> create method will generate a fixed seed of length 128 for seeding.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to