[jira] [Commented] (RNG-72) Create a RandomSource.create benchmark

Alex D Herbert (JIRA) Fri, 22 Feb 2019 13:21:17 -0800


    [ 
https://issues.apache.org/jira/browse/RNG-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775619#comment-16775619
 ]


Alex D Herbert commented on RNG-72:
-----------------------------------

Initial benchmark to create 500 RNGs, seeds are pre-built where appropriate. 
Test the following construction methods:
 * Create using the native constructor with the {{new}} keyword
 * Create using {{RandomSource}} with a {{null}} seed
 * Create using {{RandomSource}} with the native seed
 * Create using {{RandomSource}} with a truncated native seed forcing self 
seeding
 * Create using {{RandomSource}} with a {{byte[]}} of the appropriate length

The table shows the average time for the {{new}} method and then the relative 
time for each of the other methods. Table is sorted by the speed of the native 
constructor (time in microseconds):
||RandomSource||new||createNullSeed||createNativeSeed||createSelfSeed||createByteArray||
|SPLIT_MIX_64|3.80|21.92|6.07|-|8.04|
|KISS|8.14|164.17|3.06|3.29|5.22|
|WELL_512_A|9.11|145.66|2.84|5.08|11.93|
|JDK|10.31|8.75|2.75|-|3.78|
|WELL_1024_A|14.61|90.08|2.12|5.52|13.65|
|XOR_SHIFT_1024_S|14.81|163.92|2.15|3.03|7.82|
|WELL_19937_A|157.74|10.41|1.26|8.17|22.87|
|MWC_256|161.11|9.07|1.07|3.60|9.25|
|WELL_44497_A|359.31|5.58|1.05|8.03|21.02|
|ISAAC|1073.02|2.21|1.09|1.48|2.36|
|MT_64|3399.62|1.15|1.03|0.43|0.88|
|MT|5460.06|0.90|1.03|0.68|1.27|
|TWO_CMRES|77129.26|1.02|1.07|-|1.03|

For reference here are the native seed types and length (for arrays):
||RandomSource||Type||Length||
|SPLIT_MIX_64|Long|-|
|KISS|int[]|4|
|WELL_512_A|int[]|16|
|JDK|Long|-|
|WELL_1024_A|int[]|32|
|XOR_SHIFT_1024_S|long[]|16|
|WELL_19937_A|int[]|624|
|MWC_256|int[]|257|
|WELL_44497_A|int[]|1391|
|ISAAC|int[]|256|
|MT_64|long[]|312|
|MT|int[]|624|
|TWO_CMRES|Integer|-|

Notes:

The following generators have a self-seeding routine:
 * ISAAC
 * MT
 * MT_64
 * TWO_CMRES

This self-seeding routine is the main overhead during construction. In the case 
of the TWO_CMRES the self-seeding is very computationally intensive as it 
involves on average 2^16 calls to advance the sub-cycle generators.

Construction of the SPLIT_MIX_64 and JDK just copies the long value as the seed 
state.

Construction with a native seed for the other generators is basically an array 
copy.

 

Analysis:

SPLIT_MIX_64 (single number seed)

Construction with a native seed shows the relative cost of instantiation using 
reflection is 6x slower. This could be improved by removing the use of 
ArrayList within the method that builds the {{Object[]}} to pass to the 
constructor. In most cases the arguments are null and an {{Object[]}} of length 
1 can be created directly using just the seed.

If the seed must be generated then the construction is 22x slower. This is due 
to the use of synchronisation around a provider of seeds. 

This could be improved by changing the provider of seeds to be a faster 
generator. Currently it is a long period {{Well44497b}}. This could be switched 
to a native generator of {{long}} bits such as the XorShift1024Star which is 
over 4x as fast. 

Creation from a byte array is approximately 1.25x slower than using a {{Long}} 
showing the cost of seed conversion is low.

 

KISS (small array seed)

Construction with a native seed shows the relative cost of instantiation using 
reflection is 3x slower. If the seed must be generated then the construction is 
164x slower. This is partly due to the factory method creating a seed array of 
length 128 when only 16 is used. This could be improved by changing the factory 
to know the required size for all {{RandomSource}} options. The seed is 
generated from a single instance of a RNG requiring synchronisation on each 
call. This can be changed to use a single {{long}} seed from the synchronised 
seed factory to create a SplitMix to generate the array of seed values without 
synchronisation. This will reduce the possible randomness of the seeds due to 
using a shorter period RNG.

The self-seeding time is almost as fast as using a native seed. A large part of 
construction from the native seed can be attributed to the construction using 
reflection.

When using a {{byte[]}} the speed is 5x slower, so conversion of the bytes is 
an overhead. Inspection of the code shows the use of array allocation within a 
loop to copy the bytes for each number conversion. This can be refactored to 
use a method to read the bytes from a position in a given byte array and 
convert them directly to a {{long}} avoiding array allocation overhead. This 
method is used in {{java.nio.ByteBuffer}} for conversions.

 

WELL_44497_A (large array seed)

Construction with a native seed shows the relative cost of instantiation using 
reflection is 1.05x slower. The majority of the construction time is not the 
lookup of the constructor call but the work inside the constructor to copy the 
input seed array.

Construction with a {{byte[]}} is 21x slower. This demonstrates that the 
conversion of bytes to numbers is a large overhead. Notes on improvement are as 
for the KISS generator (better byte conversion without array allocation 
overhead).

If the seed must be generated then the construction is 6x slower. This could be 
improved by using a faster generator of seed data.

 

Generating seed data

Currently the seed data is generated using a single long period {{Well44497b}}. 
This is used with synchronization to create entire array seeds. An alternative 
would be to use a single generator to produce {{long}} seeds that are then used 
to seed a SplitMix generator. This can be used without synchronization to 
generate the seed data. This will improve the construction performance at the 
cost of reducing the randomness of the seed data produced by the factory 
methods.

So the question is whether the factory method should try to maximise randomness 
of the seed data (as current) or minimise construction overhead.

 

Summary:

Non-destructive improvements:
 * Improve construction of {{Object[]}} that is passed to the constructor via 
reflection
 * Extend the internal knowledge of the factory method to know the size of the 
required seed arrays
 * Improve the byte array conversion methods

Possibly breaking improvements [1]:
 * Change the long period {{Well44497b}} for a less long {{XorShift1024Star}}. 
Note that this still has a period of 2^1024. If billions of seeds are required 
per second (1,000,000,000 ~ 2^30) then at (60*60*24*365=31536000~2^25 
seconds/year) it would be 2^969 years before a repeat. This generator is over 
4x faster at {{long}} generation and 2.5x faster at {{int}} generation.
 * Change the array seed generation to use a SplitMix generator. This reduces 
synchronisation to just a single call to get the seed for the SplitMix 
generator. This limits the seeds to 2^64 possible values. Using the Xor method 
of the output with a random hash value (this is currently employed for all 
seeds output by the {{SeedFactory}} will increase this by 2^32 to 2^96 seed 
values.

[1] These are only "breaking" as they reduce the randomness of the providers 
that can be created by the factory.

> Create a RandomSource.create benchmark
> --------------------------------------
>
>                 Key: RNG-72
>                 URL: https://issues.apache.org/jira/browse/RNG-72
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: simple
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Priority: Minor
>              Labels: performance-benchmark
>
> The recommended method to construct a {{UniformRandomProvider}} is to use, 
> e.g.:
> {code:java}
> import org.apache.commons.rng.UniformRandomProvider;
> import org.apache.commons.rng.simple.RandomSource;
> UniformRandomProvider rng = RandomSource.create(RandomSource.MWC_256);
> {code}
> The factory method knows the type of seed required for the constructor and 
> generates one as appropriate.
> This factory method could be made more efficient, in particular:
>  * Reducing synchronisation around the single source of random seed data
>  * Adding knowledge of the required seed size for arrays
>  * Changing internal data structures, e.g. {{Map<Class<?>, 
> SeedConverter<?,?>>}} can be changed to {{Map<SeedType, SeedConverter<?,?>>}} 
> using an {{EnumMap}} if a new enum {{SeedType}} was created for all the 
> supported seeds (currently 4 types).
>  * Add a new interface to replace {{SeedConverter<?,?>.convert()}} with a 
> {{.convert(int outputArraySize)}} method to allow conversions to generate 
> appropriately sized arrays. The parameter can be ignored for non-array 
> conversions but could optimise array conversions.
> This ticket is to add a JMH benchmark to compare the speed of construction of 
> all the providers using:
>  * Their native constructor
>  * {{RandomSource}} using the native seed of the correct size (calls a 
> constructor using reflection)
>  * {{RandomSource}} using a non native seed (requires seed conversion)
>  * {{RandomSource}} using no seed (requires seed generation)
> The report will be posted here. It could be added to the user guide for 
> reference.
> This work is motivated by the new {{XorShiRo}} generators in version 1.3 that 
> have a native array seed size of 2, 4, or 8. The current {{RandomSource}} 
> create method will generate a fixed seed of length 128 for seeding.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (RNG-72) Create a RandomSource.create benchmark

Reply via email to