[
https://issues.apache.org/jira/browse/RNG-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775619#comment-16775619
]
Alex D Herbert commented on RNG-72:
-----------------------------------
Initial benchmark to create 500 RNGs, seeds are pre-built where appropriate.
Test the following construction methods:
* Create using the native constructor with the {{new}} keyword
* Create using {{RandomSource}} with a {{null}} seed
* Create using {{RandomSource}} with the native seed
* Create using {{RandomSource}} with a truncated native seed forcing self
seeding
* Create using {{RandomSource}} with a {{byte[]}} of the appropriate length
The table shows the average time for the {{new}} method and then the relative
time for each of the other methods. Table is sorted by the speed of the native
constructor (time in microseconds):
||RandomSource||new||createNullSeed||createNativeSeed||createSelfSeed||createByteArray||
|SPLIT_MIX_64|3.80|21.92|6.07|-|8.04|
|KISS|8.14|164.17|3.06|3.29|5.22|
|WELL_512_A|9.11|145.66|2.84|5.08|11.93|
|JDK|10.31|8.75|2.75|-|3.78|
|WELL_1024_A|14.61|90.08|2.12|5.52|13.65|
|XOR_SHIFT_1024_S|14.81|163.92|2.15|3.03|7.82|
|WELL_19937_A|157.74|10.41|1.26|8.17|22.87|
|MWC_256|161.11|9.07|1.07|3.60|9.25|
|WELL_44497_A|359.31|5.58|1.05|8.03|21.02|
|ISAAC|1073.02|2.21|1.09|1.48|2.36|
|MT_64|3399.62|1.15|1.03|0.43|0.88|
|MT|5460.06|0.90|1.03|0.68|1.27|
|TWO_CMRES|77129.26|1.02|1.07|-|1.03|
For reference here are the native seed types and length (for arrays):
||RandomSource||Type||Length||
|SPLIT_MIX_64|Long|-|
|KISS|int[]|4|
|WELL_512_A|int[]|16|
|JDK|Long|-|
|WELL_1024_A|int[]|32|
|XOR_SHIFT_1024_S|long[]|16|
|WELL_19937_A|int[]|624|
|MWC_256|int[]|257|
|WELL_44497_A|int[]|1391|
|ISAAC|int[]|256|
|MT_64|long[]|312|
|MT|int[]|624|
|TWO_CMRES|Integer|-|
Notes:
The following generators have a self-seeding routine:
* ISAAC
* MT
* MT_64
* TWO_CMRES
This self-seeding routine is the main overhead during construction. In the case
of the TWO_CMRES the self-seeding is very computationally intensive as it
involves on average 2^16 calls to advance the sub-cycle generators.
Construction of the SPLIT_MIX_64 and JDK just copies the long value as the seed
state.
Construction with a native seed for the other generators is basically an array
copy.
Analysis:
SPLIT_MIX_64 (single number seed)
Construction with a native seed shows the relative cost of instantiation using
reflection is 6x slower. This could be improved by removing the use of
ArrayList within the method that builds the {{Object[]}} to pass to the
constructor. In most cases the arguments are null and an {{Object[]}} of length
1 can be created directly using just the seed.
If the seed must be generated then the construction is 22x slower. This is due
to the use of synchronisation around a provider of seeds.
This could be improved by changing the provider of seeds to be a faster
generator. Currently it is a long period {{Well44497b}}. This could be switched
to a native generator of {{long}} bits such as the XorShift1024Star which is
over 4x as fast.
Creation from a byte array is approximately 1.25x slower than using a {{Long}}
showing the cost of seed conversion is low.
KISS (small array seed)
Construction with a native seed shows the relative cost of instantiation using
reflection is 3x slower. If the seed must be generated then the construction is
164x slower. This is partly due to the factory method creating a seed array of
length 128 when only 16 is used. This could be improved by changing the factory
to know the required size for all {{RandomSource}} options. The seed is
generated from a single instance of a RNG requiring synchronisation on each
call. This can be changed to use a single {{long}} seed from the synchronised
seed factory to create a SplitMix to generate the array of seed values without
synchronisation. This will reduce the possible randomness of the seeds due to
using a shorter period RNG.
The self-seeding time is almost as fast as using a native seed. A large part of
construction from the native seed can be attributed to the construction using
reflection.
When using a {{byte[]}} the speed is 5x slower, so conversion of the bytes is
an overhead. Inspection of the code shows the use of array allocation within a
loop to copy the bytes for each number conversion. This can be refactored to
use a method to read the bytes from a position in a given byte array and
convert them directly to a {{long}} avoiding array allocation overhead. This
method is used in {{java.nio.ByteBuffer}} for conversions.
WELL_44497_A (large array seed)
Construction with a native seed shows the relative cost of instantiation using
reflection is 1.05x slower. The majority of the construction time is not the
lookup of the constructor call but the work inside the constructor to copy the
input seed array.
Construction with a {{byte[]}} is 21x slower. This demonstrates that the
conversion of bytes to numbers is a large overhead. Notes on improvement are as
for the KISS generator (better byte conversion without array allocation
overhead).
If the seed must be generated then the construction is 6x slower. This could be
improved by using a faster generator of seed data.
Generating seed data
Currently the seed data is generated using a single long period {{Well44497b}}.
This is used with synchronization to create entire array seeds. An alternative
would be to use a single generator to produce {{long}} seeds that are then used
to seed a SplitMix generator. This can be used without synchronization to
generate the seed data. This will improve the construction performance at the
cost of reducing the randomness of the seed data produced by the factory
methods.
So the question is whether the factory method should try to maximise randomness
of the seed data (as current) or minimise construction overhead.
Summary:
Non-destructive improvements:
* Improve construction of {{Object[]}} that is passed to the constructor via
reflection
* Extend the internal knowledge of the factory method to know the size of the
required seed arrays
* Improve the byte array conversion methods
Possibly breaking improvements [1]:
* Change the long period {{Well44497b}} for a less long {{XorShift1024Star}}.
Note that this still has a period of 2^1024. If billions of seeds are required
per second (1,000,000,000 ~ 2^30) then at (60*60*24*365=31536000~2^25
seconds/year) it would be 2^969 years before a repeat. This generator is over
4x faster at {{long}} generation and 2.5x faster at {{int}} generation.
* Change the array seed generation to use a SplitMix generator. This reduces
synchronisation to just a single call to get the seed for the SplitMix
generator. This limits the seeds to 2^64 possible values. Using the Xor method
of the output with a random hash value (this is currently employed for all
seeds output by the {{SeedFactory}} will increase this by 2^32 to 2^96 seed
values.
[1] These are only "breaking" as they reduce the randomness of the providers
that can be created by the factory.
> Create a RandomSource.create benchmark
> --------------------------------------
>
> Key: RNG-72
> URL: https://issues.apache.org/jira/browse/RNG-72
> Project: Commons RNG
> Issue Type: New Feature
> Components: simple
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Priority: Minor
> Labels: performance-benchmark
>
> The recommended method to construct a {{UniformRandomProvider}} is to use,
> e.g.:
> {code:java}
> import org.apache.commons.rng.UniformRandomProvider;
> import org.apache.commons.rng.simple.RandomSource;
> UniformRandomProvider rng = RandomSource.create(RandomSource.MWC_256);
> {code}
> The factory method knows the type of seed required for the constructor and
> generates one as appropriate.
> This factory method could be made more efficient, in particular:
> * Reducing synchronisation around the single source of random seed data
> * Adding knowledge of the required seed size for arrays
> * Changing internal data structures, e.g. {{Map<Class<?>,
> SeedConverter<?,?>>}} can be changed to {{Map<SeedType, SeedConverter<?,?>>}}
> using an {{EnumMap}} if a new enum {{SeedType}} was created for all the
> supported seeds (currently 4 types).
> * Add a new interface to replace {{SeedConverter<?,?>.convert()}} with a
> {{.convert(int outputArraySize)}} method to allow conversions to generate
> appropriately sized arrays. The parameter can be ignored for non-array
> conversions but could optimise array conversions.
> This ticket is to add a JMH benchmark to compare the speed of construction of
> all the providers using:
> * Their native constructor
> * {{RandomSource}} using the native seed of the correct size (calls a
> constructor using reflection)
> * {{RandomSource}} using a non native seed (requires seed conversion)
> * {{RandomSource}} using no seed (requires seed generation)
> The report will be posted here. It could be added to the user guide for
> reference.
> This work is motivated by the new {{XorShiRo}} generators in version 1.3 that
> have a native array seed size of 2, 4, or 8. The current {{RandomSource}}
> create method will generate a fixed seed of length 128 for seeding.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)