[
https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867772#comment-16867772
]
Alex D Herbert commented on RNG-104:
------------------------------------
I have done some more analysis of seed array creation. The generator was either
synchronized using a synchronized block or using an instance of
{{java.util.concurrent.ReentrantLock}} with either a fair or unfair policy. For
each synchronization a block of numbers was computed.
h2. Locks vs synchronized
The use of {{synchronized}} results in variable timing results.
!well_sync_performance.jpg!
Compare this to the use of the {{ReentrantLock}}:
!well_unfair_performance.jpg!
Here the relative time decreases in an expected fashion as the block size
increases and the number of synchronisations reduces.
Other timings made using the unfair version of the {{ReentrantLock}} are all
consistent. Doubling the size of the seed results in doubling of timings. This
is not true for the synchronized block and timings are not consistent. The
inconsistencies are typically that the time is much faster than would be
expected. Thus the JVM is performing some optimisation for the single thread
benchmarks with synchronized blocks to allow the exclusive lock on the object
to be regained efficiently.
Thus we see that on a single thread the {{ReentrantLock}} is slower than
synchronized:
!well_lock_vs_sync1.png!
But for 4 threads the {{ReentrantLock}} with an unfair policy moves ahead of
the synchronized block:
!well_lock_vs_sync4.jpg!
The relative performance is highly variable but the lock is typically faster. A
key point is the lock is comparable to the synchronized block on 1 thread when
the RNG is used in blocks, and better when used on multiple threads.
Note that the {{ReentrantLock}} allows setting a fair policy where it makes
threads wait in line. This is comparable to the unfair policy on a single
thread but slow when there is thread contention:
!well_unfair_vs_fair.jpg!
Only when the block size is large and the system has less synchronization
events to process does the fair policy compare well to the unfair policy. Note
that the unfair policy is performing similar thread selection as the
synchronized block and the two are considered comparable.
h1. Well44497B vs XorShift1024SPhi
Here is the table of compute times for different seeds on a single thread with
no synchronisation:
||Seed|Size||WELL_44497_B||XOR_SHIFT_1024_S_PHI||Relative||
|int[]|2|17.55|6.65|0.38|
|int[]|4|29.25|9.84|0.34|
|int[]|8|52.34|17.56|0.34|
|int[]|16|98.40|31.54|0.32|
|int[]|32|197.01|68.01|0.35|
|int[]|64|380.84|120.60|0.32|
|int[]|128|749.06|250.58|0.33|
|long[]|2|41.12|8.34|0.20|
|long[]|4|78.53|12.10|0.15|
|long[]|8|159.10|21.21|0.13|
|long[]|16|309.06|43.63|0.14|
|long[]|32|610.77|83.59|0.14|
|long[]|64|1,222.49|168.82|0.14|
|long[]|128|2,497.24|406.34|0.16|
So the XorShift1024 generator is much faster. This performance improvement is
similar when done using times to create seeds on 1 or 4 threads so I will not
show the results.
h1. Optimal Block Size
Here are the relative performance for creating a seed on a single thread using
an unfair lock verses creating with no concurrency provision:
!xor_unfair_int1.png!
!xor_unfair_long1.png!
So using a single synchronisation on each call to produce a value results in
most of the time being used to do synchronisation.
If we set the limit that the SeedFactory should spend 50% of the time
synchronising and 50% of the time creating random values then the block size
should be 4 {{long}} and 8 {{int}} to be computed in a single block (i.e. where
the plot relative time is approximately 2).
h1. Seed Quality
This was discussed on the dev ML and this thread. I repeat here.
For a XOR_SHIFT_1024_S_PHI generator the period is 2^1024. If sampled 2^30
times per second (>10 billion/sec), given 31449600 secs/year or approx 2^25, it
will take 2^969 years to repeat the period.
Basically it would be impossible to exhaust the period in a single execution of
the application. This has the advantage that the output of the XorShift1024SPhi
generator natively passes BigCrush. The Well44497B does not unless mixed with
another generator:
{noformat}
RNG Dieharder TestU01 (BigCrush)
WELL_44497_B 0,0,0,0,0 2,3,2,2,2
WELL_44497_B ^ hash code 0,0,0,0,0 2,2,2,4,2
WELL_44497_B ^ ThreadLocalRandom 0,0,0,0,0 0,0,0,1,0
WELL_44497_B ^ SplitMix64 0,0,0,0,0 2,0,0,0,1
{noformat}
Note the test failures for results of the SplitMix64 are on different tests.
Only the plain Well44497B generator or the mix with the hash code
systematically fail tests.
h1. Conclusions
Modifications can be made to the SeedFactory to improve seeding performance:
* Switch to a XorShift1024SPhi generator
* Avoid {{synchronized}} and use a {{ReentrantLock}} for stable scaling
performance (1)
* Compute any int[] in blocks of 8
* Compute any long[] in blocks of 4
* Remove the mix of output with a hashcode
(1) Note that use of a {{ReentrantLock}} with a block size of 4 long or 8 int
would have very similar performance to synchronized for a single thread and is
faster for 4 threads in an extreme test of concurrency.
> SeedFactory seed creation performance analysis
> ----------------------------------------------
>
> Key: RNG-104
> URL: https://issues.apache.org/jira/browse/RNG-104
> Project: Commons RNG
> Issue Type: Task
> Components: simple
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
> Attachments: t1.jpg, t4.jpg, well_lock_vs_sync1.png,
> well_lock_vs_sync4.jpg, well_sync_performance.jpg,
> well_unfair_performance.jpg, well_unfair_vs_fair.jpg, xor_unfair_int1.png,
> xor_unfair_long1.png
>
>
> The SeedFactory is used to create seeds for the random generators. To ensure
> thread safety this uses synchronized blocks around a single generator. The
> current method only generates a single int or long per synchronisation.
> Analyze the performance of this approach. The analysis will investigate
> generating multiple values inside each synchronisation around the generator.
> This analysis will also investigate methods to supplement the SeedFactory
> with fast methods to create seeds. This will use a fast seeding method to
> generate a single long value. This can be a seed for a SplitMix generator
> used to create a seed of any length.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)