Alex D Herbert created RNG-113:
----------------------------------
Summary: Speed improvement to PCG 32 Xsh-Rs implementation
Key: RNG-113
URL: https://issues.apache.org/jira/browse/RNG-113
Project: Commons RNG
Issue Type: Improvement
Components: core
Affects Versions: 1.3
Reporter: Alex D Herbert
Investigate a possible speed increase for the PCG 32 Xsh-Rs generator for the
methods nextLong and nextDouble.
The default implementation for nextLong() and nextDouble() for IntProvider is:
{code:java}
public long nextLong() {
return NumberFactory.makeLong(nextInt(), nextInt());
}
public double nextDouble() {
return NumberFactory.makeDouble(nextInt(), nextInt());
}
{code}
Each of these methods composes a long from two int values converted to long.
The Pcg32 implementation of nextInt() creates the int from a primitive cast
conversion of a long. *Thus the current default implementation for nextLong and
nextDouble is performing a round trip of a long to an int to a long twice
over.* The same logic can be implemented by a special implementation of the
methods that work only using long types.
{code:java}
public long nextLong() {
// Get two values from the LCG
final long x = state;
final long y = bump(state);
state = bump(y);
// Perform mix function.
// For a 32-bit output the x bits should be shifted down (22 + (int) (x >>>
61)).
// Leave in the upper bits by shift up 32 - (22 + (int) (x >>> 61))
final long upper = (x ^ (x >>> 22)) << (10 - (int) (x >>> 61));
final long lower = (y ^ (y >>> 22)) >>> (22 + (int) (y >>> 61));
return (upper & 0xffffffff00000000L) | (lower & 0xffffffffL);
}
/** Upper mask for top 26 bits shifted by 27 bits. */
private static final long MASK1 = ((long) (0xffffffff >>> 6)) << 27;
/** Lower mask shifted by 5 for 27 bits. */
private static final long MASK2 = 0xffffffffL >>> 5;
public double nextDouble() {
// Get two values from the LCG
final long x = state;
final long y = bump(state);
state = bump(y);
// Perform mix function.
// For a 32-bit output the x bits should be shifted down (22 + (int) (x >>>
61)).
// To match nextDouble requires 26-bits from int 1 and 27 bits from int 2.
// Int 1 is stored in the upper 32-bits as per nextLong() but shifted down
11 and
// then masked to keep the upper 26-bits. Discard an extra 5 from int 2.
final long upper = (x ^ (x >>> 22)) >>> (1 + (int) (x >>> 61));
final long lower = (y ^ (y >>> 22)) >>> (27 + (int) (y >>> 61));
return ((upper & MASK1) | (lower & MASK2)) * 0x1.0p-53;
}
{code}
These implementations require that the private {{state}} member and {{bump()}}
method of the parent AbstractPcg6432 are made accessible.
The change should be tested to determine if there is a performance increase
with a custom implementation.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)