[jira] [Commented] (RNG-113) Speed improvement to PCG 32 Xsh-Rs implementation

Gilles (JIRA) Sat, 17 Aug 2019 09:32:24 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909757#comment-16909757
 ]


Gilles commented on RNG-113:
----------------------------

bq. It is an IntProvider because it outputs 32-bits of randomness per cycle.

Isn't it producing more than 32 bits, but less than 64 bits?

bq. require that the private state member and bump() method of the parent 
AbstractPcg6432 are made accessible.

I'm reluctant to break encapsulation.
Enabling this kind of optimization also breaks the relative simplicity of the 
framework's design; IIUC this case would (perhaps) have benefited from an 
interface like
{code}
/**
 * @param n Number of random bits.
 * @return a number where the {@code n} higher bits are random.
 * No promise on the quality of the {@code 64 - n} lower bits.
 */
long next(int n);
{code}
(perhaps) at the cost of a less optimal performance for other algorithms.

Being based on {{IntProvider}} and {{LongProvider}}, the framework requires 
"tweaking" here.
IMHO it's not worth it, even if the performance boost is noticeable.  Could it 
be that the 5% ~ 10% improvement is reflecting the difference in the number of 
method calls (4 for the base implementation vs 1 for your optimized one)?

If a user needs so many {{long}} and {{double}} values that a 10% boost would 
be significant, won't he be better off anyways using one of the 
{{LongProvider}} implementations?


> Speed improvement to PCG 32 Xsh-Rs implementation
> -------------------------------------------------
>
>                 Key: RNG-113
>                 URL: https://issues.apache.org/jira/browse/RNG-113
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Priority: Minor
>
> Investigate a possible speed increase for the PCG 32 Xsh-Rs generator for the 
> methods nextLong and nextDouble.
> The default implementation for nextLong() and nextDouble() for IntProvider is:
> {code:java}
> public long nextLong() {
>     return NumberFactory.makeLong(nextInt(), nextInt());
> }
> public double nextDouble() {
>     return NumberFactory.makeDouble(nextInt(), nextInt());
> }
> {code}
> Each of these methods composes a long from two int values converted to long.
> The Pcg32 implementation of nextInt() creates the int from a primitive cast 
> conversion of a long. *Thus the current default implementation for nextLong 
> and nextDouble is performing a round trip of a long to an int to a long twice 
> over.* The same logic can be implemented by a special implementation of the 
> methods that work only using long types.
> {code:java}
> public long nextLong() {
>     // Get two values from the LCG
>     final long x = state;
>     final long y = bump(state);
>     state = bump(y);
>     // Perform mix function.
>     // For a 32-bit output the x bits should be shifted down (22 + (int) (x 
> >>> 61)).
>     // Leave in the upper bits by shift up 32 - (22 + (int) (x >>> 61))
>     final long upper = (x ^ (x >>> 22)) << (10 - (int) (x >>> 61));
>     final long lower = (y ^ (y >>> 22)) >>> (22 + (int) (y >>> 61));
>     return (upper & 0xffffffff00000000L) | (lower & 0xffffffffL);
> }
> /** Upper mask for top 26 bits shifted by 27 bits. */
> private static final long MASK1 = ((long) (0xffffffff >>> 6)) << 27;
> /** Lower mask shifted by 5 for 27 bits. */
> private static final long MASK2 = 0xffffffffL >>> 5;
> public double nextDouble() {
>     // Get two values from the LCG
>     final long x = state;
>     final long y = bump(state);
>     state = bump(y);
>     // Perform mix function.
>     // For a 32-bit output the x bits should be shifted down (22 + (int) (x 
> >>> 61)).
>     // To match nextDouble requires 26-bits from int 1 and 27 bits from int 2.
>     // Int 1 is stored in the upper 32-bits as per nextLong() but shifted 
> down 11 and
>     // then masked to keep the upper 26-bits. Discard an extra 5 from int 2.
>     final long upper = (x ^ (x >>> 22)) >>> (1 + (int) (x >>> 61));
>     final long lower = (y ^ (y >>> 22)) >>> (27 + (int) (y >>> 61));
>     return ((upper & MASK1) | (lower & MASK2)) * 0x1.0p-53;
> }
> {code}
> These implementations require that the private {{state}} member and 
> {{bump()}} method of the parent AbstractPcg6432 are made accessible.
> The change should be tested to determine if there is a performance increase 
> with a custom implementation.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Commented] (RNG-113) Speed improvement to PCG 32 Xsh-Rs implementation

Reply via email to