[jira] [Commented] (RNG-113) Speed improvement to PCG 32 Xsh-Rs implementation

Alex D Herbert (JIRA) Sat, 17 Aug 2019 09:50:29 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909758#comment-16909758
 ]


Alex D Herbert commented on RNG-113:
------------------------------------

{quote}Isn't it producing more than 32 bits, but less than 64 bits?
{quote}
Maybe. The reference produces 32-bits but I've not read the details on what 
bits are considered random enough for usage.
{quote}Could it be that the 5% ~ 10% improvement is reflecting the difference 
in the number of method calls
{quote}
Maybe. I can test that by inlining the entire routine to produce a long using 
the standard algorithm. I'll do that and post the results for comparison.

One way to preserve encapsulation is to provide another mix method similar to 
the one already in the abstract class with the current default implementation:
{code:java}
public int next() {
    final long x = state;
    state = bump(state);
    return transform(x);
}

protected abstract int transform(long x);

// Extension ...
public long nextLong() {
    final long x = state;
    final long y = bump(x);
    state = bump(y);
    return transform(x, y);
}

protected long transform(long x, long y) {
    // Default implementation
    return NumberFactory.makeLong(transform(x), transform(y));
}
{code}

> Speed improvement to PCG 32 Xsh-Rs implementation
> -------------------------------------------------
>
>                 Key: RNG-113
>                 URL: https://issues.apache.org/jira/browse/RNG-113
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Priority: Minor
>
> Investigate a possible speed increase for the PCG 32 Xsh-Rs generator for the 
> methods nextLong and nextDouble.
> The default implementation for nextLong() and nextDouble() for IntProvider is:
> {code:java}
> public long nextLong() {
>     return NumberFactory.makeLong(nextInt(), nextInt());
> }
> public double nextDouble() {
>     return NumberFactory.makeDouble(nextInt(), nextInt());
> }
> {code}
> Each of these methods composes a long from two int values converted to long.
> The Pcg32 implementation of nextInt() creates the int from a primitive cast 
> conversion of a long. *Thus the current default implementation for nextLong 
> and nextDouble is performing a round trip of a long to an int to a long twice 
> over.* The same logic can be implemented by a special implementation of the 
> methods that work only using long types.
> {code:java}
> public long nextLong() {
>     // Get two values from the LCG
>     final long x = state;
>     final long y = bump(state);
>     state = bump(y);
>     // Perform mix function.
>     // For a 32-bit output the x bits should be shifted down (22 + (int) (x 
> >>> 61)).
>     // Leave in the upper bits by shift up 32 - (22 + (int) (x >>> 61))
>     final long upper = (x ^ (x >>> 22)) << (10 - (int) (x >>> 61));
>     final long lower = (y ^ (y >>> 22)) >>> (22 + (int) (y >>> 61));
>     return (upper & 0xffffffff00000000L) | (lower & 0xffffffffL);
> }
> /** Upper mask for top 26 bits shifted by 27 bits. */
> private static final long MASK1 = ((long) (0xffffffff >>> 6)) << 27;
> /** Lower mask shifted by 5 for 27 bits. */
> private static final long MASK2 = 0xffffffffL >>> 5;
> public double nextDouble() {
>     // Get two values from the LCG
>     final long x = state;
>     final long y = bump(state);
>     state = bump(y);
>     // Perform mix function.
>     // For a 32-bit output the x bits should be shifted down (22 + (int) (x 
> >>> 61)).
>     // To match nextDouble requires 26-bits from int 1 and 27 bits from int 2.
>     // Int 1 is stored in the upper 32-bits as per nextLong() but shifted 
> down 11 and
>     // then masked to keep the upper 26-bits. Discard an extra 5 from int 2.
>     final long upper = (x ^ (x >>> 22)) >>> (1 + (int) (x >>> 61));
>     final long lower = (y ^ (y >>> 22)) >>> (27 + (int) (y >>> 61));
>     return ((upper & MASK1) | (lower & MASK2)) * 0x1.0p-53;
> }
> {code}
> These implementations require that the private {{state}} member and 
> {{bump()}} method of the parent AbstractPcg6432 are made accessible.
> The change should be tested to determine if there is a performance increase 
> with a custom implementation.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Commented] (RNG-113) Speed improvement to PCG 32 Xsh-Rs implementation

Reply via email to