[ https://issues.apache.org/jira/browse/RNG-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508696#comment-17508696 ]

Alex Herbert commented on RNG-171:
----------------------------------

I have added these methods to the previous benchmark in RNG-57. This is only on 
a development branch in my Commons RNG fork.

The benchmark created a class that wrapped an RNG and provided the caching. It 
is based on a 1.2-SNAPSHOT. However, I did not reinstall the core library 
locally, so the core snapshot is from a version with the caching included. This 
allows comparison with the current implementation in the library.
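
For illustration, here is a minimal sketch of the wrapper approach. The class and field names are mine and this is not the actual benchmark code; it wraps a UniformRandomProvider and applies the current-style caching (as in methods 0/4) for nextInt:
{code:java}
import org.apache.commons.rng.UniformRandomProvider;

/**
 * Sketch of a wrapper that caches the unused half of each long for nextInt
 * using the current-style cache (a long plus a boolean flag).
 */
class CachingWrapper {
    private final UniformRandomProvider rng;
    /** Cached long from the last refill. */
    private long intSource;
    /** Flag to indicate intSource holds an unused 32-bit value. */
    private boolean cachedIntSource;

    CachingWrapper(UniformRandomProvider rng) {
        this.rng = rng;
    }

    int nextInt() {
        if (cachedIntSource) {
            // Use the previously cached low 32 bits
            cachedIntSource = false;
            return (int) intSource;
        }
        // Refill: return the high 32 bits, cache the rest
        final long l = rng.nextLong();
        intSource = l;
        cachedIntSource = true;
        return (int) (l >>> 32);
    }
}
{code}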

For convenience I did not test all providers in the library. The trends should 
be the same (as was the case when the benchmark was last performed).
h2. Int cache
h3. Methods
||Method||Description||
|-1|No caching|
|0|The current caching method implemented inside the LongProvider|
|4|The current caching method implemented inside the wrapper class|
|5|Store the low bits using a mask: l & 0xffff_ffffL|
|6|Store the low bits using a shift: (l << 1) >>> 1|
|7|Store the high bits using a shift: l >>> 32|
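
Methods 5 to 7 differ only in the statement used in the refill branch to store the unused half of the 64-bit value. A sketch of the three variants is below (intSource is an assumed cache field, negative when empty; full examples are in the issue description further down):
{code:java}
/** Sketch of the refill branch for each int cache method; l is the fresh 64-bit value. */
final class IntCacheStoreVariants {
    /** Cache field; a negative value marks the cache as empty. */
    private long intSource = -1;

    // Method 5: mask keeps the low 32 bits; the sign bit is left clear (cache full).
    int refillMethod5(long l) {
        intSource = l & 0xffff_ffffL;
        return (int) (l >>> 32); // high 32 bits returned first
    }

    // Method 6: a shift pair clears the sign bit, keeping the low 63 bits.
    int refillMethod6(long l) {
        intSource = (l << 1) >>> 1;
        return (int) (l >>> 32); // high 32 bits returned first
    }

    // Method 7: store the high 32 bits; the low 32 bits are returned first (functionally breaking).
    int refillMethod7(long l) {
        intSource = l >>> 32;
        return (int) l; // low 32 bits returned first
    }
}
{code}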
h3. Results

5 warm-up iterations, 10 measurement iterations (scores are times: lower is better)
||cacheMethod||randomSourceName||Score||Error||Median||
|-1|SPLIT_MIX_64|3869.82|63.28|3854.37|
|0|SPLIT_MIX_64|3126.82|9.23|3126.11|
|4|SPLIT_MIX_64|3511.19|214.87|3465.83|
|5|SPLIT_MIX_64|8914.18|352.82|8927.83|
|6|SPLIT_MIX_64|3594.36|45.13|3583.23|
|7|SPLIT_MIX_64|3315.81|47.53|3303.05|
|-1|XOR_SHIFT_1024_S|4623.54|11.80|4622.43|
|0|XOR_SHIFT_1024_S|3790.71|159.87|3766.33|
|4|XOR_SHIFT_1024_S|4020.19|161.09|4045.94|
|5|XOR_SHIFT_1024_S|3854.82|17.26|3851.16|
|6|XOR_SHIFT_1024_S|3860.96|12.58|3857.78|
|7|XOR_SHIFT_1024_S|3802.66|32.81|3797.79|
h3. Conclusions
 * No caching is slowest.
 * Caching with direct implementation inside the LongProvider (method 0) is 
faster than using the wrapper (method 4).
 * The method using a mask (method 5) can sometimes have a large warm-up cost. 
This is repeatable across JMH runs. Here it is observed for the SPLIT_MIX_64 
result but not the XOR_SHIFT_1024_S result. It differs from method 6 only in 
the line used to store the lower 32 bits. This would require more investigation 
on different JVMs to see if it is consistent.
 * Methods 6 and 7 have a similar performance (note method 7 is a functionally 
breaking change). Method 5 is just as fast when the strange warm-up cost is not 
present.
 * Method 4 has a higher error than the new methods 6 and 7, but overall all 
the caching methods appear closely matched in time.

These results suggest moving to method 6. It will save 1 byte of memory in the 
implementation, have the same performance, and return the same results (high 
part then low part).
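
For reference, a sketch of nextInt using method 6 (this matches the commented-out alternative in the issue description below; intSource is assumed to be initialised to -1):
{code:java}
@Override
public int nextInt() {
    long l = intSource;
    if (l < 0) {
        // Refill
        l = next();
        // Store the low 63 bits; the clear sign bit marks the cache as full
        intSource = (l << 1) >>> 1;
        // Return the high 32 bits
        return (int) (l >>> 32);
    }
    // Reset the cache and return the previously stored low 32 bits
    intSource = -1;
    return (int) l;
}
{code}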
h2. Boolean cache
h3. Methods
||Method||Description||
|-1|No caching|
|0|The current caching method implemented inside the Int/LongProvider|
|1|The current caching method implemented inside the wrapper class|
|5|Store only the 31/63 unused bits|
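
The 63-bit LongProvider version of method 5 is shown in the issue description below. A sketch of the analogous 31-bit variant for the IntProvider (assuming an int booleanSource field initialised to Integer.MIN_VALUE and next() returning the raw int):
{code:java}
@Override
public boolean nextBoolean() {
    int i = booleanSource;
    if (i == Integer.MIN_VALUE) {
        // Refill
        i = next();
        // Store the unused 31 bits and a refill flag, return the highest bit
        booleanSource = (i << 1) | 1;
        return i < 0;
    }
    // Shift up, eventually resetting to the refill flag; return the current high bit
    booleanSource = i << 1;
    return i < 0;
}
{code}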
h3. Results

5 warm-up iterations, 10 measurement iterations (scores are times: lower is better)
||Source||cacheMethod||randomSourceName||Score||Error||Median||
|int|-1|KISS|6328.01|14.53|6327.08|
| |0|KISS|3078.48|11.08|3078.47|
| |1|KISS|3250.82|11.17|3248.86|
| |5|KISS|2909.25|12.15|2907.00|
| |-1|MWC_256|4595.00|59.74|4577.38|
| |0|MWC_256|3096.44|37.11|3087.86|
| |1|MWC_256|3391.18|210.83|3332.62|
| |5|MWC_256|2902.35|82.19|2885.26|
|long|-1|SPLIT_MIX_64|4675.35|13.47|4675.17|
| |0|SPLIT_MIX_64|3610.08|8.23|3609.51|
| |1|SPLIT_MIX_64|3217.19|15.17|3214.41|
| |5|SPLIT_MIX_64|2960.63|21.30|2960.81|
| |-1|XOR_SHIFT_1024_S|5039.77|83.48|5020.97|
| |0|XOR_SHIFT_1024_S|3144.73|360.55|3078.19|
| |1|XOR_SHIFT_1024_S|2981.02|27.84|2972.97|
| |5|XOR_SHIFT_1024_S|2975.77|22.23|2970.34|
h3. Conclusions
 * No caching is slowest.
 * Caching with direct implementation inside the *IntProvider* (method 0) is 
*faster* than using the wrapper (method 1).
 * Caching with direct implementation inside the *LongProvider* (method 0) is 
*slower* than using the wrapper (method 1).
 * Caching with method 5 is faster than (or comparable to) method 1. 
XOR_SHIFT_1024_S is the only comparable result; the others are all faster.

These results suggest moving to method 5. It will halve the memory usage in the 
implementation, have the same or better performance, and return the same 
results (using the bits of the underlying int/long source in big-endian order).
h2. Overall

The nextInt cache can reduce the memory footprint by 1 byte without changing 
performance. The nextBoolean cache can halve the memory footprint with the same 
or better performance.
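
As a sketch of the field changes in the LongProvider (the current field names here are assumed and may differ from the library source):
{code:java}
/** Sketch of the LongProvider cache state before the change (field names assumed). */
abstract class LongProviderBefore {
    long booleanSource;       // cached bits for nextBoolean
    long booleanSourceMask;   // mask marking the next bit to use
    long intSource;           // cached long for nextInt
    boolean cachedIntSource;  // flag: intSource holds an unused 32-bit value
}

/** Sketch after the change: the refill flags are folded into the cached values. */
abstract class LongProviderAfter {
    long booleanSource = Long.MIN_VALUE; // 63 cached bits plus a refill marker bit
    long intSource = -1;                 // cached low 32 bits; negative marks the cache empty
}
{code}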

I will repeat these tests on a few different JVMs and platforms to confirm. 

 

> Reduce memory footprint of cached int and boolean source
> --------------------------------------------------------
>
>                 Key: RNG-171
>                 URL: https://issues.apache.org/jira/browse/RNG-171
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.4
>            Reporter: Alex Herbert
>            Priority: Major
>             Fix For: 1.5
>
>
> The base implementation for the IntProvider and LongProvider caches values 
> from the generation method to be used as the source for boolean bits, and in 
> the case of the LongProvider, for int bits. The nextBoolean and nextInt 
> methods thus use all the bits output by the generator. The memory footprint 
> of this cache can be reduced.
> h3. Boolean
> IntProvider
> Currently a 32-bit value and a 32-bit mask are stored.
> However when the cache is refilled a single bit is used for the boolean 
> return value. The cache need only store the unused 31 bits. This leaves room 
> to use a bit to indicate when to reset the cache. A similar change can be 
> made for the LongProvider by storing only the 63 unused bits.
> In this example for a 64-bit cache the booleanSource is initialised as 
> MIN_VALUE (a single bit in the most significant position).
> {code:java}
> @Override
> public boolean nextBoolean() {
>     long l = booleanSource;
>     if (l == Long.MIN_VALUE) {
>         // Refill
>         l = next();
>         // Store unused 63 bits and a refill flag, return highest bit
>         booleanSource = (l << 1) | 1;
>         return l < 0;
>     }
>     // Shift up eventually resetting, return current high bit
>     booleanSource = l << 1;
>     return l < 0;
> }
> {code}
> h3. Int
> LongProvider only
> Stores a 64-bit value and a boolean flag. However only 32-bits of the stored 
> value are used. Thus the remaining 32-bits can be used as a flag. In this 
> example the sign bit is used as it can be easily checked with a < operator.
> {code:java}
> @Override
> public int nextInt() {
>     long l = intSource;
>     if (l < 0) {
>         // Refill
>         l = next();
>         // Store low 32 bits, return high 32 bits
>         intSource = l & 0xffff_ffffL;
>         // OR
>         // Store low 63 bits, return high 32 bits
>         //intSource = (l << 1) >>> 1;
>         return (int) (l >>> 32);
>     }
>     // Reset and return previous low bits
>     intSource = -1;
>     return (int) l;
> }
> {code}
> The example above maintains compatibility to return the upper then lower 
> parts of the long for the 2 int values. An alternative with fewer operations 
> is:
> {code:java}
> @Override
> public int nextInt() {
>     long l = intSource;
>     if (l < 0) {
>         // Refill
>         l = next();
>         // FUNCTIONALLY BREAKING CHANGE
>         // Store high 32 bits, return low 32 bits
>         intSource = (l >>> 32);
>         return (int) l;
>     }
>     // Reset and return previous low bits
>     intSource = -1;
>     return (int) l;
> }
> {code}
> This is arguably simpler but would be a functionally breaking change.
> Memory footprint would change:
> {noformat}
> IntProvider
> 8 bytes (int,int)                 -> 4 bytes (int)
> LongProvider
> 25 bytes (long,long,long,boolean) -> 16 bytes (long,long)
> {noformat}


