[
https://issues.apache.org/jira/browse/RNG-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509047#comment-17509047
]
Alex Herbert commented on RNG-171:
----------------------------------
{quote}they trade code clarity for a modest memory gain
{quote}
I do not think the current methods are any clearer than the proposed ones. The
cached boolean implementation in the codebase still requires an understanding
of masking and bit shifts.
The new boolean method uses half the memory and is faster. IMO this is a valid
gain.
With regard to the int cache method, the current method stores 32 bits for
reuse in a 64-bit long. This is wasteful. It could implement the same method
using an int and a boolean flag with basically the same code. I've not
benchmarked that and cannot recall if I ever considered it, even though it is
obvious now. Essentially all that needs to be cached for the next round is
33 bits: 32 bits for the value and a flag to say whether the value is present.
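As a rough sketch of that int-plus-flag alternative (field names are
hypothetical and this variant has not been benchmarked):
{code:java}
// Hypothetical fields: an int cache plus a boolean presence flag.
private int cachedInt;
private boolean cachedIntPresent;

@Override
public int nextInt() {
    if (cachedIntPresent) {
        // Use the low 32 bits saved from the previous call.
        cachedIntPresent = false;
        return cachedInt;
    }
    final long l = next();
    // Save the low 32 bits for the next call; return the high 32 bits now.
    cachedInt = (int) l;
    cachedIntPresent = true;
    return (int) (l >>> 32);
}
{code}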
However, as I understand it, using a boolean flag means that the offsets of all
fields in child classes will not naturally be memory aligned. It would be up to
the JVM to decide whether or not to align the next field to a 4/8 byte
boundary. This affects all the state fields of an RNG based on the abstract
class. Ideally these should be memory aligned for fastest access. If the JVM
does align them then you gain no memory advantage from using a boolean over an
implementation that effectively uses 32 bits for a single-bit flag.
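If it helps, one way to check what the JVM actually does with field offsets and
padding is JOL (org.openjdk:jol-core). This is only a suggestion for
verification, not something used in these benchmarks, and the class names below
are made up:
{code:java}
import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {
    // Hypothetical base class using an int cache plus a boolean flag.
    static class Base {
        int cachedInt;
        boolean cachedIntPresent;
    }

    // Hypothetical child holding RNG state; does its long start on an 8-byte boundary?
    static class Child extends Base {
        long state;
    }

    public static void main(String[] args) {
        // Prints the field offsets and any padding inserted by the JVM.
        System.out.println(ClassLayout.parseClass(Child.class).toPrintable());
    }
}
{code}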
If the method is changed to use a long to store the 32-bits and a flag then the
fields will remain aligned. This would require a benchmark to verify if
performance is affected for any child classes.
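For reference, the shape of such a JMH benchmark would be something like the
sketch below (class and method names are hypothetical; this is not the actual
CachedGenerationPerformance source):
{code:java}
import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.simple.RandomSource;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class NextIntBenchmark {
    // The provider under test; other sources would be added as @Param values.
    private final UniformRandomProvider rng = RandomSource.SPLIT_MIX_64.create();

    @Benchmark
    public int nextInt() {
        // Returning the value lets JMH consume it and avoid dead-code elimination.
        return rng.nextInt();
    }
}
{code}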
Here are some more results on two platforms with JDK 8, 11, 17 for the int
generation.
MacBook
|CachedGenerationPerformance| |JDK 8|JDK 11|JDK 17|
|cacheMethod|randomSourceName|Score|Score|Score|
|-1|SPLIT_MIX_64|3858.64|7185.44|4148.66|
|0|SPLIT_MIX_64|3122.25|3482.81|3908.80|
|4|SPLIT_MIX_64|3289.90|3411.38|4235.49|
|5|SPLIT_MIX_64|3418.00|3444.90|4087.53|
|6|SPLIT_MIX_64|3483.51|3531.09|4250.82|
|7|SPLIT_MIX_64|3332.12|3339.13|4063.81|
|8|SPLIT_MIX_64|3402.04|3526.50|4069.37|
|-1|XOR_SHIFT_1024_S|4622.16|4623.75|4787.81|
|0|XOR_SHIFT_1024_S|3592.54|3481.52|3932.36|
|4|XOR_SHIFT_1024_S|3925.31|4509.47|4068.30|
|5|XOR_SHIFT_1024_S|3879.26|3951.73|4226.37|
|6|XOR_SHIFT_1024_S|3872.55|3873.99|4252.48|
|7|XOR_SHIFT_1024_S|3802.10|3776.63|3920.08|
|8|XOR_SHIFT_1024_S|3888.22|4001.32|4217.29|
Workstation desktop
|CachedGenerationPerformance| |JDK 8|JDK 11|JDK 17|
|cacheMethod|randomSourceName|Score|Score|Score|
|-1|SPLIT_MIX_64|4020.14|4012.83|4016.14|
|0|SPLIT_MIX_64|3478.30|3481.57|3491.51|
|4|SPLIT_MIX_64|3333.82|3360.07|3419.41|
|5|SPLIT_MIX_64|3355.61|3358.00|3386.43|
|6|SPLIT_MIX_64|3390.77|3484.99|3484.83|
|7|SPLIT_MIX_64|3159.29|3277.54|3247.06|
|8|SPLIT_MIX_64|3343.22|3356.03|3395.30|
|-1|XOR_SHIFT_1024_S|4742.75|4753.28|5044.07|
|0|XOR_SHIFT_1024_S|4341.41|4550.42|4413.60|
|4|XOR_SHIFT_1024_S|3846.62|3860.63|3931.35|
|5|XOR_SHIFT_1024_S|3866.00|3901.33|3869.51|
|6|XOR_SHIFT_1024_S|3869.54|3915.91|3930.25|
|7|XOR_SHIFT_1024_S|3779.50|3814.34|3868.77|
|8|XOR_SHIFT_1024_S|3861.67|3888.29|3865.72|
Note:
I upgraded JMH from 1.16 to 1.34. Since updating I have not observed the
strange warm-up cost for method 5 in this round of tests. It may have been due
to JMH, or may have just been an issue with benchmarking on my laptop
originally (other processes running).
On the workstation, method 0 is without the current caching implementation. It
is fast for SPLIT_MIX but not for XOR_SHIFT. There is no 1.2-SNAPSHOT in the
local maven repo so I do not know how the class was included. I extracted the
LongProvider class from the shaded jar file and decompiled it. It does not have
a caching implementation. Doing the same on the MacBook confirms that platform
does have the caching implementation. So the results from the workstation show
that caching is not a performance gain for the very fast generator. It is
purely there to use the entire 64-bit long output from the RNG for the 32-bit
int output.
The most consistently fast method is 7, although the differences are marginal.
This is the cleanest code IMO: it has the fewest operations on the generated
long value. But it would switch the output to little-endian order (low, high)
rather than (high, low). Otherwise all methods appear to be the same speed. So
the simplest to understand without breaking functional compatibility would be
to mask the stored value to the low 32 bits (method 5). The upper bits act as a
state flag: either they are empty or they are not.
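To illustrate the output-order difference only (hypothetical values, not the
cache logic):
{code:java}
public class OutputOrder {
    public static void main(String[] args) {
        // A single 64-bit output from next()
        long l = 0x11111111_22222222L;
        // Method 5 keeps the current (high, low) order for the two ints
        System.out.printf("method 5: %x, %x%n", (int) (l >>> 32), (int) l);
        // Method 7 would emit (low, high), i.e. little-endian order
        System.out.printf("method 7: %x, %x%n", (int) l, (int) (l >>> 32));
    }
}
{code}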
I would prefer to make the change to the code, especially for the boolean cache
which is a performance gain.
> Reduce memory footprint of cached int and boolean source
> --------------------------------------------------------
>
> Key: RNG-171
> URL: https://issues.apache.org/jira/browse/RNG-171
> Project: Commons RNG
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.4
> Reporter: Alex Herbert
> Priority: Major
> Fix For: 1.5
>
>
> The base implementation for the IntProvider and LongProvider caches values
> from the generation method to be used as the source for boolean bits, and in
> the case of the LongProvider, for int bits. The nextBoolean and nextInt
> methods thus use all the bits output by the generator. The memory footprint
> of this cache can be reduced.
> h3. Boolean
> IntProvider
> Currently a 32-bit value and a 32-bit mask are stored.
> However when the cache is refilled a single bit is used for the boolean
> return value, so the cache need only store the unused 31 bits. This leaves
> room to use a bit to indicate when to reset the cache. A similar change can
> be made for the LongProvider by storing only the 63 unused bits.
> In this example for a 64-bit cache the booleanSource is initialised as
> MIN_VALUE (a single bit in the most significant position).
> {code:java}
> @Override
> public boolean nextBoolean() {
>     long l = booleanSource;
>     if (l == Long.MIN_VALUE) {
>         // Refill
>         l = next();
>         // Store unused 63 bits and a refill flag, return highest bit
>         booleanSource = (l << 1) | 1;
>         return l < 0;
>     }
>     // Shift up, eventually resetting to the refill flag; return current high bit
>     booleanSource = l << 1;
>     return l < 0;
> }
> {code}
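> A corresponding sketch for the 32-bit IntProvider case (a sketch only,
> assuming an int booleanSource field initialised to Integer.MIN_VALUE) would
> be:
> {code:java}
> @Override
> public boolean nextBoolean() {
>     int i = booleanSource;
>     if (i == Integer.MIN_VALUE) {
>         // Refill
>         i = next();
>         // Store unused 31 bits and a refill flag, return highest bit
>         booleanSource = (i << 1) | 1;
>         return i < 0;
>     }
>     // Shift up, eventually resetting to the refill flag; return current high bit
>     booleanSource = i << 1;
>     return i < 0;
> }
> {code}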
> h3. Int
> LongProvider only
> Stores a 64-bit value and a boolean flag. However only 32-bits of the stored
> value are used. Thus the remaining 32-bits can be used as a flag. In this
> example the sign bit is used as it can be easily checked with a < operator.
> {code:java}
> @Override
> public int nextInt() {
>     long l = intSource;
>     if (l < 0) {
>         // Refill
>         l = next();
>         // Store low 32 bits, return high 32 bits
>         intSource = l & 0xffff_ffffL;
>         // OR
>         // Store low 63 bits, return high 32 bits
>         //intSource = (l << 1) >>> 1;
>         return (int) (l >>> 32);
>     }
>     // Reset and return previous low bits
>     intSource = -1;
>     return (int) l;
> }
> {code}
> The example above maintains compatibility by returning the upper then lower
> parts of the long for the two int values. An alternative with fewer
> operations is:
> {code:java}
> @Override
> public int nextInt() {
>     long l = intSource;
>     if (l < 0) {
>         // Refill
>         l = next();
>         // FUNCTIONALLY BREAKING CHANGE
>         // Store high 32 bits, return low 32 bits
>         intSource = l >>> 32;
>         return (int) l;
>     }
>     // Reset and return previous high bits
>     intSource = -1;
>     return (int) l;
> }
> {code}
> This is arguably simpler but would be a functionally breaking change.
> Memory footprint would change:
> {noformat}
> IntProvider
> 8 bytes (int,int) -> 4 bytes (int)
> LongProvider
> 25 bytes (long,long,long,boolean) -> 16 bytes (long,long)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)