[ 
https://issues.apache.org/jira/browse/RNG-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058513#comment-18058513
 ] 

Alex Herbert commented on RNG-190:
----------------------------------

I have quickly put a sampler into the ContinuousSamplersPerformance with 
variations of the methods previously tested.

This time JHM results are for the time per operation, so lower is better. 

Here the baseline is JMH overhead for measuring a double.
h2. JDK 11 on a Xeon(R) CPU E5-1680 v3 @ 3.20GHz
||randomSourceName||samplerType||Method||Score||Error||Median||Error/Score||
|XO_SHI_RO_256_PP|ContinuousUniformSamplerDouble|sample|7.107129778639708|0.01317467768143626|7.10764400474101|0.00185372690407771|
|XO_SHI_RO_256_PP|ContinuousUniformSamplerOpenDouble|sample|11.490390741195693|0.03824148955581027|11.4840742425606|0.00332812786067455|
|XO_SHI_RO_256_PP|UniformOpenDoubleBits|sample|6.2439263628109325|0.024358146028666938|6.24334915860838|0.00390109437768917|
|XO_SHI_RO_256_PP|UniformOpenDoubleMultiply|sample|6.392629980184961|0.24151043388309373|6.36524804892138|0.0377795108791994|
|XO_SHI_RO_256_PP|UniformOpenDoubleRecursion|sample|6.976166397589502|0.03291873208823776|6.97219964355171|0.00471874238831406|
|XO_SHI_RO_256_PP|UniformOpenDoubleRejection|sample|6.687194998155519|0.2910335341756917|6.65384490365712|0.0435210180435841|
| | 
|baseline|2.404546878865408|0.0811845605528997|2.39613172881418|0.0337629352400927|
h2. JDK 21 on a Mac M2 Max 12 Core 3680 MHz
||randomSourceName||samplerType||Method||Score||Error||Median||Error/Score||
|XO_SHI_RO_256_PP|ContinuousUniformSamplerDouble|sample|2.797876325620596|0.07648381737397|2.7966022601012|0.0273363824818115|
|XO_SHI_RO_256_PP|ContinuousUniformSamplerOpenDouble|sample|3.732871824359848|0.17068626753363558|3.70425784263105|0.0457251884245735|
|XO_SHI_RO_256_PP|UniformOpenDoubleBits|sample|2.947098508967197|0.10613794953353475|2.95085955035611|0.0360143881212612|
|XO_SHI_RO_256_PP|UniformOpenDoubleMultiply|sample|2.9200292478398415|0.12973054443946105|2.93497465968083|0.0444278236375311|
|XO_SHI_RO_256_PP|UniformOpenDoubleRecursion|sample|2.803584296645057|0.09601068053365236|2.80392834572843|0.0342456906498387|
|XO_SHI_RO_256_PP|UniformOpenDoubleRejection|sample|3.4402342650400968|0.14376122191528126|3.41780112023039|0.0417882070928114|
| | 
|baseline|0.41161435574112337|0.018818036809662198|0.410728810286373|0.0457176396964576|

Using the current sampler with input bounds of [0, 1) or (0, 1) has a large 
slowdown for the open interval. This uses a rejection method to test against 
the upper and lower bound.

Changing to a branchless version with either bits or multiplication is faster.

The speed of the sampler using rejection or recursion is slightly slower on the 
older CPU. On the M2 CPU the rejection is slower but the recursion method is 
fast and within error of the speed of the branchless versions. So here the 
check for recursion is efficiently ignored by branch prediction.

This makes this the best candidate for a change to the ContinuousUniformSampler 
as either the recursion method to allow sampling from 2^53 - 1 values; or the 
multiplication method to sample from 2^52. Either would be faster than using 
the current non-specialized ContinuousUniformSampler.
 

Note: For reference this benchmark can be done using:
{noformat}
cd commons-rng-examples/examples-jmh
mvn package -Pexamples-jmh
java -jar target/examples-jmh.jar ContinuousSamplersPerformance -rf json -rff 
target/double.json -prandomSourceName=XO_SHI_RO_256_PP 
-psamplerType=ContinuousUniformSamplerDouble,ContinuousUniformSamplerOpenDouble,UniformOpenDoubleBits,UniformOpenDoubleMultiply,UniformOpenDoubleRejection,UniformOpenDoubleRecursion
{noformat}
 

> ContinuousUniformSampler to specialize generation of the open interval (0, 1)
> -----------------------------------------------------------------------------
>
>                 Key: RNG-190
>                 URL: https://issues.apache.org/jira/browse/RNG-190
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: sampling
>    Affects Versions: 1.6
>            Reporter: Alex Herbert
>            Priority: Minor
>
> The ContinuousUniformSampler can generate double values in the open interval 
> (lo, hi) using a constructor flag:
> {code:java}
> UniformRandomProvider rng = ...
> double lo = ...
> double hi = ...
> boolean excludeBounds = true;
> double x = ContinuousUniformSampler.of(rng, lo, hi, excludeBounds)
>                                    .sample();
> {code}
> The bounds are excluded using a rejection algorithm. This is used as rounding 
> during floating-point operations to map the underlying double value in [0, 1) 
> to (lo, hi) can generate values at the bounds.
> In the case where the interval is (0, 1) then it is possible to generate the 
> random value with a branchless operation that is more performant. Examples 
> are setting the lowest 1-bit using various methods:
> {code:java}
> // Using bits to double
> Double.longBitsToDouble(
>    (source.nextLong() >>> 12) | 0x3ff0000000000001L) - 1.0;
> // Using float addition to force a lowest 1-bit
> ((source.nextLong() >>> 12) + 0.5) * 0x1.0p-52d
> // Adding a bit before conversion to float
> ((source.nextLong() >>> 11) | 1) * 0x1.0p-53d
> {code}
> Note that the output would be limited to 2^52 evenly spaced rationals in (0, 
> 1). Rejection algorithms would output 2^53 - 1 rationals. In practice this 
> difference would not be noticeable.
> Provision of a guarantee of a fast non-zero double would be a benefit to 
> certain applications such as an inverse probability distribution defined on 
> the open interval.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to