[ 
https://issues.apache.org/jira/browse/RNG-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058502#comment-18058502
 ] 

Alex Herbert commented on RNG-190:
----------------------------------

I added the proposed methods to the existing FloatingPointGenerationBenchmark 
in the RNG JMH testing project. The nextDouble methods already present match 
the 3 variants proposed without the addition of a single trailing 1-bit. I also 
added two methods that will reject the value of zero either using a while loop 
or recursively calling the method until a non-zero is generated. The advantage 
of recursion is that an infinite loop will not occur due to stack memory 
overflow if the source RNG is broken (always outputs 0).

Some JHM results on different machines. These show throughput in operations per 
microsecond (ops/us), so higher is better.
h2. JDK 11 on a Xeon(R) CPU E5-1680 v3 @ 3.20GHz 
||Method||Score||Error||Median||Error/Score||
|nextDoubleBaseline|297.8624173026762|4.4869018049355125|299.346131111462|0.0150636721663885|
|nextDoubleUsingBitsToDouble|238.18163935145117|0.2133806325748035|238.235417182178|0.000895873557490918|
|nextDoubleUsingMultiply52bits|237.63373274394112|3.0896222172960104|238.320300512036|0.0130016146345064|
|nextDoubleUsingMultiply53bits|238.2101027165856|0.20576849621398088|238.190495601139|0.000863810954562231|
|nextOpenDoubleUsingBitsToDouble|238.09650775254653|0.31388283169984155|238.169262848583|0.00131830086321997|
|nextOpenDoubleUsingMultiply52bits|224.57408745081293|1.895589083334299|224.996612290642|0.00844081837246462|
|nextOpenDoubleUsingMultiply53bits|226.47119428994102|0.29576932420275054|226.567469439521|0.00130599092361429|
|nextOpenDoubleUsingRecursion|206.84551217502607|1.995744812412826|207.169454928554|0.00964848012135787|
|nextOpenDoubleUsingRejection|226.56479097191817|0.28916887781429623|226.623869631695|0.00127631869265219|

Various methods for [0, 1) are similar.

Generation of the open interval (0, 1) is the same speed using the bits method 
but slower using multiplication. This is due to the extra operation to set the 
lowest bit before multiplying.

The rejection method is as fast as the multiplication methods. But the 
recursion method is slower.
h2. JDK 21 on a Mac M2 Max 12 Core 3680 MHz
||Method||Score||Error||Median||Error/Score||
|nextDoubleBaseline|1151.0610513016331|12.014569608233082|1149.88023278787|0.0104378213428791|
|nextDoubleUsingBitsToDouble|1034.669196216507|10.16198164245682|1035.16226184598|0.00982147886456494|
|nextDoubleUsingMultiply52bits|1060.0012874161462|12.598283653861003|1059.86851562353|0.0118851588233166|
|nextDoubleUsingMultiply53bits|1057.3437124509855|15.302336011097825|1062.65305932769|0.0144724329760529|
|nextOpenDoubleUsingBitsToDouble|1195.7397306329574|15.091250470253652|1195.07122943135|0.0126208489051921|
|nextOpenDoubleUsingMultiply52bits|973.7029340405512|52.22340941276182|982.230348634046|0.0536338215558739|
|nextOpenDoubleUsingMultiply53bits|969.0140894788441|10.345413151627627|968.540188752747|0.0106762257266988|
|nextOpenDoubleUsingRecursion|1129.6154061341995|15.03396167435129|1126.69719152318|0.0133089205341143|
|nextOpenDoubleUsingRejection|1275.656925300677|15.079541068971725|1277.02553371118|0.011821000435064|

A more modern processor. Throughput is 4x higher.

Using bits to double is slower for the [0, 1) interval than multiplication.

Generation of the open interval (0, 1) by multiplication is again slightly 
slower. The use of bits in this case is extremely fast. I cannot explain this 
as the two methods are different in a single bit in the double representation:
{code:java}
Double.longBitsToDouble((source.nextLong() >>> 12) | 0x3ff0000000000000L) - 1.0;
Double.longBitsToDouble((source.nextLong() >>> 12) | 0x3ff0000000000001L) - 1.0;
{code}
Using rejection is extremely fast. This seems odd as it is faster than the 
baseline which is using the generator to output long values, and has the same 
code as generating a double by multiplication but with a branch to check for 
zero.

Since a rejection by recursion algorithm is currently used in 
ContinuousUniformSampler, and these results are inconclusive to whether 
changing to a branchless algorithm would benefit the sampler I will change the 
benchmark. I can adapt the benchmark currently used to test samplers in 
ContinuousSamplersPerformance.

> ContinuousUniformSampler to specialize generation of the open interval (0, 1)
> -----------------------------------------------------------------------------
>
>                 Key: RNG-190
>                 URL: https://issues.apache.org/jira/browse/RNG-190
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: sampling
>    Affects Versions: 1.6
>            Reporter: Alex Herbert
>            Priority: Minor
>
> The ContinuousUniformSampler can generate double values in the open interval 
> (lo, hi) using a constructor flag:
> {code:java}
> UniformRandomProvider rng = ...
> double lo = ...
> double hi = ...
> boolean excludeBounds = true;
> double x = ContinuousUniformSampler.of(rng, lo, hi, excludeBounds)
>                                    .sample();
> {code}
> The bounds are excluded using a rejection algorithm. This is used as rounding 
> during floating-point operations to map the underlying double value in [0, 1) 
> to (lo, hi) can generate values at the bounds.
> In the case where the interval is (0, 1) then it is possible to generate the 
> random value with a branchless operation that is more performant. Examples 
> are setting the lowest 1-bit using various methods:
> {code:java}
> // Using bits to double
> Double.longBitsToDouble(
>    (source.nextLong() >>> 12) | 0x3ff0000000000001L) - 1.0;
> // Using float addition to force a lowest 1-bit
> ((source.nextLong() >>> 12) + 0.5) * 0x1.0p-52d
> // Adding a bit before conversion to float
> ((source.nextLong() >>> 11) | 1) * 0x1.0p-53d
> {code}
> Note that the output would be limited to 2^52 evenly spaced rationals in (0, 
> 1). Rejection algorithms would output 2^53 - 1 rationals. In practice this 
> difference would not be noticeable.
> Provision of a guarantee of a fast non-zero double would be a benefit to 
> certain applications such as an inverse probability distribution defined on 
> the open interval.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to