[ 
https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568839#comment-16568839
 ] 

Alex D Herbert commented on RNG-50:
-----------------------------------

{quote}I do not follow: Do you mean that the rest of the previous comment 
should be ignored?
{quote}
I added that note at the top because my original post said the unsynchronized 
and synchronized versions ran at the same speed. If you had already read the 
original, that statement has since changed.

Here is a more extensive benchmark:
||Source||Range||Name||Relative Score||
|KISS|4|NoCache|1|
|KISS|4|Cache|0.637002514107754|
|KISS|4|SyncCache|0.606805802031795|
|KISS|16|NoCache|1|
|KISS|16|Cache|0.663954180665926|
|KISS|16|SyncCache|0.631319224692749|
|KISS|64|NoCache|1|
|KISS|64|Cache|0.636384669613506|
|KISS|64|SyncCache|0.609715929448392|
|KISS|256|NoCache|1|
|KISS|256|Cache|0.648985318438977|
|KISS|256|SyncCache|0.626782442178428|
|KISS|1024|NoCache|1|
|KISS|1024|Cache|0.659225625944472|
|KISS|1024|SyncCache|0.641191069564397|
|KISS|4096|NoCache|1|
|KISS|4096|Cache|0.683340577614337|
|KISS|4096|SyncCache|0.663516180493439|
|SPLIT_MIX_64|4|NoCache|1|
|SPLIT_MIX_64|4|Cache|0.628425684341808|
|SPLIT_MIX_64|4|SyncCache|0.593939891208052|
|SPLIT_MIX_64|16|NoCache|1|
|SPLIT_MIX_64|16|Cache|0.644051788271398|
|SPLIT_MIX_64|16|SyncCache|0.611315259381778|
|SPLIT_MIX_64|64|NoCache|1|
|SPLIT_MIX_64|64|Cache|0.640776657597466|
|SPLIT_MIX_64|64|SyncCache|0.614925898338969|
|SPLIT_MIX_64|256|NoCache|1|
|SPLIT_MIX_64|256|Cache|0.647514729530435|
|SPLIT_MIX_64|256|SyncCache|0.623509940647398|
|SPLIT_MIX_64|1024|NoCache|1|
|SPLIT_MIX_64|1024|Cache|0.627242797856745|
|SPLIT_MIX_64|1024|SyncCache|0.613652464877662|
|SPLIT_MIX_64|4096|NoCache|1|
|SPLIT_MIX_64|4096|SyncCache|0.668044213459518|
|SPLIT_MIX_64|4096|Cache|0.64825539875968|
|WELL_1024_A|4|NoCache|1|
|WELL_1024_A|4|Cache|0.674680507463973|
|WELL_1024_A|4|SyncCache|0.64990985639085|
|WELL_1024_A|16|NoCache|1|
|WELL_1024_A|16|Cache|0.674488306896242|
|WELL_1024_A|16|SyncCache|0.647697233808698|
|WELL_1024_A|64|NoCache|1|
|WELL_1024_A|64|Cache|0.67452109541707|
|WELL_1024_A|64|SyncCache|0.673070926611252|
|WELL_1024_A|256|NoCache|1|
|WELL_1024_A|256|SyncCache|0.699218305156117|
|WELL_1024_A|256|Cache|0.693893305588347|
|WELL_1024_A|1024|NoCache|1|
|WELL_1024_A|1024|SyncCache|0.662001634421658|
|WELL_1024_A|1024|Cache|0.650959461660946|
|WELL_1024_A|4096|NoCache|1|
|WELL_1024_A|4096|SyncCache|0.724667718309707|
|WELL_1024_A|4096|Cache|0.713244863157027|
|WELL_44497_B|4|NoCache|1|
|WELL_44497_B|4|SyncCache|0.693214912971548|
|WELL_44497_B|4|Cache|0.676796240713726|
|WELL_44497_B|16|NoCache|1|
|WELL_44497_B|16|SyncCache|0.691494549936262|
|WELL_44497_B|16|Cache|0.682040619982067|
|WELL_44497_B|64|NoCache|1|
|WELL_44497_B|64|SyncCache|0.707794461549788|
|WELL_44497_B|64|Cache|0.696961073242649|
|WELL_44497_B|256|NoCache|1|
|WELL_44497_B|256|SyncCache|0.700939276921851|
|WELL_44497_B|256|Cache|0.694691018842925|
|WELL_44497_B|1024|NoCache|1|
|WELL_44497_B|1024|SyncCache|0.711826139604225|
|WELL_44497_B|1024|Cache|0.675221230709037|
|WELL_44497_B|4096|NoCache|1|
|WELL_44497_B|4096|SyncCache|0.778102849989416|
|WELL_44497_B|4096|Cache|0.74339413761269|

This time the benchmark was run with 20 repeats on a Mac laptop with a Core i7 
chip. The previous timings were on a Linux Xeon setup. The cached versions are 
definitely faster, but less so on this machine.

Q. So, is this worth adding to the library?

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java, jmh-result.csv
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} 
> is nicely written for fast sampling of small dataset sizes. The constructors 
> for the samplers do not check that the input parameters are valid for the 
> respective distributions (in contrast to the old 
> {{org.apache.commons.math3.random.distribution}} classes). I assume this is a 
> design choice for speed. Thus most of the samplers can be used within a loop 
> to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if 
> the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} 
> class. As of version 1.0 this internal class is used only in the 
> {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40), but the cache is 
> created and precomputed every time a {{PoissonSampler}} is constructed with a 
> mean above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>
> static {
>     factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. 
> If the table is instead computed lazily by a static construction method, the 
> overhead is paid only on first use. Either way, the following call will be 
> much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                          (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static FactorialLog ( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                          (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static FactorialLog ( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                          (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static FactorialLog ( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                          (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static FactorialLog ( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                          (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static FactorialLog ( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use 
> Poisson sampling.

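The static-cache idea in the quoted description can be sketched without any 
Commons RNG classes. This is only an illustration under stated assumptions: 
the class name LogFactorialTable and its methods are hypothetical and are not 
the library's InternalUtils.FactorialLog API, and the table size 2 * PIVOT with 
PIVOT = 40 is taken from the description above.

```java
/**
 * Hypothetical stand-in for a shared log-factorial cache.
 * Not part of Commons RNG; names and layout are assumptions.
 */
final class LogFactorialTable {
    /** Assumed pivot value, copied from the issue description (PIVOT = 40). */
    private static final int PIVOT = 40;

    /** LOG_FACTORIALS[n] holds log(n!) for 0 <= n < 2 * PIVOT. */
    private static final double[] LOG_FACTORIALS = new double[2 * PIVOT];

    static {
        // Runs once, when the class is initialised, rather than once per
        // sampler construction. log(0!) = log(1!) = 0 is the array default.
        for (int i = 2; i < LOG_FACTORIALS.length; i++) {
            LOG_FACTORIALS[i] = LOG_FACTORIALS[i - 1] + Math.log(i);
        }
    }

    private LogFactorialTable() {
        // Static access only.
    }

    /** Number of cached entries. */
    static int size() {
        return LOG_FACTORIALS.length;
    }

    /** Returns the cached log(n!) for n inside the table. */
    static double value(int n) {
        if (n < 0 || n >= LOG_FACTORIALS.length) {
            throw new IllegalArgumentException("n outside cached range: " + n);
        }
        return LOG_FACTORIALS[n];
    }
}
```

Because the array is filled in a static initialiser and never modified 
afterwards, any number of sampler instances could read it without paying the 
precomputation cost again.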

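For reading the noformat results in the quoted description: the trailing 
figure on each row is the second bracketed number divided by the first. The 
odd ".2x" suffix looks like what a format string such as "%f.2x" would print, 
where "%.2fx" may have been intended. That is a guess, since the format string 
used by the attached test is not shown here. A small sketch, with a 
hypothetical class name, using the numbers from the first "Mean 40" row:

```java
import java.util.Locale;

/** Hypothetical helper that reproduces one ratio from the quoted results. */
final class SpeedRatio {
    private SpeedRatio() {
        // Static access only.
    }

    /** Loop-construction figure divided by single-construction figure. */
    static double ratio(long loopConstruction, long singleConstruction) {
        return (double) loopConstruction / singleConstruction;
    }

    /** Assumed original formatting: six decimals, then a literal ".2x". */
    static String asPrinted(double ratio) {
        return String.format(Locale.ROOT, "%f.2x", ratio);
    }

    /** Probably intended formatting: two decimals, then "x". */
    static String twoDecimals(double ratio) {
        return String.format(Locale.ROOT, "%.2fx", ratio);
    }

    public static void main(String[] args) {
        // Values from the first "Mean 40" row of the quoted results.
        double r = ratio(24334724L, 7330792L);
        System.out.println(asPrinted(r));   // 3.319522.2x
        System.out.println(twoDecimals(r)); // 3.32x
    }
}
```

Read this way, a ratio near 1 on the "static FactorialLog" rows means loop 
construction costs about the same as constructing the sampler once.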

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
