[ 
https://issues.apache.org/jira/browse/RNG-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871828#comment-16871828
 ] 

Alex D Herbert commented on RNG-99:
-----------------------------------

I've created a static factory method to create the sampler. 

The factory method takes an {{alpha}} parameter that controls the zero padding 
of the input probability table. -1 disables padding. 0 pads to the next power 
of 2. Each additional increment pads to the next power of 2. Note that when 
zero padding is used fewer bits are required from the RNG per sample as padded 
entries in the tables will always be aliased.

There are two versions of the sampler: one requires 96 or 32 bits from the RNG; 
the other requires 64 or 32 bits from the RNG per sample. This is possible when 
the distribution size is less than 2048 and a power of 2. The following table 
shows the number of bits required from the sampler for different alpha when the 
table is already a power of 2:
||alpha||Fraction Full||Full Bits||Partial Bits||Av bits||Full Bits||Partial 
Bits||Av bits||
|0|1.000|64|32|64|96|32|96|
|1|0.500|64|32|48|96|32|64|
|2|0.250|64|32|40|96|32|48|
|3|0.125|64|32|36|96|32|40|
|4|0.063|64|32|34|96|32|36|
|5|0.031|64|32|33|96|32|34|
|6|0.016|64|32|32.5|96|32|33|
|7|0.008|64|32|32.3|96|32|32.5|
|8|0.004|64|32|32.1|96|32|32.3|
|9|0.002|64|32|32.1|96|32|32.1|
|10|0.001|64|32|32.0|96|32|32.1|

In theory padding can improve performance by 2-fold for the small table size 
sampler, and 3-fold for the standard sampler. This will require a lot of 
padding. The fraction of executions requiring the full path scales as 
2^-alpha^. There is little to be gained by an alpha above 4.

Padding improves performance but because there is more to do in creating the 
sampler it has a construction cost and a memory cost.

Here is the sample time for different alpha:

!alias_sample.jpg!

The non-padded sampler (alpha -1) shows a big gain when the size is a power of 
2 as the method switches to the optimised small table sampler. This uses 64 
bits rather than 96 bits for a sample.

The other alphas show a spike when the table is a power of 2. At the next value 
after a power of 2 size the amount of padding doubles. In theory sizes at or 
close to below a power of 2 perform worse and just above a power of 2 perform 
best. This is true for alpha 1 and 2. This is not the case for alpha 0 where 
the performance when padding after a power of 2 reduces performance. This would 
require further investigation with more distribution sizes.

The alpha 1 sampler does not always outperform the alpha -1 or 0 sampler. Again 
zero padding is not increasing performance as expected.

The alpha 2 sampler is faster but this requires a 4-fold increase in the size 
of the table with zero padding.

Here is the construction time for different alpha:
  
 !alias_con.jpg!

The alpha -1 line shows the standard Alias sampler construction cost is O(n) 
where n is the table size (this is expected).

The alpha -1 and 0 lines are close. There is not much additional overhead to 
pad to the next power of 2. Alpha 1 shows a large additional overhead added due 
to the padding. Alpha 2 is not shown as it shows similar steps to alpha 1.
h2. Selecting Default Alpha

Padding to a power of 2 has a large advantage when the table size is small 
since an optimised algorithm can be used. Padding with alpha 0 outperforms no 
padding and has a small effect on construction time. Thus padding is a sensible 
default option.

Increasing the default to an alpha of 1 has variable performance above and 
below alpha 0, a noticeable additional construction cost and a memory cost. 
Larger alpha (>=2) will have a performance boost but it should be left to the 
user to choose this rather than it be mandated as the default.

 

> AliasMethodDiscreteSampler
> --------------------------
>
>                 Key: RNG-99
>                 URL: https://issues.apache.org/jira/browse/RNG-99
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: sampling
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Assignee: Alex D Herbert
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: alias_con.jpg, alias_sample.jpg
>
>
> From Wikipedia:
> [Alias Method|https://en.wikipedia.org/wiki/Alias_method]
> {noformat}
> In computing, the alias method is a family of efficient algorithms for 
> sampling from
> a discrete probability distribution, due to A. J. Walker. That is, it returns 
> integer
> values 1 ≤ i ≤ n according to some arbitrary probability distribution. The 
> algorithms
> typically use O(n log n) or O(n) preprocessing time, after which random 
> values can
> be drawn from the distribution in O(1) time.
> {noformat}
> Create a sampler using the Alias method:
> {code:java}
> package org.apache.commons.rng.sampling.distribution;
> public class AliasMethodDiscreteSampler implements DiscreteSampler {
>     public AliasMethodDiscreteSampler(UniformRandomProvider rng, 
>                                       double[] probabilities) {
>         // Validate probabilities sum to >0
>         // Construct alias tables
>     }
>     @Override
>     public int sample() {
>         // Sample from alias tables
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to