[ 
https://issues.apache.org/jira/browse/RNG-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867121#comment-16867121
 ] 

Alex D Herbert commented on RNG-99:
-----------------------------------

The Alias method consists of two steps:
 * Pick a random index into the tables. Uses an {{int}} sample from 
{{nextInt(int)}}.
 * Choose between the original sample value or its alias using a random 
deviate. Uses a {{double}} sample.

Total number of bits using a standard generator is at least 96-bits: a 32-bit 
int and then 64-bits to create a double. Only 53-bits are required to create 
the double. Note: I say at least 96-bits because a rejection algorithm is used 
to create the {{int}} index using {{nextInt(int)}}. So it is possible that more 
than 1 {{int}} sample is used to create the index.

If the size of the table is small (<= 2^11 or 2048) and a power of 2 then the 
two samples can be done using 64-bits: Up to 11-bits to obtain the {{int}} 
sample and the remaining 53-bits for the {{double}}.

So there is an optimisation where the input probability distribution can be 
padded with zeros so that it is a power of 2. In this case any sample with zero 
probability is not possible. However the alias tables will still be computed 
and the method still works.

So if the distribution size is small (<=2048) then only 64-bits are required. 
If larger but still a power of 2 then 96-bits is used. Only when the number of 
possible samples is above 2^30 can padding not be done.

The only disadvantage is storage. The method stores 12-bits per tabulated 
probability. However when padding to a power of 2 the extra probability table 
is not required (it is always zero) and so storage is only the alias. This is 
4-bits per padded probability.

So to keep the options open (for those where storage is more important) a 
factory method could be provided to pad the probabilities:
{code:java}
public class AliasMethodDiscreteSampler implements DiscreteSampler {
    public AliasMethodDiscreteSampler(UniformRandomProvider rng, 
                                      double[] probabilities) {
        // Validate probabilities sum to >0
        // Construct alias tables
        // Detect power of 2 tables
    }
    public AliasMethodDiscreteSampler create(UniformRandomProvider rng, 
                                             double[] probabilities) {
        // Pad probabilities to next power of 2
        // Call constructor
    }
    @Override
    public int sample() {
        // Sample from alias tables
        // (Optimise for power of 2 tables) 
    }
}
{code}
The alternative is to always pad in the constructor if possible, removing the 
choice from the user.

 

> AliasMethodDiscreteSampler
> --------------------------
>
>                 Key: RNG-99
>                 URL: https://issues.apache.org/jira/browse/RNG-99
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: sampling
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Assignee: Alex D Herbert
>            Priority: Minor
>             Fix For: 1.3
>
>
> From Wikipedia:
> [Alias Method|https://en.wikipedia.org/wiki/Alias_method]
> {noformat}
> In computing, the alias method is a family of efficient algorithms for 
> sampling from
> a discrete probability distribution, due to A. J. Walker. That is, it returns 
> integer
> values 1 ≤ i ≤ n according to some arbitrary probability distribution. The 
> algorithms
> typically use O(n log n) or O(n) preprocessing time, after which random 
> values can
> be drawn from the distribution in O(1) time.
> {noformat}
> Create a sampler using the Alias method:
> {code:java}
> package org.apache.commons.rng.sampling.distribution;
> public class AliasMethodDiscreteSampler implements DiscreteSampler {
>     public AliasMethodDiscreteSampler(UniformRandomProvider rng, 
>                                       double[] probabilities) {
>         // Validate probabilities sum to >0
>         // Construct alias tables
>     }
>     @Override
>     public int sample() {
>         // Sample from alias tables
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to