[
https://issues.apache.org/jira/browse/RNG-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871828#comment-16871828
]
Alex D Herbert commented on RNG-99:
-----------------------------------
I've created a static factory method to create the sampler.
The factory method takes an {{alpha}} parameter that controls the zero padding
of the input probability table. -1 disables padding. 0 pads to the next power
of 2. Each additional increment pads to the next power of 2. Note that when
zero padding is used fewer bits are required from the RNG per sample as padded
entries in the tables will always be aliased.
There are two versions of the sampler: one requires 96 or 32 bits from the RNG;
the other requires 64 or 32 bits from the RNG per sample. This is possible when
the distribution size is less than 2048 and a power of 2. The following table
shows the number of bits required from the sampler for different alpha when the
table is already a power of 2:
||alpha||Fraction Full||Full Bits||Partial Bits||Av bits||Full Bits||Partial
Bits||Av bits||
|0|1.000|64|32|64|96|32|96|
|1|0.500|64|32|48|96|32|64|
|2|0.250|64|32|40|96|32|48|
|3|0.125|64|32|36|96|32|40|
|4|0.063|64|32|34|96|32|36|
|5|0.031|64|32|33|96|32|34|
|6|0.016|64|32|32.5|96|32|33|
|7|0.008|64|32|32.3|96|32|32.5|
|8|0.004|64|32|32.1|96|32|32.3|
|9|0.002|64|32|32.1|96|32|32.1|
|10|0.001|64|32|32.0|96|32|32.1|
In theory padding can improve performance by 2-fold for the small table size
sampler, and 3-fold for the standard sampler. This will require a lot of
padding. The fraction of executions requiring the full path scales as
2^-alpha^. There is little to be gained by an alpha above 4.
Padding improves performance but because there is more to do in creating the
sampler it has a construction cost and a memory cost.
Here is the sample time for different alpha:
!alias_sample.jpg!
The non-padded sampler (alpha -1) shows a big gain when the size is a power of
2 as the method switches to the optimised small table sampler. This uses 64
bits rather than 96 bits for a sample.
The other alphas show a spike when the table is a power of 2. At the next value
after a power of 2 size the amount of padding doubles. In theory sizes at or
close to below a power of 2 perform worse and just above a power of 2 perform
best. This is true for alpha 1 and 2. This is not the case for alpha 0 where
the performance when padding after a power of 2 reduces performance. This would
require further investigation with more distribution sizes.
The alpha 1 sampler does not always outperform the alpha -1 or 0 sampler. Again
zero padding is not increasing performance as expected.
The alpha 2 sampler is faster but this requires a 4-fold increase in the size
of the table with zero padding.
Here is the construction time for different alpha:
!alias_con.jpg!
The alpha -1 line shows the standard Alias sampler construction cost is O(n)
where n is the table size (this is expected).
The alpha -1 and 0 lines are close. There is not much additional overhead to
pad to the next power of 2. Alpha 1 shows a large additional overhead added due
to the padding. Alpha 2 is not shown as it shows similar steps to alpha 1.
h2. Selecting Default Alpha
Padding to a power of 2 has a large advantage when the table size is small
since an optimised algorithm can be used. Padding with alpha 0 outperforms no
padding and has a small effect on construction time. Thus padding is a sensible
default option.
Increasing the default to an alpha of 1 has variable performance above and
below alpha 0, a noticeable additional construction cost and a memory cost.
Larger alpha (>=2) will have a performance boost but it should be left to the
user to choose this rather than it be mandated as the default.
> AliasMethodDiscreteSampler
> --------------------------
>
> Key: RNG-99
> URL: https://issues.apache.org/jira/browse/RNG-99
> Project: Commons RNG
> Issue Type: New Feature
> Components: sampling
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
> Fix For: 1.3
>
> Attachments: alias_con.jpg, alias_sample.jpg
>
>
> From Wikipedia:
> [Alias Method|https://en.wikipedia.org/wiki/Alias_method]
> {noformat}
> In computing, the alias method is a family of efficient algorithms for
> sampling from
> a discrete probability distribution, due to A. J. Walker. That is, it returns
> integer
> values 1 ≤ i ≤ n according to some arbitrary probability distribution. The
> algorithms
> typically use O(n log n) or O(n) preprocessing time, after which random
> values can
> be drawn from the distribution in O(1) time.
> {noformat}
> Create a sampler using the Alias method:
> {code:java}
> package org.apache.commons.rng.sampling.distribution;
> public class AliasMethodDiscreteSampler implements DiscreteSampler {
> public AliasMethodDiscreteSampler(UniformRandomProvider rng,
> double[] probabilities) {
> // Validate probabilities sum to >0
> // Construct alias tables
> }
> @Override
> public int sample() {
> // Sample from alias tables
> }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)