[
https://issues.apache.org/jira/browse/MATH-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andras Sereny updated MATH-1152:
--------------------------------
Description:
org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs
a *linear* search to find the appropriate element in the probability space
(called singletons here) given a random double value. For large probability
spaces, this is not effective. Instead, we should cache the cumulative
probabilities and do a *binary* search.
Rough implementation:
{code:title=EnumeratedDistribution.java|borderStyle=solid}
void computeCumulative() {
cumulative = new double[size];
double sum = 0;
for (int i = 1; i < weights.length - 1; i++) {
cumulative[i] = cumulative[i-1] + weights[i-1];
}
cumulative[size - 1] = 1;
}
{code}
and then
{code:title=EnumeratedDistribution.java|borderStyle=solid}
int sampleIndex() {
double randomValue = random.nextDouble();
int result = Arrays.binarySearch(cumulative, randomValue);
if (result >= 0) return result;
int insertionPoint = -result-1;
return insertionPoint;
}
{code}
was:
org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs
a *linear* search to find the appropriate element in the probability space
(called singletons here) given a random double value. For large probability
spaces, this is not effective. Instead, we should cache the cumulative
probabilities and do a *binary* search.
Rough implementation:
{code:title=Bar.java|borderStyle=solid}
void computeCumulative() {
cumulative = new double[size];
double sum = 0;
for (int i = 1; i < weights.length - 1; i++) {
cumulative[i] = cumulative[i-1] + weights[i-1];
}
cumulative[size - 1] = 1;
}
{code}
and then
{code:title=Bar.java|borderStyle=solid}
int sampleIndex() {
double randomValue = random.nextDouble();
int result = Arrays.binarySearch(cumulative, randomValue);
if (result >= 0) return result;
int insertionPoint = -result-1;
return insertionPoint;
}
{code}
> Suboptimal implementation of EnumeratedDistribution.sample()
> ------------------------------------------------------------
>
> Key: MATH-1152
> URL: https://issues.apache.org/jira/browse/MATH-1152
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.3
> Reporter: Andras Sereny
>
> org.apache.commons.math3.distribution.EnumeratedDistribution.sample()
> performs a *linear* search to find the appropriate element in the probability
> space (called singletons here) given a random double value. For large
> probability spaces, this is not effective. Instead, we should cache the
> cumulative probabilities and do a *binary* search.
> Rough implementation:
> {code:title=EnumeratedDistribution.java|borderStyle=solid}
> void computeCumulative() {
> cumulative = new double[size];
> double sum = 0;
> for (int i = 1; i < weights.length - 1; i++) {
> cumulative[i] = cumulative[i-1] + weights[i-1];
> }
> cumulative[size - 1] = 1;
> }
> {code}
> and then
> {code:title=EnumeratedDistribution.java|borderStyle=solid}
> int sampleIndex() {
> double randomValue = random.nextDouble();
> int result = Arrays.binarySearch(cumulative, randomValue);
> if (result >= 0) return result;
> int insertionPoint = -result-1;
> return insertionPoint;
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)