[ https://issues.apache.org/jira/browse/MAHOUT-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015244#comment-13015244 ]
Ted Dunning commented on MAHOUT-550:
------------------------------------
{quote}
gaussian01() needs to add 0.5 to its result before returning. What's the
reasoning for dividing by 6 out of curiosity?
{quote}
This is a dark-ages method for generating normally distributed numbers. The
idea is that if you add 12 independent uniforms, the central limit theorem
begins to kick in and you get something kind of normally distributed with mean
12 * 0.5 = 6. The reason that 12 numbers are added is that the variance of a
unit uniform distribution is 1/12, so if you add 12 of them, you get variance =
1.
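As a rough illustration of the idea just described, here is a minimal Python sketch. It is not the Mahout code; the function name gaussian_sum12, the seed, and the sample size are made up for this example.

```python
import random
import statistics


def gaussian_sum12(rng):
    """Approximate one standard normal draw by summing 12 uniforms on [0, 1)."""
    # Each uniform has mean 1/2 and variance 1/12, so the sum of 12 has
    # mean 6 and variance 1. Subtracting 6 centers the result at zero.
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(550)  # arbitrary seed, used only for reproducibility
samples = [gaussian_sum12(rng) for _ in range(100_000)]
sample_mean = statistics.fmean(samples)
sample_sd = statistics.stdev(samples)
print(sample_mean, sample_sd)  # expect values near 0 and near 1
```

Note that no multiplication or division is needed, only 12 additions and one subtraction.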
This was clever back in the sixties and seventies when multiplication was
expensive. It is no longer clever.
Unfortunately, this implementation is wrong in two respects.
a) it divides by 12, although the sum already has variance 1 and needs no scaling
b) it doesn't subtract 6, so the result is not centered at zero
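To see what those two flaws do to the output, the following Python sketch simulates a generator that has both of them. It is a hypothetical reconstruction based only on points a) and b), not the code from the patch, and flawed_gaussian01 is an invented name.

```python
import random
import statistics


def flawed_gaussian01(rng):
    # Hypothetical reconstruction: scales the sum by 1/12 (flaw a)
    # and never subtracts the mean of 6 (flaw b).
    return sum(rng.random() for _ in range(12)) / 12.0


rng = random.Random(550)  # arbitrary seed, used only for reproducibility
flawed = [flawed_gaussian01(rng) for _ in range(100_000)]
flawed_mean = statistics.fmean(flawed)
flawed_sd = statistics.stdev(flawed)
print(flawed_mean, flawed_sd)  # expect roughly 0.5 and 1/12
```

Under these assumptions the values cluster tightly around 0.5 with a standard deviation near 1/12, which is far from a unit normal.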
Here is an R demo of the corrected algorithm:
> x = matrix(ncol=12, runif(1200000))
> g = rowSums(x)-6
> mean(g)
[1] -0.00282425
> sd(g)
[1] 1.002757
> qqnorm(g,cex=0.1,col='red')
> abline(0,1)
The resulting graph is pretty good but not as good as the real thing: the sum
can never leave [-6, 6], so the tails are lighter than those of a true normal.
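One way to put a number on "not as good as the real thing" is to compare tail weight. The Python sketch below does that with the sample excess kurtosis, which is 0 for a true normal and should come out near -0.1 for a sum of 12 uniforms. The helper excess_kurtosis and the variable names are invented for this example, and Python's random.gauss stands in for "the real thing".

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment / variance^2, minus 3."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / (m2 * m2) - 3.0


rng = random.Random(550)  # arbitrary seed, used only for reproducibility
n = 200_000
# Sum-of-12-uniforms approximation, centered at zero.
approx = [sum(rng.random() for _ in range(12)) - 6.0 for _ in range(n)]
# Reference draws from the standard library's normal generator.
exact = [rng.gauss(0.0, 1.0) for _ in range(n)]

approx_kurtosis = excess_kurtosis(approx)
exact_kurtosis = excess_kurtosis(exact)
approx_max = max(abs(x) for x in approx)
print(approx_kurtosis, exact_kurtosis, approx_max)
```

The approximation's largest absolute value can never exceed 6, whereas the reference generator has no such bound.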
> Add RandomVector and RandomMatrix
> ---------------------------------
>
> Key: MAHOUT-550
> URL: https://issues.apache.org/jira/browse/MAHOUT-550
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Reporter: Lance Norskog
> Fix For: 0.5
>
> Attachments: MAHOUT-550.patch, RandomMatrix.patch
>
>
> Add Vector and Matrix implementations that generate a unique and reproducible
> random number for each index.