[
https://issues.apache.org/jira/browse/MAHOUT-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934245#action_12934245
]
Ted Dunning commented on MAHOUT-550:
------------------------------------
{quote}
My use case calculates distances between N-dimensional "points" of random
numbers. The space is normal distribution from 0->1. nextGaussian() generates
points outward to MAX_DOUBLE, so I have to clip it. gaussian01 generates a
normal distribution from 0 to 1.
{quote}
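The quoted use case clips nextGaussian() into [0, 1]. Because java.util.Random.nextGaussian() returns samples with mean 0 and standard deviation 1, only about a third of the draws land in [0, 1] to begin with (P(0 <= Z <= 1) is about 0.341). That holds whether out-of-range values are discarded or clamped, and what survives is a slice of a bell curve, not a normal distribution "from 0 to 1". The following minimal sketch just counts this. ClipFraction, fractionInRange, the seed and the sample count are hypothetical choices for illustration.

```java
import java.util.Random;

// Hypothetical check: how much of the nextGaussian() output falls inside [lo, hi]?
class ClipFraction {
    /** Returns the fraction of n seeded standard normal samples that fall inside [lo, hi]. */
    static double fractionInRange(long seed, int n, double lo, double hi) {
        Random rng = new Random(seed);
        int kept = 0;
        for (int i = 0; i < n; i++) {
            double z = rng.nextGaussian(); // mean 0, standard deviation 1
            if (z >= lo && z <= hi) {
                kept++;
            }
        }
        return (double) kept / n;
    }

    public static void main(String[] args) {
        // For a standard normal Z, P(0 <= Z <= 1) is about 0.341.
        double f = fractionInRange(42L, 1_000_000, 0.0, 1.0);
        System.out.printf("fraction of samples in [0, 1]: %.3f%n", f);
    }
}
```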
Actually, the distance between two multivariate normally distributed points is
not limited to [0, 1]; it is unbounded. You can derive its distribution from
the fact that the dot product of two such random vectors is very nearly
normally distributed.
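The unboundedness can be made concrete. Suppose x and y are independent d-dimensional standard normal vectors. Then x - y is normal with covariance 2I, so |x - y|^2 / 2 follows a chi-squared distribution with d degrees of freedom, and typical distances are about sqrt(2d). The dot product x . y is a sum of d independent terms with mean 0 and variance 1, so it is approximately N(0, d), and |x - y|^2 = |x|^2 + |y|^2 - 2 x . y ties the two together. The following Java simulation is a minimal sketch of this. GaussianPairs, d = 100, the seed and the trial count are illustrative assumptions.

```java
import java.util.Random;

// Hypothetical simulation: pairs of d-dimensional standard normal vectors x, y.
class GaussianPairs {
    /** Returns {mean distance, min distance, mean dot product, variance of dot product}. */
    static double[] summarize(long seed, int d, int trials) {
        Random rng = new Random(seed);
        double sumDist = 0, minDist = Double.POSITIVE_INFINITY;
        double sumDot = 0, sumDot2 = 0;
        for (int t = 0; t < trials; t++) {
            double dist2 = 0, dot = 0;
            for (int i = 0; i < d; i++) {
                double xi = rng.nextGaussian();
                double yi = rng.nextGaussian();
                dist2 += (xi - yi) * (xi - yi); // squared Euclidean distance
                dot += xi * yi;                 // sum of d independent unit-variance terms
            }
            double dist = Math.sqrt(dist2);
            sumDist += dist;
            minDist = Math.min(minDist, dist);
            sumDot += dot;
            sumDot2 += dot * dot;
        }
        double meanDot = sumDot / trials;
        return new double[] {sumDist / trials, minDist, meanDot, sumDot2 / trials - meanDot * meanDot};
    }

    public static void main(String[] args) {
        int d = 100;
        double[] s = summarize(7L, d, 20_000);
        // Expect mean distance near sqrt(2d), dot-product mean near 0 and variance near d.
        System.out.printf("mean distance %.2f (sqrt(2d) = %.2f), min distance %.2f%n",
                s[0], Math.sqrt(2.0 * d), s[1]);
        System.out.printf("dot product: mean %.2f, variance %.1f (d = %d)%n", s[2], s[3], d);
    }
}
```

With d = 100, even the smallest distance in the run should be far above 1.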
The distance between two points on a high-dimensional unit sphere is also not
distributed on [0, 1]. Instead, it ranges from 0 to 2. The dot product of
points uniformly distributed on a sphere is also very nearly normal for
reasonably high dimension. This can be used to compute the distribution of
distances, which peaks at about sqrt(2), that is, sqrt(1/2) times the maximum
distance of 2.
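For unit vectors, |x - y|^2 = 2 - 2 x . y, so the distance is confined to [0, 2]. For points uniform on the sphere, x . y has mean 0 and variance 1/d, so in high dimension the distances cluster around sqrt(2). The sketch below draws uniform points on the sphere by normalizing standard normal vectors, which is a standard construction. SpherePairs, d = 100 and the seed are illustrative assumptions.

```java
import java.util.Random;

// Hypothetical simulation: distances between random points on the unit sphere in R^d.
class SpherePairs {
    /** Normalizing a standard normal vector gives a point uniformly distributed on the unit sphere. */
    static double[] randomUnitVector(Random rng, int d) {
        double[] v = new double[d];
        double norm2 = 0;
        for (int i = 0; i < d; i++) {
            v[i] = rng.nextGaussian();
            norm2 += v[i] * v[i];
        }
        double norm = Math.sqrt(norm2);
        for (int i = 0; i < d; i++) {
            v[i] /= norm;
        }
        return v;
    }

    /** Returns {min distance, max distance, mean distance, d * variance of the dot product}. */
    static double[] summarize(long seed, int d, int trials) {
        Random rng = new Random(seed);
        double minDist = Double.POSITIVE_INFINITY, maxDist = 0, sumDist = 0;
        double sumDot = 0, sumDot2 = 0;
        for (int t = 0; t < trials; t++) {
            double[] x = randomUnitVector(rng, d);
            double[] y = randomUnitVector(rng, d);
            double c = 0;
            for (int i = 0; i < d; i++) {
                c += x[i] * y[i];
            }
            // For unit vectors |x - y|^2 = 2 - 2 x.y; max() guards against rounding below 0.
            double dist = Math.sqrt(Math.max(0.0, 2.0 - 2.0 * c));
            minDist = Math.min(minDist, dist);
            maxDist = Math.max(maxDist, dist);
            sumDist += dist;
            sumDot += c;
            sumDot2 += c * c;
        }
        double meanDot = sumDot / trials;
        double varDot = sumDot2 / trials - meanDot * meanDot;
        return new double[] {minDist, maxDist, sumDist / trials, d * varDot};
    }

    public static void main(String[] args) {
        double[] s = summarize(3L, 100, 20_000);
        System.out.printf("distance range [%.3f, %.3f], mean %.3f (sqrt(2) = %.3f)%n",
                s[0], s[1], s[2], Math.sqrt(2.0));
        System.out.printf("d * var(x.y) = %.3f (expected about 1)%n", s[3]);
    }
}
```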
Even if you restrict the points on the sphere to the positive orthant, you
don't get the distribution you suggest.
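The orthant case can be checked the same way. Taking absolute values of the Gaussian coordinates before normalizing gives points uniform on the positive-orthant part of the sphere. This is one simple construction and is assumed here. Every dot product is then nonnegative, so distances lie in [0, sqrt(2)]. Because E|Z| = sqrt(2/pi) for a standard normal Z, the dot products concentrate near 2/pi (about 0.64) in high dimension, and the distances concentrate near sqrt(2 - 4/pi) (about 0.85). That is a narrow peak, not a normal spread over [0, 1]. OrthantPairs, d = 100 and the seed are illustrative assumptions.

```java
import java.util.Random;

// Hypothetical simulation: random unit vectors restricted to the positive orthant.
class OrthantPairs {
    /** Uniform point on the positive-orthant part of the unit sphere: |Gaussian| coordinates, normalized. */
    static double[] randomPositiveUnitVector(Random rng, int d) {
        double[] v = new double[d];
        double norm2 = 0;
        for (int i = 0; i < d; i++) {
            v[i] = Math.abs(rng.nextGaussian()); // fold every coordinate to be nonnegative
            norm2 += v[i] * v[i];
        }
        double norm = Math.sqrt(norm2);
        for (int i = 0; i < d; i++) {
            v[i] /= norm;
        }
        return v;
    }

    /** Returns {max distance, mean distance, mean dot product} over independent pairs. */
    static double[] summarize(long seed, int d, int trials) {
        Random rng = new Random(seed);
        double maxDist = 0, sumDist = 0, sumDot = 0;
        for (int t = 0; t < trials; t++) {
            double[] x = randomPositiveUnitVector(rng, d);
            double[] y = randomPositiveUnitVector(rng, d);
            double c = 0;
            for (int i = 0; i < d; i++) {
                c += x[i] * y[i]; // every term is >= 0
            }
            double dist = Math.sqrt(Math.max(0.0, 2.0 - 2.0 * c)); // |x - y| for unit vectors
            maxDist = Math.max(maxDist, dist);
            sumDist += dist;
            sumDot += c;
        }
        return new double[] {maxDist, sumDist / trials, sumDot / trials};
    }

    public static void main(String[] args) {
        double[] s = summarize(11L, 100, 20_000);
        System.out.printf("max distance %.3f (bound sqrt(2) = %.3f)%n", s[0], Math.sqrt(2.0));
        System.out.printf("mean dot %.3f (2/pi = %.3f), mean distance %.3f (sqrt(2 - 4/pi) = %.3f)%n",
                s[2], 2.0 / Math.PI, s[1], Math.sqrt(2.0 - 4.0 / Math.PI));
    }
}
```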
In any case, generating matrix entries from the GAUSS01 distribution doesn't
make much sense. In practice, a uniform distribution is not very useful
either. It is better to just generate the unit normal case.
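Generating the unit normal case needs nothing beyond java.util.Random.nextGaussian(), with no clipping. Here is a minimal sketch, assuming a plain double[][] rather than any Mahout Matrix class. UnitNormalMatrix, gaussianMatrix and meanAndVariance are hypothetical names.

```java
import java.util.Random;

// Hypothetical helper: a dense matrix of independent unit normal N(0, 1) entries.
class UnitNormalMatrix {
    /** Fills a rows x cols array from one seeded stream, so the same seed gives the same matrix. */
    static double[][] gaussianMatrix(int rows, int cols, long seed) {
        Random rng = new Random(seed);
        double[][] m = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                m[i][j] = rng.nextGaussian(); // unclipped standard normal
            }
        }
        return m;
    }

    /** Returns {sample mean, sample variance} over all entries. */
    static double[] meanAndVariance(double[][] m) {
        double sum = 0, sum2 = 0;
        long n = 0;
        for (double[] row : m) {
            for (double v : row) {
                sum += v;
                sum2 += v * v;
                n++;
            }
        }
        double mean = sum / n;
        return new double[] {mean, sum2 / n - mean * mean};
    }

    public static void main(String[] args) {
        double[] mv = meanAndVariance(gaussianMatrix(500, 500, 2010L));
        System.out.printf("mean %.4f, variance %.4f%n", mv[0], mv[1]);
    }
}
```

Because the entries come from one seeded stream, the same seed reproduces the whole matrix. However, entry (i, j) depends on the fill order, so it cannot be computed on its own without generating the entries before it.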
> Add RandomVector and RandomMatrix
> ---------------------------------
>
> Key: MAHOUT-550
> URL: https://issues.apache.org/jira/browse/MAHOUT-550
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Reporter: Lance Norskog
> Fix For: 0.5
>
> Attachments: RandomMatrix.patch
>
>
> Add Vector and Matrix implementations that generate a unique and reproducible
> random number for each index.
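The issue asks for Vector and Matrix implementations that return a unique, reproducible random number for each index. One possible approach is to hash (seed, row, col) into two uniforms with the SplitMix64 finalizer and then apply the Box-Muller transform to get a unit normal. This is an assumption for illustration and is not taken from RandomMatrix.patch. Nothing is stored, any cell can be computed directly, and the same (seed, row, col) always yields the same value. Values for different cells are almost certainly distinct, although hashing cannot strictly guarantee it. IndexedGaussianMatrix, get and mix64 are hypothetical names, not Mahout's API.

```java
// Hypothetical sketch: a read-only matrix whose entries are derived on demand from (seed, row, col).
class IndexedGaussianMatrix {
    private final int rows;
    private final int cols;
    private final long seed;

    IndexedGaussianMatrix(int rows, int cols, long seed) {
        if (rows <= 0 || cols <= 0) {
            throw new IllegalArgumentException("rows and cols must be positive");
        }
        this.rows = rows;
        this.cols = cols;
        this.seed = seed;
    }

    /** SplitMix64 finalizer: scrambles a 64-bit key into a well-mixed 64-bit value. */
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Maps the top 53 bits to a double in (0, 1], so Math.log never sees 0. */
    static double toUnit(long bits) {
        return ((bits >>> 11) + 1) / (double) (1L << 53);
    }

    /** Returns the same N(0, 1) value every time for the same (seed, row, col). */
    double get(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        long index = (long) row * cols + col;           // distinct for every cell
        long key = mix64(seed ^ mix64(index));
        double u1 = toUnit(mix64(key));
        double u2 = toUnit(mix64(key + 0x9E3779B97F4A7C15L));
        // Box-Muller transform: two independent uniforms give one standard normal.
        return Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
    }

    public static void main(String[] args) {
        IndexedGaussianMatrix m = new IndexedGaussianMatrix(1000, 1000, 550L);
        System.out.println(m.get(3, 7) == m.get(3, 7)); // true: reproducible per index
    }
}
```

A vector version would work the same way with the row term dropped.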