The dot product of two independently chosen, uniformly distributed unit
vectors in n dimensions is roughly normally distributed around zero once n is
moderately large, with a standard deviation of 1/sqrt(n).  That matches the
sqrt scaling factor you observed.

In fact, it is just this scaling property that lets the stochastic SVD reach
high accuracy with high probability.  The general property that random
unit vectors are nearly orthogonal is called "quasi-orthogonality".
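
A quick way to check the scaling is to sample it.  The sketch below is only an
illustration, not anything from Mahout: it assumes NumPy is available, and the
helper names (random_unit_vectors, dot_product_std) are made up for this
example.  It draws pairs of random unit vectors and compares the observed
standard deviation of their dot products with 1/sqrt(n).

```python
import numpy as np

def random_unit_vectors(count, dim, rng):
    # Normalizing i.i.d. Gaussian samples gives directions that are
    # uniformly distributed on the unit sphere.
    v = rng.standard_normal((count, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def dot_product_std(dim, pairs=5000, seed=0):
    # Standard deviation of the dot product over `pairs` independent pairs.
    rng = np.random.default_rng(seed)
    a = random_unit_vectors(pairs, dim, rng)
    b = random_unit_vectors(pairs, dim, rng)
    dots = np.einsum("ij,ij->i", a, b)  # row-wise dot products
    return dots.std()

for dim in (2, 10, 100, 1000):
    # Print dimension, observed std, and the 1/sqrt(n) prediction.
    print(dim, dot_product_std(dim), 1 / np.sqrt(dim))
```

The two printed columns should track each other closely at every dimension,
which is the near-orthogonality showing up numerically.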

On Wed, Oct 19, 2011 at 4:32 PM, Sean Owen <[email protected]> wrote:

> Right, that's not quite the issue. It's that some comparisons are made
> in 2-space, some in 10-space, etc. It would be nice to have some idea
> that a distance in 2-space is "about as meaningfully far" as some
> other distance in 10-space. I'm trying to find the order of that
> correcting factor and it seems to be sqrt(n). Within 2- or 10-space
> indeed those distances aren't randomly distributed... but would they
> be so differently distributed as to change this factor? Gut says no,
> but I have no more justification than that.
>
> On Wed, Oct 19, 2011 at 10:15 PM, Ted Dunning <[email protected]>
> wrote:
> > None of this actually applies because real data are not uniformly
> > distributed (not even close).  Do the sampling on your own data and pick a
> > good guess from that.
> >
> > On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]> wrote:
> >
> >> Ah, I'm looking for the distance between points within, rather than
> >> on, the hypercube. (Think of it as random rating vectors, in the range
> >> 0..1, across all movies. They're not binary ratings but ratings from 0
> >> to 1.)
> >>
> >> On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <[email protected]>
> >> wrote:
> >> > I think the analytic answer should be sqrt(n/2).
> >> >
> >> > So let's suppose X and Y are random points in the n dimensional hypercube
> >> > {0,1}^n.  Let Z_i be an indicator variable that is 1 if X_i != Y_i and 0
> >> > otherwise.  Then d(X,Y)^2 = sum (X_i - Y_i)^2 = sum(Z_i).  Then the expected
> >> > squared distance is E d(X,Y)^2 = sum(E Z_i) = sum(Pr[X_i != Y_i]) = n/2.
> >> >
> >> >
> >>
> >
>
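
For the distance question in the quoted thread, here is a rough simulation
sketch, again assuming NumPy and using a made-up helper name
(mean_squared_distance).  It covers both cases discussed above: points on the
corners {0,1}^n, where the expected squared distance is n/2, and points
uniform inside [0,1]^n, where each coordinate contributes E (X_i - Y_i)^2 =
1/6 and the expected squared distance is n/6.  Both grow linearly in n, so the
distance itself grows like sqrt(n) in either case.  This only covers uniform
data; real ratings will behave differently.

```python
import numpy as np

def mean_squared_distance(dim, binary, pairs=5000, seed=0):
    # Average squared Euclidean distance between random pairs of points.
    rng = np.random.default_rng(seed)
    if binary:
        # Corners of the hypercube: each coordinate is 0 or 1.
        x = rng.integers(0, 2, size=(pairs, dim)).astype(float)
        y = rng.integers(0, 2, size=(pairs, dim)).astype(float)
    else:
        # Interior of the hypercube: each coordinate uniform on [0, 1).
        x = rng.random((pairs, dim))
        y = rng.random((pairs, dim))
    return np.mean(np.sum((x - y) ** 2, axis=1))

for dim in (2, 10, 100):
    # Print dimension, then observed vs. predicted for corners and interior.
    print(dim,
          mean_squared_distance(dim, True), dim / 2,
          mean_squared_distance(dim, False), dim / 6)
```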
