Does this all translate to doing high-dimensional distance with random projection? Project each vector to one dimension and subtract? This sounds like a really useful distance measure.
On Wed, Oct 19, 2011 at 7:32 PM, Ted Dunning <[email protected]> wrote: > The distribution of the dot product of two randomly chosen, uniformly > distributed unit vectors is roughly normally distributed with a standard > deviation that declines with increasing dimension roughly with your > observed > sqrt scaling factor. > > In fact, it is just this scaling property that makes the stochastic SVD > work > with high probability of high accuracy. The general property that random > unit vectors are nearly orthogonal is called "quasi-orthogonality" > > On Wed, Oct 19, 2011 at 4:32 PM, Sean Owen <[email protected]> wrote: > > > Right, that's not quite the issue. It's that some comparisons are made > > in 2-space, some in 10-space, etc. It would be nice to have some idea > > that a distance is 2-space is "about as meaningfully far" as some > > other distance in 10-space. I'm trying to find the order of that > > correcting factor and it seems to be sqrt(n). Within 2- or 10-space > > indeed those distances aren't randomly distributed... but would they > > be so differently distributed as to change this factor? Gut says no, > > but I have no more justification than that. > > > > On Wed, Oct 19, 2011 at 10:15 PM, Ted Dunning <[email protected]> > > wrote: > > > None of this actually applies because real data are not uniformly > > > distributed (not even close). Do the sampling on your own data and > pick > > a > > > good guess from that. > > > > > > On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]> wrote: > > > > > >> Ah, I'm looking for the distance between points within, rather than > > >> on, the hypercube. (Think of it as random rating vectors, in the range > > >> 0..1, across all movies. They're not binary ratings but ratings from 0 > > >> to 1.) > > >> > > >> On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <[email protected]> > > >> wrote: > > >> > I think the analytic answer should be sqrt(n/2). > > >> > > > >> > So let's suppose X and Y are random points in the n dimensional > > hypercube > > >> {0,1}^n. Let Z_i be an indicator variable that is 1 if X_i != Y_i and > 0 > > >> otherwise. Then d(X,Y)^2 =sum (X_i - Y_i)^2 = sum( Z_i). Then the > > expected > > >> squared distance is E d(X,Y)^2 = sum( E Z_i) = sum( Pr[ X_i != Y_i]) = > > n/2. > > >> > > > >> > > > >> > > > > > > -- Lance Norskog [email protected]
