It is a valid distance, but it will not necessarily reflect the original distance accurately. You need to keep roughly log N dimensions to get something reasonably accurate relative to the original distance (with high probability).
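
A quick numerical sketch of that point (the sizes are arbitrary, and this uses a plain Gaussian projection rather than the QR construction discussed further down the thread):

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
N = 1000   # original dimension (arbitrary)
n = 200    # number of points whose pairwise distances we care about
k = 32     # roughly on the order of log(n) / eps^2 for a modest eps

X = rng.normal(size=(n, N))
orig = pdist(X)            # all n*(n-1)/2 original Euclidean distances

def worst_distortion(k_dims):
    # Gaussian projection with entries ~ N(0, 1/k) so that squared
    # lengths are preserved in expectation.
    P = rng.normal(size=(N, k_dims)) / np.sqrt(k_dims)
    proj = pdist(X @ P)
    return np.max(np.abs(proj - orig) / orig)

print("worst relative distortion, 1-d :", worst_distortion(1))
print("worst relative distortion, %d-d:" % k, worst_distortion(k))

The 1-d projection collapses any pair whose difference vector happens to be nearly orthogonal to the single projection direction, which is exactly the collinearity point below.
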
To the extent that your original vectors are collinear with your 1xN projection (which is just a vector, after all), your distances will be preserved. This is the root of why a higher dimensional projection works ... there is very high probability that there is enough collinearity with something in the lower dimensional subspace that your distance is OK. This is particularly true if you aren't really sampling your vectors uniformly in the higher dimensional space, as when you are projecting through some matrix A (as with SVD).

On Sat, Oct 22, 2011 at 8:23 PM, Lance Norskog <[email protected]> wrote:
> No, I have two n-d vectors and want to measure their "distance". Suppose I project both via the same 1xN random projection matrix, and then find the delta of the two values. Is this a valid distance? An approximate Manhattan distance?
>
> On Sat, Oct 22, 2011 at 1:46 AM, Ted Dunning <[email protected]> wrote:
> > I don't understand the question.
> >
> > What are the two vectors? Suppose you have x, an N dimensional vector, and \Omega, a random 1xN projection. \Omega x is a 1-dimensional vector.
> >
> > Where is the second 1-d vector?
> >
> > On Fri, Oct 21, 2011 at 8:23 PM, Lance Norskog <[email protected]> wrote:
> > > More completely: given an Nx1 random projection matrix, project any N-dimensional vector to a 1-dimensional vector. The delta of two of these 1-d vectors gives a consistent difference, no?
> > >
> > > On Fri, Oct 21, 2011 at 9:35 AM, Ted Dunning <[email protected]> wrote:
> > > > Well, it is missing a matrix to define the projections, but it is the heart of the issue.
> > > >
> > > > It also lacks a proof, which is kind of important among some circles. The clever proof is actually not that hard to grok.
> > > >
> > > > On Fri, Oct 21, 2011 at 5:07 AM, Federico Castanedo <[email protected]> wrote:
> > > > > I think that's a good explanation of the Johnson-Lindenstrauss Lemma, which is the basis of the manifold learning theory using random projections.
> > > > >
> > > > > 2011/10/21 Ted Dunning <[email protected]>
> > > > > > Sort of.
> > > > > >
> > > > > > I may be misunderstanding the question.
> > > > > >
> > > > > > If you take a random orthogonal projection, then distances will be preserved within a reasonably small epsilon to reasonably high probability.
> > > > > >
> > > > > > Mathematically, if you take a random matrix \Omega which is tall and skinny and do a QR decomposition:
> > > > > >
> > > > > > QR = \Omega
> > > > > >
> > > > > > then Q is tall and skinny and Q^T projects vectors into a much lower dimensional space.
> > > > > >
> > > > > > If you take vectors x and y, then
> > > > > >
> > > > > > |x - y| \approx |Q' x - Q' y|
> > > > > >
> > > > > > or, more precisely, we have, with high probability,
> > > > > >
> > > > > > |x - y| - \epsilon \le |Q' x - Q' y| \le |x - y| + \epsilon
> > > > > >
> > > > > > This is very close to what you were saying, I think.
> > > > > >
> > > > > > On Thu, Oct 20, 2011 at 4:07 PM, Lance Norskog <[email protected]> wrote:
> > > > > > > Does this all translate to doing high-dimensional distance with random projection? Project each vector to one dimension and subtract? This sounds like a really useful distance measure.
> > > > > > >
> > > > > > > On Wed, Oct 19, 2011 at 7:32 PM, Ted Dunning <[email protected]> wrote:
> > > > > > > > The distribution of the dot product of two randomly chosen, uniformly distributed unit vectors is roughly normally distributed, with a standard deviation that declines with increasing dimension roughly with your observed sqrt scaling factor.
> > > > > > > >
> > > > > > > > In fact, it is just this scaling property that makes the stochastic SVD work with high probability of high accuracy. The general property that random unit vectors are nearly orthogonal is called "quasi-orthogonality".
> > > > > > > >
> > > > > > > > On Wed, Oct 19, 2011 at 4:32 PM, Sean Owen <[email protected]> wrote:
> > > > > > > > > Right, that's not quite the issue. It's that some comparisons are made in 2-space, some in 10-space, etc. It would be nice to have some idea that a distance in 2-space is "about as meaningfully far" as some other distance in 10-space. I'm trying to find the order of that correcting factor and it seems to be sqrt(n). Within 2- or 10-space indeed those distances aren't randomly distributed... but would they be so differently distributed as to change this factor? Gut says no, but I have no more justification than that.
> > > > > > > > >
> > > > > > > > > On Wed, Oct 19, 2011 at 10:15 PM, Ted Dunning <[email protected]> wrote:
> > > > > > > > > > None of this actually applies because real data are not uniformly distributed (not even close). Do the sampling on your own data and pick a good guess from that.
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]> wrote:
> > > > > > > > > > > Ah, I'm looking for the distance between points within, rather than on, the hypercube. (Think of it as random rating vectors, in the range 0..1, across all movies. They're not binary ratings but ratings from 0 to 1.)
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <[email protected]> wrote:
> > > > > > > > > > > > I think the analytic answer should be sqrt(n/2).
> > > > > > > > > > > >
> > > > > > > > > > > > So let's suppose X and Y are random points in the n dimensional hypercube {0,1}^n. Let Z_i be an indicator variable that is 1 if X_i != Y_i and 0 otherwise. Then d(X,Y)^2 = sum (X_i - Y_i)^2 = sum(Z_i). Then the expected squared distance is E d(X,Y)^2 = sum(E Z_i) = sum(Pr[X_i != Y_i]) = n/2.
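
To put numbers on this: Justin's n/2 is for the corners of the cube. For Sean's case (points uniform inside [0,1]^n rather than on the corners), the same coordinate-wise argument gives E (X_i - Y_i)^2 = 1/6 per coordinate, so the RMS distance is sqrt(n/6) instead of sqrt(n/2). Either way the correcting factor grows like sqrt(n). A small simulation to check both (a sketch; sample sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)

def rms_distance(n_dims, corners, samples=20000):
    # Estimate sqrt(E[d(X,Y)^2]) from random pairs in the n-cube,
    # either on its corners ({0,1}^n) or uniformly in its interior.
    if corners:
        X = rng.integers(0, 2, size=(samples, n_dims)).astype(float)
        Y = rng.integers(0, 2, size=(samples, n_dims)).astype(float)
    else:
        X = rng.random(size=(samples, n_dims))
        Y = rng.random(size=(samples, n_dims))
    return np.sqrt(np.mean(np.sum((X - Y) ** 2, axis=1)))

for n in (2, 10, 100):
    print("n=%3d  corners %.3f vs sqrt(n/2)=%.3f   interior %.3f vs sqrt(n/6)=%.3f"
          % (n, rms_distance(n, True), np.sqrt(n / 2),
             rms_distance(n, False), np.sqrt(n / 6)))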
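
And a similar sanity check of the quasi-orthogonality point quoted above: dot products of independent random unit vectors concentrate around zero with a standard deviation of about 1/sqrt(d). Again just a sketch with arbitrary sample sizes:

import numpy as np

rng = np.random.default_rng(0)

def unit_vectors(samples, d):
    # Normalizing Gaussian vectors gives uniformly distributed unit vectors.
    V = rng.normal(size=(samples, d))
    return V / np.linalg.norm(V, axis=1, keepdims=True)

for d in (10, 100, 1000):
    dots = np.sum(unit_vectors(10000, d) * unit_vectors(10000, d), axis=1)
    print("d=%4d  std of dot products %.4f   1/sqrt(d) = %.4f"
          % (d, dots.std(), 1 / np.sqrt(d)))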
