No, I have two n-d vectors and want to measure their "distance". Suppose I project both via the same 1xN random projection matrix, and then find the delta of the two values. Is this a valid distance? An approximate Manhattan distance?
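For concreteness, here is a small numpy sketch of what I mean; the Gaussian choice for the 1xN matrix, the sizes, and the variable names are just made up for illustration:

import numpy as np

rng = np.random.default_rng(0)

n = 1000
x = rng.random(n)                  # two arbitrary n-d vectors
y = rng.random(n)

omega = rng.standard_normal(n)     # one shared 1xN random projection
delta = omega @ x - omega @ y      # equals omega @ (x - y): a single scalar

print("Euclidean |x - y|      :", np.linalg.norm(x - y))
print("Manhattan |x - y|_1    :", np.abs(x - y).sum())
print("|delta|, one projection:", abs(delta))

# delta equals omega @ (x - y), so for standard normal omega it is the
# Euclidean |x - y| times a standard normal draw; any single |delta| is a
# very noisy estimate, while averaging |delta| over many independent
# projections should settle near sqrt(2/pi) * |x - y|.
deltas = rng.standard_normal((200, n)) @ (x - y)
print("mean |delta|, 200 projections:", np.abs(deltas).mean())

Whether that single-projection delta counts as a useful distance is exactly what I'm unsure about.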
On Sat, Oct 22, 2011 at 1:46 AM, Ted Dunning <[email protected]> wrote:

> I don't understand the question.
>
> What are the two vectors? Suppose you have x, an N dimensional vector, and \Omega, a random 1xN projection. \Omega x is a 1-dimensional vector.
>
> Where is the second 1-d vector?
>
> On Fri, Oct 21, 2011 at 8:23 PM, Lance Norskog <[email protected]> wrote:
>
> > More completely: given an Nx1 random projection matrix, project any N-dimensional vector to a 1-dimensional vector. The delta of two of these 1-d vectors gives a consistent difference, no?
> >
> > On Fri, Oct 21, 2011 at 9:35 AM, Ted Dunning <[email protected]> wrote:
> >
> > > Well, it is missing a matrix to define the projections, but it is the heart of the issue.
> > >
> > > It also lacks a proof, which is kind of important among some circles. The clever proof is actually not that hard to grok.
> > >
> > > On Fri, Oct 21, 2011 at 5:07 AM, Federico Castanedo <[email protected]> wrote:
> > >
> > > > I think that's a good explanation of the Johnson-Lindenstrauss Lemma, which is the basis of manifold learning theory using random projections.
> > > >
> > > > 2011/10/21 Ted Dunning <[email protected]>
> > > >
> > > > > Sort of.
> > > > >
> > > > > I may be misunderstanding the question.
> > > > >
> > > > > If you take a random orthogonal projection, then distances will be preserved within a reasonably small epsilon to reasonably high probability.
> > > > >
> > > > > Mathematically, if you take a random matrix \Omega which is tall and skinny and do a QR decomposition:
> > > > >
> > > > >   QR = \Omega
> > > > >
> > > > > Then Q is tall and skinny and Q^T projects vectors into a much lower dimensional space.
> > > > >
> > > > > If you take vectors x and y, then
> > > > >
> > > > >   |x - y| \approx |Q' x - Q' y|
> > > > >
> > > > > or, more precisely, we have, with high probability,
> > > > >
> > > > >   |x - y| - \epsilon \le |Q' x - Q' y| \le |x - y| + \epsilon
> > > > >
> > > > > This is very close to what you were saying, I think.
> > > > >
> > > > > On Thu, Oct 20, 2011 at 4:07 PM, Lance Norskog <[email protected]> wrote:
> > > > >
> > > > > > Does this all translate to doing high-dimensional distance with random projection? Project each vector to one dimension and subtract? This sounds like a really useful distance measure.
> > > > > >
> > > > > > On Wed, Oct 19, 2011 at 7:32 PM, Ted Dunning <[email protected]> wrote:
> > > > > >
> > > > > > > The distribution of the dot product of two randomly chosen, uniformly distributed unit vectors is roughly normally distributed, with a standard deviation that declines with increasing dimension roughly with your observed sqrt scaling factor.
> > > > > > >
> > > > > > > In fact, it is just this scaling property that makes the stochastic SVD work with high probability of high accuracy. The general property that random unit vectors are nearly orthogonal is called "quasi-orthogonality".
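Here is a quick numpy check of the QR construction described above; the sizes, the Gaussian \Omega, and the sqrt(N/k) rescaling of the projected distance are my own choices for the example, added only so the two printed distances are on the same scale:

import numpy as np

rng = np.random.default_rng(1)

N, k = 10000, 200                      # original and reduced dimensionality

Omega = rng.standard_normal((N, k))    # tall, skinny random matrix
Q, _ = np.linalg.qr(Omega)             # Q is N x k with orthonormal columns

x = rng.random(N)
y = rng.random(N)

d_full = np.linalg.norm(x - y)
d_proj = np.sqrt(N / k) * np.linalg.norm(Q.T @ x - Q.T @ y)
print("|x - y|                 :", d_full)
print("|Q' x - Q' y| (rescaled):", d_proj)

# Quasi-orthogonality: the spread of dot products of random unit vectors
# shrinks roughly like 1/sqrt(n) as the dimension grows.
for n in (10, 100, 1000, 10000):
    u = rng.standard_normal((500, n))
    v = rng.standard_normal((500, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    dots = np.einsum("ij,ij->i", u, v)
    print("n =", n, " std of dot products:", round(dots.std(), 4),
          " 1/sqrt(n):", round(1 / np.sqrt(n), 4))

With these sizes the rescaled projected distance should land close to the true one (typically within several percent at k = 200), and the printed standard deviations should track 1/sqrt(n).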
> > > > > > > On Wed, Oct 19, 2011 at 4:32 PM, Sean Owen <[email protected]> wrote:
> > > > > > >
> > > > > > > > Right, that's not quite the issue. It's that some comparisons are made in 2-space, some in 10-space, etc. It would be nice to have some idea that a distance in 2-space is "about as meaningfully far" as some other distance in 10-space. I'm trying to find the order of that correcting factor and it seems to be sqrt(n). Within 2- or 10-space, indeed, those distances aren't randomly distributed... but would they be so differently distributed as to change this factor? Gut says no, but I have no more justification than that.
> > > > > > > >
> > > > > > > > On Wed, Oct 19, 2011 at 10:15 PM, Ted Dunning <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > None of this actually applies because real data are not uniformly distributed (not even close). Do the sampling on your own data and pick a good guess from that.
> > > > > > > > >
> > > > > > > > > On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Ah, I'm looking for the distance between points within, rather than on, the hypercube. (Think of it as random rating vectors, in the range 0..1, across all movies. They're not binary ratings but ratings from 0 to 1.)
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > I think the analytic answer should be sqrt(n/2).
> > > > > > > > > > >
> > > > > > > > > > > So let's suppose X and Y are random points in the n-dimensional hypercube {0,1}^n. Let Z_i be an indicator variable that is 1 if X_i != Y_i and 0 otherwise. Then d(X,Y)^2 = sum( (X_i - Y_i)^2 ) = sum( Z_i ). Then the expected squared distance is E d(X,Y)^2 = sum( E Z_i ) = sum( Pr[ X_i != Y_i ] ) = n/2.

--
Lance Norskog
[email protected]
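Taking Ted's suggestion to just sample, here is a small Monte Carlo sketch comparing random corners of {0,1}^n (Justin's sqrt(n/2) case) with points drawn uniformly inside [0,1]^n (the ratings case). The pair counts and dimensions are arbitrary, and the sqrt(n/6) reference for the uniform case is my own side calculation (E (X_i - Y_i)^2 = 1/6 for independent uniforms), not something derived in the thread:

import numpy as np

rng = np.random.default_rng(2)
pairs = 2000

# sqrt(n/2) and sqrt(n/6) are square roots of *expected squared* distances,
# so for small n they sit a bit above the mean distance.
for n in (2, 10, 100, 1000):
    a = rng.integers(0, 2, size=(pairs, n))    # random corners of {0,1}^n
    b = rng.integers(0, 2, size=(pairs, n))
    d_corner = np.linalg.norm(a - b, axis=1).mean()

    u = rng.random((pairs, n))                 # random points inside [0,1]^n
    v = rng.random((pairs, n))
    d_inside = np.linalg.norm(u - v, axis=1).mean()

    print("n =", n,
          " corners:", round(d_corner, 3), "vs sqrt(n/2) =", round(np.sqrt(n / 2), 3),
          " inside:", round(d_inside, 3), "vs sqrt(n/6) =", round(np.sqrt(n / 6), 3))

Either way the growth is the sqrt(n) Sean is after; only the constant in front changes, which is the part Ted suggests estimating by sampling your own data.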
