Right, I forgot the relationship between the error epsilon and the log(n)
part of the theorem.
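
For reference, the standard form of that relationship: to keep all pairwise
distances among n points within a relative error \epsilon with high
probability, the projected dimension k only needs to be on the order of
log(n) / \epsilon^2, independent of the original dimensionality.  A rough
back-of-the-envelope sketch (my own illustration in plain numpy, using one
common choice of constants; the exact constants vary between statements of
the lemma):

    import numpy as np

    def jl_min_dim(n_points, eps):
        # One common form of the Johnson-Lindenstrauss bound on the
        # projected dimension; other statements use different constants.
        return int(np.ceil(4.0 * np.log(n_points) / (eps ** 2 / 2.0 - eps ** 3 / 3.0)))

    # e.g. a million points at 10% distortion needs roughly 12,000
    # dimensions, regardless of how large N originally was.
    print(jl_min_dim(1000000, 0.1))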

On Sat, Oct 22, 2011 at 8:39 PM, Ted Dunning <[email protected]> wrote:

> It is a valid distance, but it will not necessarily accurately reflect
> the original distance.  You need to preserve roughly log N dimensions to
> get something reasonably accurate relative to the original distance
> (with high probability).
>
> To the extent that your original vectors are colinear with your 1xN
> projection (which is just a vector, after all), your distances will be
> preserved.  This is the root of why a higher dimensional projection
> works ... there is very high probability that there is enough
> colinearity with something in the lower dimensional sub-space that your
> distance is OK.  This is particularly true if you aren't really sampling
> your vectors uniformly in the higher dimensional space as when you are
> projecting through some matrix A (as with SVD).
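
A quick simulation of that point (my own sketch in plain numpy, with
arbitrary sizes): a single random projection gives a distance estimate that
is correct on average in the squared sense but extremely noisy, while a few
dozen projected dimensions already concentrate it tightly around the true
value.

    import numpy as np

    rng = np.random.RandomState(0)
    N = 1000
    x, y = rng.rand(N), rng.rand(N)
    true_d = np.linalg.norm(x - y)

    def projected_distance(k):
        # Gaussian random projection to k dimensions, scaled so that
        # squared lengths are preserved in expectation.
        omega = rng.randn(N, k) / np.sqrt(k)
        return np.linalg.norm(omega.T.dot(x - y))

    for k in (1, 8, 64):
        est = np.array([projected_distance(k) for _ in range(200)])
        print(k, true_d, est.mean(), est.std())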
>
> On Sat, Oct 22, 2011 at 8:23 PM, Lance Norskog <[email protected]> wrote:
>
> > No, I have two n-d vectors and want to measure their "distance".
> > Suppose I project both via the same 1xN random projection matrix, and
> > then find the delta of the two values. Is this a valid distance? An
> > approximate Manhattan distance?
> >
> > On Sat, Oct 22, 2011 at 1:46 AM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > I don't understand the question.
> > >
> > > What are the two vectors?  Suppose you have x, an N dimensional
> > > vector and \Omega, a random 1xN projection.  \Omega x is a
> > > 1-dimensional vector.
> > >
> > > Where is the second 1-d vector?
> > >
> > > On Fri, Oct 21, 2011 at 8:23 PM, Lance Norskog <[email protected]> wrote:
> > >
> > > > More completely: given an Nx1 random projection matrix, project
> > > > any N-dimensional vector to a 1-dimensional vector. The delta of
> > > > two of these 1-d vectors gives a consistent difference, no?
> > > >
> > > > On Fri, Oct 21, 2011 at 9:35 AM, Ted Dunning <[email protected]>
> > > > wrote:
> > > >
> > > > > Well, it is missing a matrix to define the projections, but it
> > > > > is the heart of the issue.
> > > > >
> > > > > It also lacks a proof which is kind of important among some
> > > > > circles.  The clever proof is actually not that hard to grok.
> > > > >
> > > > > On Fri, Oct 21, 2011 at 5:07 AM, Federico Castanedo <[email protected]> wrote:
> > > > >
> > > > > > I think that's a good explanation of the Johnson-Lindenstrauss
> > > > > > Lemma, which is the basis of the manifold learning theory using
> > > > > > random projections.
> > > > > >
> > > > > > 2011/10/21 Ted Dunning <[email protected]>
> > > > > >
> > > > > > > Sort of.
> > > > > > >
> > > > > > > I may be misunderstanding the question.
> > > > > > >
> > > > > > > If you take a random orthogonal projection, then distances
> > > > > > > will be preserved within a reasonably small epsilon to
> > > > > > > reasonably high probability.
> > > > > > >
> > > > > > > Mathematically, if you take a random matrix \Omega which is
> > > > > > > tall and skinny and do a QR decomposition:
> > > > > > >
> > > > > > >      QR = \Omega
> > > > > > >
> > > > > > > Then Q is tall and skinny and Q^T projects vectors into a
> > > > > > > much lower dimensional space.
> > > > > > >
> > > > > > > If you take vectors x and y, then
> > > > > > >
> > > > > > >     |x - y| \approx |Q' x - Q' y|
> > > > > > >
> > > > > > > or, more precisely, we have, with high probability,
> > > > > > >
> > > > > > >     |x - y| - \epsilon \le |Q' x - Q' y| \le |x - y| + \epsilon
> > > > > > >
> > > > > > > This is very close to what you were saying, I think.
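
A numerical sketch of the construction described above (my own illustration
in plain numpy, with arbitrary sizes; note that I rescale by sqrt(N/k),
since projecting onto a random k-dimensional subspace with an orthonormal
Q^T shrinks lengths by about sqrt(k/N) on average):

    import numpy as np

    rng = np.random.RandomState(0)
    N, k = 10000, 200

    # Tall, skinny random matrix and its orthogonalization.
    Omega = rng.randn(N, k)
    Q, _ = np.linalg.qr(Omega)       # Q is N x k with orthonormal columns

    x, y = rng.rand(N), rng.rand(N)

    d_original = np.linalg.norm(x - y)
    # Rescale so that lengths are preserved in expectation.
    d_projected = np.sqrt(float(N) / k) * np.linalg.norm(Q.T.dot(x) - Q.T.dot(y))

    print(d_original, d_projected)   # close, with high probability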
> > > > > > >
> > > > > > > On Thu, Oct 20, 2011 at 4:07 PM, Lance Norskog <[email protected]> wrote:
> > > > > > >
> > > > > > > > Does this all translate to doing high-dimensional distance
> > > > > > > > with random projection? Project each vector to one
> > > > > > > > dimension and subtract? This sounds like a really useful
> > > > > > > > distance measure.
> > > > > > > >
> > > > > > > > On Wed, Oct 19, 2011 at 7:32 PM, Ted Dunning <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > The distribution of the dot product of two randomly
> > > > > > > > > chosen, uniformly distributed unit vectors is roughly
> > > > > > > > > normally distributed with a standard deviation that
> > > > > > > > > declines with increasing dimension roughly with your
> > > > > > > > > observed sqrt scaling factor.
> > > > > > > > >
> > > > > > > > > In fact, it is just this scaling property that makes the
> > > > > > > > > stochastic SVD work with high probability of high
> > > > > > > > > accuracy.  The general property that random unit vectors
> > > > > > > > > are nearly orthogonal is called "quasi-orthogonality".
> > > > > > > > >
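
A quick way to see that spread numerically (my own sketch in plain numpy):
the dot products of pairs of random unit vectors cluster around zero with a
standard deviation of roughly 1/sqrt(d).

    import numpy as np

    rng = np.random.RandomState(0)
    for d in (10, 100, 1000, 10000):
        u = rng.randn(1000, d)
        v = rng.randn(1000, d)
        u /= np.sqrt((u ** 2).sum(axis=1))[:, None]   # normalize to unit length
        v /= np.sqrt((v ** 2).sum(axis=1))[:, None]
        dots = (u * v).sum(axis=1)
        # The spread of the dot product falls off roughly like 1/sqrt(d).
        print(d, dots.std(), 1.0 / np.sqrt(d))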
> > > > > > > > > On Wed, Oct 19, 2011 at 4:32 PM, Sean Owen <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Right, that's not quite the issue. It's that some
> > > > > > > > > > comparisons are made in 2-space, some in 10-space, etc.
> > > > > > > > > > It would be nice to have some idea that a distance in
> > > > > > > > > > 2-space is "about as meaningfully far" as some other
> > > > > > > > > > distance in 10-space. I'm trying to find the order of
> > > > > > > > > > that correcting factor and it seems to be sqrt(n).
> > > > > > > > > > Within 2- or 10-space indeed those distances aren't
> > > > > > > > > > randomly distributed... but would they be so differently
> > > > > > > > > > distributed as to change this factor? Gut says no, but I
> > > > > > > > > > have no more justification than that.
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 19, 2011 at 10:15 PM, Ted Dunning <[email protected]> wrote:
> > > > > > > > > > > None of this actually applies because real data are
> > > > > > > > > > > not uniformly distributed (not even close).  Do the
> > > > > > > > > > > sampling on your own data and pick a good guess from
> > > > > > > > > > > that.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Ah, I'm looking for the distance between points
> > > > > > > > > > >> within, rather than on, the hypercube. (Think of it
> > > > > > > > > > >> as random rating vectors, in the range 0..1, across
> > > > > > > > > > >> all movies. They're not binary ratings but ratings
> > > > > > > > > > >> from 0 to 1.)
> > > > > > > > > > >>
> > > > > > > > > > >> On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <[email protected]> wrote:
> > > > > > > > > > >> > I think the analytic answer should be sqrt(n/2).
> > > > > > > > > > >> >
> > > > > > > > > > >> > So let's suppose X and Y are random points in the
> > > > > > > > > > >> > n dimensional hypercube {0,1}^n.  Let Z_i be an
> > > > > > > > > > >> > indicator variable that is 1 if X_i != Y_i and 0
> > > > > > > > > > >> > otherwise.  Then d(X,Y)^2 = sum (X_i - Y_i)^2 =
> > > > > > > > > > >> > sum(Z_i).  Then the expected squared distance is
> > > > > > > > > > >> > E d(X,Y)^2 = sum(E Z_i) = sum(Pr[X_i != Y_i]) = n/2.
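
A small Monte Carlo check of that figure (my own sketch in plain numpy).
It also covers the case asked about above, points drawn uniformly inside
[0,1]^n rather than at the corners, where the same argument with
E (X_i - Y_i)^2 = 1/6 gives an expected squared distance of n/6; either
way the typical distance grows on the order of sqrt(n).

    import numpy as np

    rng = np.random.RandomState(0)
    for n in (2, 10, 100, 1000):
        # Corners of the hypercube {0,1}^n.
        X = rng.randint(0, 2, size=(2000, n))
        Y = rng.randint(0, 2, size=(2000, n))
        d_corners = np.sqrt(((X - Y) ** 2).sum(axis=1)).mean()

        # Points uniform inside [0,1]^n.
        U = rng.rand(2000, n)
        V = rng.rand(2000, n)
        d_inside = np.sqrt(((U - V) ** 2).sum(axis=1)).mean()

        # Mean distance vs. sqrt of the expected squared distance; the two
        # agree closely once n is moderately large.
        print(n, d_corners, np.sqrt(n / 2.0), d_inside, np.sqrt(n / 6.0))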



-- 
Lance Norskog
[email protected]
