Sort of.
I may be misunderstanding the question.
If you take a random orthogonal projection, then distances will be preserved
within a reasonably small epsilon to reasonably high probability.
Mathematically, if you take a random matrix \Omega which is tall and skinny
and do a QR decomposition:
QR = \Omega
Then Q is tall and skinny and Q^T projects vectors into a much lower
dimensional space.
If you take vectors x and y, then
|x - y| \approx |Q' x - Q'y|
or, more precisely, we have, with high probability,
|x - y| - \epsilon \le |Q' x - Q'y| \le |x - y| + \epsilon
This is very close to what you were saying, I think.
On Thu, Oct 20, 2011 at 4:07 PM, Lance Norskog <[email protected]> wrote:
> Does this all translate to doing high-dimensional distance with random
> projection? Project each vector to one dimension and subtract? This sounds
> like a really useful distance measure.
>
> On Wed, Oct 19, 2011 at 7:32 PM, Ted Dunning <[email protected]>
> wrote:
>
> > The distribution of the dot product of two randomly chosen, uniformly
> > distributed unit vectors is roughly normally distributed with a standard
> > deviation that declines with increasing dimension roughly with your
> > observed
> > sqrt scaling factor.
> >
> > In fact, it is just this scaling property that makes the stochastic SVD
> > work
> > with high probability of high accuracy. The general property that random
> > unit vectors are nearly orthogonal is called "quasi-orthogonality"
> >
> > On Wed, Oct 19, 2011 at 4:32 PM, Sean Owen <[email protected]> wrote:
> >
> > > Right, that's not quite the issue. It's that some comparisons are made
> > > in 2-space, some in 10-space, etc. It would be nice to have some idea
> > > that a distance is 2-space is "about as meaningfully far" as some
> > > other distance in 10-space. I'm trying to find the order of that
> > > correcting factor and it seems to be sqrt(n). Within 2- or 10-space
> > > indeed those distances aren't randomly distributed... but would they
> > > be so differently distributed as to change this factor? Gut says no,
> > > but I have no more justification than that.
> > >
> > > On Wed, Oct 19, 2011 at 10:15 PM, Ted Dunning <[email protected]>
> > > wrote:
> > > > None of this actually applies because real data are not uniformly
> > > > distributed (not even close). Do the sampling on your own data and
> > pick
> > > a
> > > > good guess from that.
> > > >
> > > > On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]>
> wrote:
> > > >
> > > >> Ah, I'm looking for the distance between points within, rather than
> > > >> on, the hypercube. (Think of it as random rating vectors, in the
> range
> > > >> 0..1, across all movies. They're not binary ratings but ratings from
> 0
> > > >> to 1.)
> > > >>
> > > >> On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <
> [email protected]>
> > > >> wrote:
> > > >> > I think the analytic answer should be sqrt(n/2).
> > > >> >
> > > >> > So let's suppose X and Y are random points in the n dimensional
> > > hypercube
> > > >> {0,1}^n. Let Z_i be an indicator variable that is 1 if X_i != Y_i
> and
> > 0
> > > >> otherwise. Then d(X,Y)^2 =sum (X_i - Y_i)^2 = sum( Z_i). Then the
> > > expected
> > > >> squared distance is E d(X,Y)^2 = sum( E Z_i) = sum( Pr[ X_i != Y_i])
> =
> > > n/2.
> > > >> >
> > > >> >
> > > >>
> > > >
> > >
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>