Right, that's not quite the issue. It's that some comparisons are made
in 2-space, some in 10-space, etc. It would be nice to have some idea
that a distance in 2-space is "about as meaningfully far" as some
other distance in 10-space. I'm trying to find the order of that
correcting factor and it seems to be sqrt(n). Within 2- or 10-space
indeed those distances aren't randomly distributed... but would they
be so differently distributed as to change this factor? Gut says no,
but I have no more justification than that.
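
One way to sanity-check the sqrt(n) guess is a quick simulation. The
sketch below assumes points drawn uniformly from [0,1]^n, which real
rating data will not match, and the function names are made up for
illustration. Under that assumption each coordinate contributes
E[(X_i - Y_i)^2] = 1/6, so the expected squared distance is n/6 and the
typical distance grows like sqrt(n/6). Dividing by sqrt(n) should then
give similar, though not identical, numbers in 2-space and 10-space.

```python
import math
import random

def mean_distance(n, trials=5000, seed=0):
    """Estimate the mean Euclidean distance between two points drawn
    uniformly from the unit hypercube [0, 1]^n. Uniformity is an
    assumption of this sketch, not a property of real data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sq = 0.0
        for _ in range(n):
            diff = rng.random() - rng.random()
            sq += diff * diff
        total += math.sqrt(sq)
    return total / trials

def normalized(distance, n):
    """Hypothetical helper: divide a distance measured in n-space by
    sqrt(n) so distances from different dimensions can be compared."""
    return distance / math.sqrt(n)

# Print n, the estimated mean distance, and the sqrt(n)-normalized value.
for n in (2, 10, 100):
    d = mean_distance(n)
    print(n, round(d, 3), round(normalized(d, n), 3))
```

Replacing the uniform sampler with pairs drawn from actual rating
vectors would show whether the same factor holds up on real data.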

On Wed, Oct 19, 2011 at 10:15 PM, Ted Dunning <[email protected]> wrote:
> None of this actually applies because real data are not uniformly
> distributed (not even close).  Do the sampling on your own data and pick a
> good guess from that.
>
> On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]> wrote:
>
>> Ah, I'm looking for the distance between points within, rather than
>> on, the hypercube. (Think of it as random rating vectors, in the range
>> 0..1, across all movies. They're not binary ratings but ratings from 0
>> to 1.)
>>
>> On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <[email protected]>
>> wrote:
>> > I think the analytic answer should be sqrt(n/2).
>> >
>> > So let's suppose X and Y are random points in the n-dimensional hypercube
>> > {0,1}^n.  Let Z_i be an indicator variable that is 1 if X_i != Y_i and 0
>> > otherwise.  Then d(X,Y)^2 = sum (X_i - Y_i)^2 = sum(Z_i).  Then the expected
>> > squared distance is E d(X,Y)^2 = sum(E Z_i) = sum(Pr[X_i != Y_i]) = n/2.
>> >
>> >
>>
>
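
For the binary case in the derivation quoted above, the n/2 figure is
the expected *squared* distance. The expected distance itself is a
little below sqrt(n/2), since the square root is concave. A small exact
check, as a sketch: the function name is invented here, and it relies on
the number of differing coordinates being Binomial(n, 1/2), which
follows from the quoted setup with independent fair coordinates.

```python
import math

def binary_cube_moments(n):
    """Exact E[d^2] and E[d] for two independent uniform points in
    {0,1}^n, where d is the Euclidean distance. The count of differing
    coordinates k is Binomial(n, 1/2), and d = sqrt(k)."""
    weights = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    mean_sq = sum(w * k for k, w in enumerate(weights))
    mean_d = sum(w * math.sqrt(k) for k, w in enumerate(weights))
    return mean_sq, mean_d

# Print n, E[d^2], E[d], and the ratio E[d] / sqrt(n/2).
for n in (2, 10, 100):
    mean_sq, mean_d = binary_cube_moments(n)
    print(n, mean_sq, round(mean_d, 4), round(mean_d / math.sqrt(n / 2), 4))
```

The ratio in the last column approaches 1 as n grows, so sqrt(n/2) is a
reasonable order-of-magnitude answer for large n.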
