None of this actually applies, because real data are not uniformly distributed (not even close). Sample pairs of vectors from your own data, look at the distances you get, and pick a good guess from that.
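
Here is a rough sketch of what I mean by sampling, in plain Python. It assumes your data is a list of equal-length numeric vectors; the names `sample_pair_distances` and `summarize` are made up for this example, and the uniform data at the bottom is only a stand-in so the script runs. For independent uniform [0, 1] coordinates the expected squared difference per coordinate is 1/6, so sqrt(n/6) is printed as a rough reference for the stand-in data only.

```python
import math
import random
import statistics


def sample_pair_distances(rows, num_pairs=1000, seed=0):
    """Euclidean distances between randomly chosen pairs of distinct rows."""
    rng = random.Random(seed)
    distances = []
    for _ in range(num_pairs):
        a, b = rng.sample(rows, 2)  # two different rows, chosen at random
        distances.append(math.dist(a, b))
    return distances


def summarize(distances):
    """Mean, median, and rough 10th/90th percentiles of the sampled distances."""
    ordered = sorted(distances)
    last = len(ordered) - 1
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p10": ordered[int(0.10 * last)],
        "p90": ordered[int(0.90 * last)],
    }


if __name__ == "__main__":
    # Stand-in data (an assumption): uniform ratings in [0, 1].
    # Replace `rows` with vectors loaded from your real data.
    n = 100
    rng = random.Random(42)
    rows = [[rng.random() for _ in range(n)] for _ in range(500)]

    print(summarize(sample_pair_distances(rows)))
    print("uniform reference sqrt(n/6) =", math.sqrt(n / 6))
```

On real rating data, expect the numbers to come out quite different from the uniform reference, which is the point of sampling instead of relying on the analytic answer.
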
On Wed, Oct 19, 2011 at 11:40 AM, Sean Owen <[email protected]> wrote:
> Ah, I'm looking for the distance between points within, rather than
> on, the hypercube. (Think of it as random rating vectors, in the range
> 0..1, across all movies. They're not binary ratings but ratings from 0
> to 1.)
>
> On Wed, Oct 19, 2011 at 6:30 PM, Justin Cranshaw <[email protected]> wrote:
> > I think the analytic answer should be sqrt(n/2).
> >
> > So let's suppose X and Y are random points in the n dimensional hypercube
> > {0,1}^n. Let Z_i be an indicator variable that is 1 if X_i != Y_i and 0
> > otherwise. Then d(X,Y)^2 = sum (X_i - Y_i)^2 = sum( Z_i). Then the expected
> > squared distance is E d(X,Y)^2 = sum( E Z_i) = sum( Pr[ X_i != Y_i]) = n/2.
