Hmm. Not knowing the analytic answer, I just wrote a simulation.
sqrt(n / 3) looks like a shockingly good fit for the average distance
between two randomly chosen points in the n-dimensional hypercube.
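
For reference, a minimal sketch of this kind of simulation. It is not the
original script: it assumes both points are drawn uniformly from the unit
cube [0, 1]^n and uses Euclidean distance, and the function name and
parameters are made up for illustration. Under that particular assumption
each coordinate contributes E[(x - y)^2] = 1/6, so the estimate tracks
sqrt(n / 6); a constant of 1/3 corresponds to, for example, the distance
from a random point to a corner of the cube. Either way the growth is
proportional to sqrt(n), which is the point that matters for scaling.

```python
import math
import random


def mean_pair_distance(n, pairs=2000, seed=0):
    """Monte Carlo estimate of the mean Euclidean distance between two
    points drawn uniformly from [0, 1]^n (an assumed sampling setup)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        sq = 0.0
        for _ in range(n):
            diff = rng.random() - rng.random()
            sq += diff * diff
        total += math.sqrt(sq)
    return total / pairs


if __name__ == "__main__":
    # Print n, the estimate, and the estimate divided by sqrt(n);
    # the last column should be roughly constant if sqrt(n) scaling holds.
    for n in (10, 100, 1000):
        est = mean_pair_distance(n)
        print(n, round(est, 3), round(est / math.sqrt(n), 3))
```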

Accident? Error? Known result? It seems clear that something like sqrt(n)
would be a better scaling factor than n. But, indeed, there are yet more
possibilities with exponential functions.
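
To make the comparison concrete, here is a minimal sketch of the two
distance-to-similarity conversions with the distance divided by a scale h
that grows like sqrt(n). The function names and the default constant
c = 1/3 (taken from the fit mentioned above) are illustrative assumptions,
not a recommendation.

```python
import math


def bandwidth(n, c=1.0 / 3.0):
    """Hypothetical scale h = sqrt(c * n) for n-dimensional data."""
    return math.sqrt(c * n)


def sim_reciprocal(d, h):
    """Similarity 1 / (1 + d/h): equals 1 at d = 0, decays polynomially."""
    return 1.0 / (1.0 + d / h)


def sim_exponential(d, h):
    """Similarity exp(-d/h): equals 1 at d = 0, decays exponentially."""
    return math.exp(-d / h)


if __name__ == "__main__":
    n = 100
    h = bandwidth(n)
    # Compare both similarities at a few multiples of the scale h.
    for d in (0.0, h, 2 * h, 5 * h):
        print(round(d, 2), round(sim_reciprocal(d, h), 4),
              round(sim_exponential(d, h), 4))
```

Both functions map d = 0 to similarity 1; for any d > 0 the exponential
form gives the smaller value, since exp(x) > 1 + x.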


On Wed, Oct 19, 2011 at 4:28 PM, Justin Cranshaw <[email protected]> wrote:
> I've most often seen something like exp(-d(x,y)) for converting distance to 
> similarity.  Unlike 1/(1+d) this has exponential decay in distance, which is 
> usually more desirable.  There is a similar kludge to what you describe, 
> where people use exp(-d/h) for some bandwidth h.  I'm not sure there's a
> standard way of picking h though.  I've seen people use something like a
> sample variance from the data.
>
