Hmm. Not knowing the analytic answer, I just wrote a simulation. sqrt(n / 3) looks like a shockingly good fit for the average distance between two randomly chosen points in the n-dimensional hypercube.
Accident? Error? Known result?

Seems clear that something like sqrt(n) would be a better scaling factor than n. But, indeed, there are yet more possibilities with exponential functions.

On Wed, Oct 19, 2011 at 4:28 PM, Justin Cranshaw wrote:

> I've most often seen something like exp(-d(x,y)) for converting distance to
> similarity. Unlike 1/(1+d) this has exponential decay in distance, which is
> usually more desirable. There is a similar kludge to what you describe,
> where people use exp(-d/h) for some bandwidth h. I'm not sure there's a
> standard way of picking h though. I've seen people use something like a
> sample variance from the data.
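
For concreteness, here is a rough Python sketch of the two distance-to-similarity conversions mentioned in the quoted message, 1/(1+d) and exp(-d/h). The function names are made up for illustration, and using the sample variance of the observed distances as the bandwidth h is only one possible reading of the "sample variance from the data" remark, not an established rule.

```python
import math
import statistics


def inverse_similarity(d):
    """1 / (1 + d): similarity falls off polynomially with distance."""
    return 1.0 / (1.0 + d)


def exp_similarity(d, h=1.0):
    """exp(-d / h): similarity falls off exponentially; h is the bandwidth."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return math.exp(-d / h)


def variance_bandwidth(distances):
    """Hypothetical helper (an assumption, not from the thread):
    use the sample variance of the observed distances as h."""
    return statistics.variance(distances)


# Illustrative distances only; not data from the thread.
distances = [0.5, 1.0, 2.0, 4.0]
h = variance_bandwidth(distances)
for d in distances:
    print(d, inverse_similarity(d), exp_similarity(d), exp_similarity(d, h))
```

Both conversions map d = 0 to a similarity of 1. A larger h makes exp(-d/h) decay more slowly, which is why the choice of bandwidth matters.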
