An interesting question came up recently about using the Euclidean distance d between two vectors as a notion of their similarity.
You can use 1 / (1 + d), which mostly works, except that it 'penalizes' larger vectors, which have more dimensions along which to differ. This is bad when those vectors are built from the subset of user pref data on which two users overlap: more overlap ought to mean higher similarity, not lower.

I have an ancient, bad kludge in there that uses n / (1 + d), where n is the number of dimensions, i.e. the size of the overlap. It's trying to normalize away the average distance between randomly chosen vectors in the space (remember that each dimension is bounded, between the min and max rating). But that average isn't n. Is there a good formula, or a way of thinking about what that number should be? I can't find it on the internet.
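For concreteness, here's a minimal sketch of the two variants in Python. The item names, ratings, and dict-based layout are made up for illustration; this is not the actual code in question:

```python
import math

# Hypothetical preference data: item -> rating, on a bounded scale (say 1..5).
prefs_a = {"item1": 4.0, "item2": 2.0, "item3": 5.0}
prefs_b = {"item1": 5.0, "item2": 1.0, "item4": 3.0}

def similarities(a, b):
    # Only the items both users rated contribute to the distance.
    common = set(a) & set(b)
    n = len(common)
    if n == 0:
        return None  # no overlap, no basis for comparison
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    plain = 1.0 / (1.0 + d)   # shrinks as the overlap grows
    kludge = n / (1.0 + d)    # the ad hoc normalization in question
    return n, d, plain, kludge

print(similarities(prefs_a, prefs_b))
```

Note that d is computed only over the n co-rated items, and each extra co-rated item can only add a nonnegative term to the sum, so 1 / (1 + d) tends to fall as n grows; that's the penalty described above.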
