In Manhattan and elsewhere, the street grid is not uniform: the avenues
are much farther apart than the streets. This means that W 53rd St. &
8th Ave is much farther from W 53rd & 7th than it is from W 52nd & 8th.
A distance metric that treats all dimensions as equal would be off by a
factor of about 2.5. A weighted distance metric that accounted for this
difference would produce distance values, and hence clusters, that more
closely match the real world.

Generalizing this to n dimensions, the new distance metric might look
like this:

distance = sum(abs(p2[i] - p1[i]) * s[i])

where s is a vector of (positive) scale factors, one per dimension.
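
To make the formula concrete, here is a minimal sketch in Python. The
function name weighted_manhattan, the (avenue, street) coordinate layout,
and the scale vector (2.5, 1.0) are assumptions made for illustration
only; the 2.5 is just the rough ratio mentioned above, not a measured
value, and this is not an existing API.

```python
from typing import Sequence


def weighted_manhattan(p1: Sequence[float],
                       p2: Sequence[float],
                       s: Sequence[float]) -> float:
    """Sum of per-dimension absolute differences, each scaled by s[i].

    Hypothetical helper illustrating the proposed metric.
    """
    if not (len(p1) == len(p2) == len(s)):
        raise ValueError("p1, p2 and s must have the same length")
    if any(w <= 0 for w in s):
        raise ValueError("scale factors must be positive")
    return sum(abs(b - a) * w for a, b, w in zip(p1, p2, s))


# Assumed layout: points are (avenue, street) grid coordinates.
# Assumed scale: one avenue block counts as about 2.5 street blocks.
scale = (2.5, 1.0)

one_avenue = weighted_manhattan((8, 53), (7, 53), scale)  # 53rd & 8th to 53rd & 7th
one_street = weighted_manhattan((8, 53), (8, 52), scale)  # 53rd & 8th to 52nd & 8th
```

With all scale factors set to 1.0 this reduces to the ordinary Manhattan
distance, so the unweighted case remains available as a special case.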

 

Would this be an appropriate new clustering feature? 

Jeff
