Hi,
I'm clustering objects defined by categorical variables with a hierarchical
algorithm - average linkage.
My distance matrix (general dissimilarity coefficient) includes several
distances with exactly the same values.
As far as I can see, a standard agglomerative procedure ignores this problem, simply
Bruno -
Many people add a tiny random number to each of the distances,
or deliberately randomize the input order. This means that
no clustering is reproducible unless you keep the original
random numbers, but it forces you not to pay attention to
minor differences.
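
A minimal sketch of the jitter idea, in plain Python with no clustering
library. The function name jitter_distances, the noise scale, and the seed
argument are illustrative assumptions, not anything from the posts. Fixing the
seed is one way to "go back to the original randoms".

```python
import random

def jitter_distances(dist, scale=1e-9, seed=None):
    """Return a copy of a symmetric distance matrix with a tiny random
    amount added to every off-diagonal entry, so exact ties disappear.
    The same seed always yields the same perturbed matrix."""
    rng = random.Random(seed)
    n = len(dist)
    out = [row[:] for row in dist]          # leave the input untouched
    for i in range(n):
        for j in range(i + 1, n):
            noisy = dist[i][j] + rng.uniform(0.0, scale)
            out[i][j] = out[j][i] = noisy   # keep the matrix symmetric
    return out

# Toy matrix in which all three distances are tied.
d = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
a = jitter_distances(d, seed=1)
b = jitter_distances(d, seed=1)   # same seed, same perturbation
```

The scale should be well below the smallest real difference between
distances, so that only the order of exact ties is affected.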
Ah, I think you're
On Wed, 3 Dec 2003, Bruno Giordano wrote:
Hi,
I'm clustering objects defined by categorical variables with a hierarchical
algorithm - average linkage.
My distance matrix (general dissimilarity coefficient) includes several
distances with exactly the same values.
As I see, a standard
Hi,
Brian Ripley already replied: don't use average linkage... You
may think about k-medoid (pam) in package cluster instead.
However, often average linkage is not such a bad choice, and if you really
want to use it for your data, you may try the following:
Among the hierarchical methods, single
What I did was, whenever several distances had equal values, to pick one
of them at random, and then to measure the distortion of the resulting
solution using the cophenetic correlation.
I computed 1 random trees for each of three methods: average, single
and complete linkage.
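
The procedure described above might be sketched as follows, in plain Python.
This is an illustration under stated assumptions, not the poster's actual
code: the function names, the toy matrix d, the tie tolerance, and the choice
of 20 trees per method are all made up for the example, and the naive
recomputation of cluster distances is only suitable for small inputs.

```python
import random
from itertools import combinations
from math import sqrt

def linkage_distance(dist, a, b, method):
    """Distance between clusters a and b (lists of object indices)."""
    vals = [dist[i][j] for i in a for j in b]
    if method == "single":
        return min(vals)
    if method == "complete":
        return max(vals)
    return sum(vals) / len(vals)            # average linkage

def random_tie_tree(dist, method="average", rng=random, tol=1e-12):
    """Agglomerate until one cluster is left, choosing at random among
    the closest pairs when they are tied. Returns the cophenetic matrix
    (the merge height at which each pair of objects is first joined)."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        pairs = [(linkage_distance(dist, a, b, method), ia, ib)
                 for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)]
        best = min(p[0] for p in pairs)
        tied = [p for p in pairs if p[0] - best <= tol]
        height, ia, ib = rng.choice(tied)   # random tie-breaking
        for i in clusters[ia]:
            for j in clusters[ib]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return coph

def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    idx = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in idx]
    y = [coph[i][j] for i, j in idx]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return float("nan")                 # correlation undefined
    return sxy / sqrt(sxx * syy)

# Toy dissimilarity matrix with tied entries (hypothetical data).
d = [[0, 1, 1, 3, 4],
     [1, 0, 2, 3, 4],
     [1, 2, 0, 2, 5],
     [3, 3, 2, 0, 2],
     [4, 4, 5, 2, 0]]
rng = random.Random(0)
for method in ("average", "single", "complete"):
    scores = [cophenetic_correlation(d, random_tie_tree(d, method, rng))
              for _ in range(20)]
    print(method, round(min(scores), 3), round(max(scores), 3))
```

Comparing the spread of cophenetic correlations within each method gives a
rough idea of how much the tie-breaking order matters for that linkage.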
Among the randomly selected solutions,