[R] non-uniqueness in cluster analysis

2003-12-03 Thread Bruno Giordano
Hi, I'm clustering objects defined by categorical variables with a hierarchical algorithm - average linkage. My distance matrix (general dissimilarity coefficient) includes several distances with exactly the same values. As far as I can see, a standard agglomerative procedure ignores this problem, simply
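To make the tie problem concrete, here is a minimal sketch (not taken from the thread) that lists every pair of objects sharing the smallest dissimilarity in a matrix. The function name `tied_minimum_pairs`, the `tol` parameter, and the toy matrix are assumptions made for illustration only.

```python
def tied_minimum_pairs(dist, tol=0.0):
    """Return all index pairs (i, j), i < j, whose distance equals the minimum.

    dist is a symmetric matrix given as a list of lists; tol is a hypothetical
    tolerance for treating nearly equal distances as tied.
    """
    n = len(dist)
    pairs = [(dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    dmin = min(p[0] for p in pairs)
    return [(i, j) for d, i, j in pairs if d - dmin <= tol]


# Toy dissimilarity matrix (made up): objects 0-1 and 1-2 are equally close.
d = [[0.00, 0.25, 0.50, 0.75],
     [0.25, 0.00, 0.25, 0.50],
     [0.50, 0.25, 0.00, 0.75],
     [0.75, 0.50, 0.75, 0.00]]

print(tied_minimum_pairs(d))  # [(0, 1), (1, 2)]
```

When more than one pair is returned, the first merge of an agglomerative procedure is not uniquely determined, and which pair gets merged typically depends on the order of the input.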

Re: [R] non-uniqueness in cluster analysis

2003-12-03 Thread Thomas W Blackwell
Bruno - Many people add a tiny random number to each of the distances, or deliberately randomize the input order. This means that any clustering is not reproducible, unless you go back to the original randoms, but it forces you not to pay attention to minor differences. Ah, I think you're
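The jittering idea can be sketched as follows, assuming a symmetric dissimilarity matrix stored as a list of lists. The function name `jitter_distances`, the `scale` value, and the use of a fixed seed (one way of "going back to the original randoms") are assumptions for illustration, not part of the original message.

```python
import random


def jitter_distances(dist, scale=1e-6, seed=0):
    """Return a copy of dist with a tiny random amount added to each distance.

    A seeded generator is used so that the same seed reproduces the same
    perturbed matrix. scale should be much smaller than any difference
    between distances that is considered meaningful.
    """
    rng = random.Random(seed)
    n = len(dist)
    out = [row[:] for row in dist]
    for i in range(n):
        for j in range(i + 1, n):
            noisy = dist[i][j] + rng.uniform(0.0, scale)
            out[i][j] = out[j][i] = noisy  # keep the matrix symmetric
    return out


# Toy matrix (made up) in which all three distances are tied.
d = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

j1 = jitter_distances(d, seed=42)
j2 = jitter_distances(d, seed=42)
print(j1 == j2)  # same seed, same perturbed matrix
```

Storing the seed together with the results keeps the analysis repeatable, while rerunning with different seeds shows how much the tree depends on the arbitrary tie resolution.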

Re: [R] non-uniqueness in cluster analysis

2003-12-03 Thread Prof Brian Ripley
On Wed, 3 Dec 2003, Bruno Giordano wrote: Hi, I'm clustering objects defined by categorical variables with a hierarchical algorithm - average linkage. My distance matrix (general dissimilarity coefficient) includes several distances with exactly the same values. As I see, a standard

Re: [R] non-uniqueness in cluster analysis

2003-12-03 Thread Christian Hennig
Hi, Brian Ripley already replied "don't use average linkage"... You may think about k-medoid (pam) in package cluster instead. However, often average linkage is not such a bad choice, and if you really want to use it for your data, you may try the following: Among the hierarchical methods, single

Re: [R] non-uniqueness in cluster analysis

2003-12-03 Thread Bruno Giordano
What I did, in the presence of equal distance values, was to randomize the selection among them and compute the distortion of the solution using cophenetic correlation. I computed 1 random trees for each of three methods: average, single and complete linkage. Among the randomly selected solutions,
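The procedure described here can be sketched roughly as below. This is not the poster's code: it covers average linkage only, and the function names, the toy matrix, and the number of random trees are assumptions for illustration. Ties are detected by exact floating-point equality, which may need a tolerance for real data.

```python
import math
import random


def average_linkage_cophenetic(dist, rng):
    """Average linkage with random tie-breaking; returns cophenetic distances."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        best, candidates = None, []
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Mean of all pairwise distances between the two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best:
                    best, candidates = d, [(a, b)]
                elif d == best:  # exact tie: remember every candidate pair
                    candidates.append((a, b))
        a, b = rng.choice(candidates)  # pick one of the tied merges at random
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = best  # merge height for this pair
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    n = len(dist)
    x = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    y = [coph[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy matrix (made up): the pairs 0-1 and 1-2 are tied at 0.25.
d = [[0.00, 0.25, 0.50, 0.90],
     [0.25, 0.00, 0.25, 0.60],
     [0.50, 0.25, 0.00, 0.30],
     [0.90, 0.60, 0.30, 0.00]]

# One random tree per seed; keep the cophenetic correlation of each.
correlations = [
    cophenetic_correlation(d, average_linkage_cophenetic(d, random.Random(seed)))
    for seed in range(50)
]
print(sorted(set(round(c, 4) for c in correlations)))
```

Each distinct value in the printed list corresponds to a different tree reachable through the tied merges, so the spread of values gives a rough sense of how much the non-uniqueness matters for the data at hand.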