On 9 Dec 2003 07:54:18 -0800, [EMAIL PROTECTED] (Sangdon Lee) wrote:

> Dear All,
>
> I remember but could not find references showing the relationship
> between the Mahalanobis distance and principal component analysis. I
> appreciate if anybody explain or give references.

The M. distance takes into account correlation. That is its special
quality. The correlations are zero after principal components
analysis, so that wipes out the relevance.

>
> Also, I'm wondering what is the right way of clustering observations
> when variables are highly collinear?

Well, what do you want to achieve?

You seem to be assuming that "clustering" is an official set of
techniques with registered and regulated modes. That's even less
true for clustering than it is for 'factor analysis'. That is,
there are not as many 'real statisticians' who do cluster analysis
or write about it, compared to most things that get mentioned.
And, when someone does write about clustering, it might be about
a singular method that is designed for one end -- So, that does not
do much to further the practice of clustering, either.

If you want to use 'normal' and squared distances, accounting
for correlation, why don't you stick with factor analysis,
since FA starts with that metric. FA is less ad-hoc, and
more reputable.

> 1) Run PCA and use all of principal components for cluster analysis
> 2) Use the Mahalanobis distance.
>
> By the way, why the Mahalanobis distance is not included in books for
> cluster analysis and also major softwares such as SAS, SPSS or
> Minitab? I usually work on data where many variables are collinear
> and have to include those variables. My inexperienced thought in

 - see above. If you don't want to *use* its idiosyncratic
scaling, and the potential to overweight one part of the
algorithm, why are you using Clustering?

(I've never been an advocate of Clustering, but I am willing
to hear other characterizations if someone wants to offer them....)

> cluster analysis is that it would be better to use the Mahalanobis
> distance if variables are collinear but most softwares do not include
> it.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."
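
A small numerical footnote to the point about correlations and PCA,
offered as a sketch rather than anything definitive. The data, the
mixing matrix, and all variable names below are made up for
illustration. Assuming a nonsingular sample covariance matrix, the
squared Mahalanobis distance between two observations should match
the squared Euclidean distance between their principal component
scores once each score is divided by its standard deviation. With
all components kept but left unscaled, the result should simply be
the ordinary Euclidean distance on the original variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 3 correlated variables,
# built by mixing independent normals (illustration only).
mixing = np.array([[1.0, 0.8, 0.6],
                   [0.0, 0.5, 0.4],
                   [0.0, 0.0, 0.3]])
X = rng.normal(size=(200, 3)) @ mixing

Xc = X - X.mean(axis=0)            # center the variables
S = np.cov(Xc, rowvar=False)       # sample covariance matrix

# Squared Mahalanobis distance between observations 0 and 1.
d = Xc[0] - Xc[1]
d2_mahal = d @ np.linalg.solve(S, d)

# PCA via eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(S)
scores = Xc @ eigvecs                   # uncorrelated PC scores
std_scores = scores / np.sqrt(eigvals)  # rescaled to unit variance

# Squared Euclidean distances in the two PC spaces and the original space.
d2_std_pc = np.sum((std_scores[0] - std_scores[1]) ** 2)
d2_raw_pc = np.sum((scores[0] - scores[1]) ** 2)
d2_euclid = d @ d

print("Mahalanobis:            ", d2_mahal)
print("Euclidean, scaled PCs:  ", d2_std_pc)
print("Euclidean, unscaled PCs:", d2_raw_pc)
print("Euclidean, original:    ", d2_euclid)
```

Read this way, options 1) and 2) in the quoted question give the same
distances only when every component is kept and rescaled to unit
variance. That rescaling also gives the smallest, noisiest components
the same weight as the largest, which is one way the scaling can
overweight part of the algorithm.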

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
