On 9 Dec 2003 07:54:18 -0800, [EMAIL PROTECTED] (Sangdon Lee)
wrote:

> Dear All,
> 
> I remember seeing, but cannot now find, references showing the
> relationship between the Mahalanobis distance and principal component
> analysis.  I would appreciate it if anybody could explain or give
> references.

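The Mahalanobis distance takes correlation into account.
That is its special quality.  The principal components are
uncorrelated, so once you have run PCA there is no
correlation left to adjust for: the Mahalanobis distance
computed on the components is just the Euclidean distance
with each component scaled to unit variance.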
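
For anyone who wants to check that numerically, here is a
minimal sketch in Python with NumPy.  The data are synthetic
and every name in it is my own choice for illustration, not
something from a particular package or from the original
question.  It compares the squared Mahalanobis distance of
each observation from the mean with the squared Euclidean
length of its principal component scores after each score
has been divided by its standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, strongly correlated two-variable data (an assumption for the demo).
cov_true = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=200)

mean = X.mean(axis=0)
S = np.cov(X, rowvar=False)  # sample covariance matrix
diff = X - mean

# Squared Mahalanobis distance of every row from the sample mean.
d2_mahal = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)

# PCA via eigendecomposition of S; the resulting scores are uncorrelated.
eigvals, eigvecs = np.linalg.eigh(S)
scores = diff @ eigvecs

# Scale each component to unit variance, then take the squared Euclidean length.
d2_pc = ((scores / np.sqrt(eigvals)) ** 2).sum(axis=1)

# True when the two sets of distances agree to floating-point tolerance.
print(np.allclose(d2_mahal, d2_pc))
```

The agreement follows from writing the covariance matrix as
its eigendecomposition; nothing here depends on the
particular data used.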

> 
> Also, I'm wondering what is the right way of clustering observations
> when variables are highly collinear?

Well, what do you want to achieve?

You seem to be assuming that "clustering" is an
official set of techniques with registered and
regulated modes.  That's even less true for clustering
than it is for 'factor analysis'.  That is, fewer
'real statisticians' do cluster analysis or write
about it than is the case for most methods that get
mentioned here.

And when someone does write about clustering, it is
often about a single method designed for one purpose,
so that does not do much to further the general
practice of clustering, either.

If you want to use 'normal' and squared distances,
accounting for correlation, why don't you stick with
factor analysis, since FA starts with that metric?
FA is less ad hoc, and more reputable.

> 1) Run PCA and use all of principal components for cluster analysis
> 2) Use the Mahalanobis distance.  
> 
> By the way, why is the Mahalanobis distance not included in books on
> cluster analysis or in major software packages such as SAS, SPSS, or
> Minitab?  I usually work on data where many variables are collinear,
> and I have to include those variables.  My inexperienced thought in

 - See above.  If you don't want to *use* clustering's
idiosyncratic scaling, and its potential to overweight one
part of the algorithm, why are you using clustering?

(I've never been an advocate of Clustering, but I am willing 
to hear other characterizations if someone wants to 
offer them....)

> cluster analysis is that it would be better to use the Mahalanobis
> distance if variables are collinear, but most software does not
> include it.
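
On the two options quoted above, here is a rough check one
could run, again a sketch in Python with NumPy on made-up
data, with all names hypothetical.  PCA by itself is only a
rotation, so keeping *all* components unscaled should leave
every pairwise Euclidean distance unchanged; rescaling each
component to unit variance should reproduce the pairwise
Mahalanobis distances.  If that holds, software lacking a
Mahalanobis option can be fed the rescaled components
instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: three variables, the third nearly a copy of the first.
a = rng.normal(size=50)
b = rng.normal(size=50)
X = np.column_stack([a, b, a + 0.1 * rng.normal(size=50)])


def pairwise_sq_dists(Z):
    """Squared Euclidean distance between every pair of rows of Z."""
    delta = Z[:, None, :] - Z[None, :, :]
    return (delta ** 2).sum(axis=-1)


centered = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)

scores = centered @ eigvecs            # option 1: all components, unscaled
whitened = scores / np.sqrt(eigvals)   # all components, each at unit variance

d_raw = pairwise_sq_dists(X)
d_scores = pairwise_sq_dists(scores)
d_white = pairwise_sq_dists(whitened)

# Option 2: pairwise squared Mahalanobis distances computed directly.
delta = X[:, None, :] - X[None, :, :]
d_mahal = np.einsum("abj,jk,abk->ab", delta, np.linalg.inv(S), delta)

print(np.allclose(d_raw, d_scores))    # unscaled components vs. original data
print(np.allclose(d_white, d_mahal))   # rescaled components vs. Mahalanobis
```

Note that with nearly collinear variables the smallest
eigenvalue is close to zero, so the rescaling blows up a
direction that may be mostly noise.  Whether that is wanted
is the "what do you want to achieve?" question again.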


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." 