Dr. Hammer,

Please consider your courage credited. -ds

A couple of points about PCA in general:

1) PCA makes no assumptions about the distribution (multivariate normal or otherwise) of your data. It is a procedure that simply produces the linear combinations of variables with maximum variance, each subject to orthogonality to the other such axes. Distribution assumptions only come into play for (some) significance testing procedures.

2) PC1 will only identify size variation if size variation is the source of the greatest variation in your sample. Sex, species, habitat, etc., or some combination of these, could all be determinants (not in the matrix sense 8-) ) of PC1. In general, if you have data with some extreme outlier (e.g., a transcription error), then PC1 will (probably) just point to (or pi radians away from) the direction of that outlier relative to the main sample, which will still be the linear combination of maximum variance.

What people often want PCA to do is either a) identify iso/allometry due to size variation in a sample or b) separate out sexes, species, or other groups. PCA is optimal for neither of these and could be quite misleading in both cases. If you are interested in size relationships, regress variables on some meaningful measure of size. If you are interested in group differences, look into CVA. If you have many more variables than specimens, you might do either of the above in a reduced PCA space, provided you check carefully whether your limited data suggest you are capturing the salient aspects of a space of reduced dimension resulting from the tight correlations amongst your variables. Otherwise, you must wave your hands vigorously before proceeding.

See the Marcus 1990 Blue Book chapter for a nice discussion of PCA and related methods. Books by Jackson, Jolliffe, and other authors specifically on Principal Components are available.

-ds

On Wed, 2004-05-19 at 09:29, [EMAIL PROTECTED] wrote:
> Just a comment on this one, from a pragmatic point of view.
>
> It is of course true that PCA is only *guaranteed* to
> produce components maximizing variance if you have
> multivariate normality. The theory of PCA is based on this
> assumption. But in many cases, PCA is used purely as a
> visualization device, projecting a multivariate data set
> onto a sheet of paper so we can see it. For visualization
> of non-normal data, one could play around with different
> techniques, such as PCA, PCO, NMDS, projection pursuit etc.,
> and then find that PCA does (or does not) perform well
> for the given data set. There is no law against making
> any linear combination you want of your variates, if it
> reveals information. For example, PCA may be perfectly
> adequate for resolving two well-separated groups, if
> the within-group variance is relatively small.
>
> Of course, when using PCA for non-normal data one must
> be a little careful and not over-interpret the results
> (especially not the component loadings), but I think
> it's too harsh to dismiss its use totally.
>
> I'm sure the hard-liners will flame me to pieces for
> this email, but I hope they will at least give me
> credit for my courage :-)
>
>
> Dr. Oyvind Hammer
> Geological Museum
> University of Oslo
>
>
> > PCA Analysis assumes multivariate normality.
> >
> > Kathleen M. Robinette, Ph.D.
> > Principal Research Anthropologist
> > Air Force Research Laboratory
> >
>
> ==
> Replies will be sent to list.
> For more information see http://life.bio.sunysb.edu/morph/morphmet.html.

--
Dennis E. Slice, Ph.D.
Department of Biomedical Engineering
Division of Radiologic Sciences
Wake Forest University School of Medicine
Winston-Salem, North Carolina, USA 27157-1022
Phone: 336-716-5384
Fax: 336-716-2870
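
P.S. For anyone who would like to see points 1) and 2) numerically, here is a minimal NumPy sketch. It is only an illustration under stated assumptions: the data are simulated (deliberately non-normal uniform variables, not measurements from any real study), the outlier is invented, and principal_axes is a hypothetical helper name, not a function from any package. It checks that PC1 is the unit direction of maximum sample variance without any normality, and that a single gross outlier can capture PC1.

```python
import numpy as np


def principal_axes(X):
    """Eigenvalues (descending) and unit eigenvectors (as columns) of the
    sample covariance matrix of X (rows = specimens, columns = variables)."""
    cov = np.cov(X, rowvar=False)           # np.cov centers the data itself
    evals, evecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


rng = np.random.default_rng(0)

# Simulated, deliberately NON-normal data: independent uniform variables
# with different spreads (an assumption made purely for illustration).
X = rng.uniform(-1.0, 1.0, size=(200, 3)) * np.array([3.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)

evals, evecs = principal_axes(X)
pc1 = evecs[:, 0]

# Point 1: the variance of the PC1 scores equals the largest eigenvalue,
# and no other unit direction (here, 1000 random ones) does better.
var_pc1 = (Xc @ pc1).var(ddof=1)
dirs = rng.normal(size=(1000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
max_other = (Xc @ dirs.T).var(axis=0, ddof=1).max()

# Point 2: add one gross outlier (think transcription error) along the
# variable that previously had the LEAST variance.
outlier = np.array([0.0, 0.0, 500.0])
_, evecs_out = principal_axes(np.vstack([X, outlier]))
pc1_out = evecs_out[:, 0]

# Direction from the main sample's centroid to the outlier.
direction = outlier - X.mean(axis=0)
direction /= np.linalg.norm(direction)

# |cosine| of the angle between the new PC1 and that direction; the
# absolute value is needed because the sign of an eigenvector is arbitrary
# (the "or pi radians away" in point 2).
alignment = abs(pc1_out @ direction)

print("variance along PC1:       ", var_pc1)
print("largest eigenvalue:       ", evals[0])
print("best random direction:    ", max_other)
print("|cos(PC1, outlier dir)|:  ", alignment)
```

With the outlier included, PC1 is still the linear combination of maximum variance; it simply describes the outlier rather than anything about the main sample, which is why it pays to look at the scores before interpreting the loadings.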
