A couple of observations:

Using a correlation matrix rather than a covariance matrix has nothing to do
with whether the data are normally distributed or not. One usually wants to
use a covariance matrix. However, if the variables are in different units that
cannot be made consistent, then one "gives up" and uses a correlation matrix.
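
As a small illustration of the point about units, here is a minimal Python
sketch. The measurements and variable names are made up for the example and
are not from the data under discussion. It shows that a covariance changes
when one variable is re-expressed in other units, while the correlation does
not, and that the correlation is simply the covariance of the z-scored
variables. It also checks that z-scoring leaves the skewness of a variable
unchanged, which is one way to see that standardizing does not move data
closer to normality.

```python
from statistics import mean, pstdev, stdev

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))

def zscores(x):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(x), stdev(x)
    return [(a - m) / s for a in x]

def skewness(x):
    """Moment-based skewness; unchanged by any positive linear rescaling."""
    m, s = mean(x), pstdev(x)
    return sum(((a - m) / s) ** 3 for a in x) / len(x)

# Hypothetical measurements on six specimens (illustrative values only).
length_mm = [102.0, 98.0, 110.0, 95.0, 105.0, 99.0]
mass_g = [12.1, 11.0, 14.2, 10.4, 13.0, 11.5]
length_cm = [v / 10.0 for v in length_mm]  # same lengths, different unit

cov_mm = covariance(length_mm, mass_g)
cov_cm = covariance(length_cm, mass_g)
r_mm = correlation(length_mm, mass_g)
r_cm = correlation(length_cm, mass_g)
cov_z = covariance(zscores(length_mm), zscores(mass_g))

print("covariance (mm, g):", cov_mm)
print("covariance (cm, g):", cov_cm)   # ten times smaller
print("correlation (mm, g):", r_mm)
print("correlation (cm, g):", r_cm)    # unchanged by the unit change
print("covariance of z-scores:", cov_z)  # equals the correlation
print("skewness raw vs z-scored:", skewness(length_mm), skewness(zscores(length_mm)))
```

A PCA of the correlation matrix is therefore a PCA of the covariance matrix
of z-scored variables: every variable gets equal weight regardless of its
units, but the shape of each variable's distribution is untouched.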

The main issue with using some of the other methods that were suggested is
whether the groups (clusters) are defined a priori or are groups you are
trying to discover in the data. If you know the groups in advance, then it
makes sense to consider CVA, CPCA, MANOVA, etc. You will then run into the
problem that I mentioned in my prior message: you need more observations
than variables.
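
To make the "more observations than variables" requirement concrete, here is
a minimal Python sketch with made-up numbers (3 specimens, 5 measurements;
the helper names are hypothetical). The covariance matrix of n specimens has
rank at most n - 1, so with fewer specimens than variables it is singular and
cannot be inverted, which is what methods such as CVA and MANOVA need to do
with the within-group covariance matrix.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (p x p) from n rows of p measurements."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def matrix_rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the row (at or below the current rank) with the largest entry.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank

# Hypothetical data: 3 specimens (rows) by 5 measurements (columns).
specimens = [
    [10.0, 4.0, 7.0, 2.0, 5.0],
    [12.0, 5.0, 6.5, 2.5, 5.5],
    [11.0, 6.0, 8.0, 2.2, 6.1],
]

cov = covariance_matrix(specimens)
rank = matrix_rank(cov)
print("variables:", len(cov), "specimens:", len(specimens), "rank:", rank)
print("singular:", rank < len(cov))
```

When groups are analysed separately or pooled within groups, the relevant
count is the number of specimens available for the within-group covariance
matrix, not the total sample size.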

The comments about normality were a bit off the point. With data such as you
describe there is no expectation that the entire data set be consistent with
a multivariate normal distribution. What you want is for the distributions
within the clusters to be normal. For those, your sample sizes will be even
smaller, so it is difficult to perform serious tests with data such as yours.
Since you want to find clusters of species, you really do not want your
entire dataset to be consistent with sampling from a single normally
distributed population.

CVA = canonical variates analysis
NMDS = nonmetric multidimensional scaling analysis. 

NMDS would be a good thing to try on your data. It is similar to a PCA
ordination, but its axes are not constrained to be linear functions of your
original variables. It usually does a better job of summarizing distances
between points in a low-dimensional space.
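
As a rough illustration of why NMDS is not tied to linear functions of the
variables, here is a minimal Python sketch of Kruskal's stress-1, one common
NMDS badness-of-fit criterion (an assumption on my part; programs differ in
the exact criterion they minimize). The stress depends only on the rank order
of the input dissimilarities, so a configuration is judged perfect whenever
its distances are in the same order as the dissimilarities. The points and
function names below are made up for the example, and the sketch only
evaluates a given configuration; it does not search for the best one.

```python
from itertools import combinations
from math import dist, sqrt

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []  # each block is [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while the previous block mean exceeds the last one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(dissimilarities, config):
    """Stress-1 of a point configuration against a dissimilarity matrix."""
    # Order the pairs of points by increasing input dissimilarity.
    pairs = sorted(combinations(range(len(config)), 2),
                   key=lambda ij: dissimilarities[ij[0]][ij[1]])
    d = [dist(config[i], config[j]) for i, j in pairs]
    d_hat = isotonic_fit(d)  # best monotone approximation to d
    return sqrt(sum((a - b) ** 2 for a, b in zip(d, d_hat)) /
                sum(a * a for a in d))

# Hypothetical "true" positions of four specimens.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (0.0, 5.0)]

# Dissimilarities are SQUARED distances: a nonlinear but monotone transform.
dissim = [[dist(p, q) ** 2 for q in points] for p in points]

stress_same_order = kruskal_stress(dissim, points)

# Swapping two specimens scrambles the rank order of the distances.
scrambled = [points[3], points[1], points[2], points[0]]
stress_scrambled = kruskal_stress(dissim, scrambled)

print("stress, rank order preserved:", stress_same_order)
print("stress, two specimens swapped:", stress_scrambled)
```

The first configuration has zero stress even though the dissimilarities are
not proportional to its distances, because only their ordering matters.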
 
-------------------------------------------------
F. James Rohlf -SUNY Stony Brook, NY 11794-5245
FAX: 1-631-632-7626 www: http://life.bio.sunysb.edu/ee/rohlf

 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, May 26, 2004 1:08 AM
> To: [EMAIL PROTECTED]
> Subject: Re: size correction & discriminant functions analyses
> 
> G'day all,
> 
> Thanks to everyone for your comments. They've been a great 
> help, and I'm glad that my question sparked a bit of 
> discussion on the subject.
> 
> After some pondering, I've got a few more questions and some 
> more details on the way I analysed my data. Although I was 
> looking for species clustering, I wasn't terribly concerned 
> with quantifying any clustering, and was using PCA more as a 
> visualisation technique to explore my data. In the future I 
> will try the various methods suggested to try to quantify the 
> clustering.
> 
> Another thing was with regards to the issue of multivariate 
> normality. I did not use a variance-covariance matrix, 
> instead I used a correlation matrix. I was under the 
> assumption that by transforming the covariances into 
> z-scores, I would have a greater chance of my data being (or
> approaching) multivariate normality? Also, for testing if my 
> data is normally distributed, if I was to do separate PCAs
> for each population and if a population was normally dist.,
> then would I expect to see an ellipsoid with its greatest
> length along PC1 in a PCA plot?
> 
> With regards to obtaining singular matrices when # measures >>
> # specimens, this did happen to me and the way I 'got
> round' this was to first regress every measurement against 
> total length and then by looking at the slopes of the 
> regressions, chose which measurements showed the greatest 
> potential for between species differentiation. Because I was 
> using PCA just as a qualitative tool, I didn't think it was 
> much of a problem, however if I want to do quantitative 
> analysis such as discriminant analysis, can I still use this 
> same method of choosing measures, or am I restricted to 
> stepwise methods using the whole data set?
> 
> Forgive my ignorance, but what is NMDS and CVA? I assume PCO 
> is principal coordinates analysis? I would also appreciate a 
> pdf of the Darroch & Mosimann paper if available.
> 
> A final point, to perhaps spark more debate or at least to 
> motivate some thought, is that I have found it very difficult 
> to get a basic understanding of the application of 
> multivariate stats to morphometrics because the text books 
> available are very technical. An equation may be meaningful 
> to the gurus, but it doesn't mean a whole lot to me. It is 
> also one thing to describe how a procedure works, but it's 
> another thing to implement it when you are ignorant of the 
> software available. I think there is a great need for a text
> book that can introduce the new student to this field without 
> using equations to describe what's going on. There
> - I've said it, let the slaughter begin.
> 
> Thanks,
> 
> Brett
> 
> *****************************
> Brett Human
> Shark Researcher
> 27 Southern Ave
> West Beach SA 5024
> Australia
> +61 8 8356 6891
> [EMAIL PROTECTED]
> *****************************
> ==
> Replies will be sent to list.
> For more information see 
> http://life.bio.sunysb.edu/morph/morphmet.html.
> 

