-------- Original Message --------
Subject: Re: PCA & CVA questions
Date: Thu, 4 Jun 2009 06:54:20 -0700 (PDT)
From: andrea cardini <[email protected]>
To: [email protected]
Dear Kim,
please, find my answers below.
At 08:58 04/06/2009 -0400, you wrote:
-------- Original Message --------
Subject: PCA & CVA questions
Date: Wed, 3 Jun 2009 19:25:10 -0700 (PDT)
From: Kimberly Tice <[email protected]>
To: [email protected]
Hi,
I am new to the morphometrics world, and am having a bit of trouble
figuring out what tests to use to analyze my data and how to interpret
them. I was hoping someone might be able to offer a bit of advice...
I have two types of snails, and I want to determine if they are
different shapes. I've been using the IMP programs, and when I perform
a principal components analysis, I have 1 distinct principal component,
but the two different groups are almost completely overlapping along
that PC. I did a MANOVA of all of the partial warps/uniform warps, but
in this case, I found that the groups were significantly different. How
do I reconcile this with the PCA?
MANOVA and CVA/DA maximize between group variance relative to within group
variance in order to test significance. PCA maximizes the total variance in
your sample regardless of groups in order to summarize variation. The fact
that they have different purposes may explain why sometimes you may get
different answers. This is particularly likely to happen if group
differences are small and do not align with the main direction of total
sample variance.
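A minimal simulated sketch of this situation, assuming two hypothetical shape variables in which the groups differ only along the direction of least total variance. The data, sample sizes and the `standardized_gap` helper are illustrative assumptions, not output from IMP or any other morphometrics program.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 100                         # specimens per group (hypothetical)

# Large shared variation along the first variable, small along the second.
# The two groups differ only along the second (minor) direction.
group_a = rng.normal(loc=[0.0, -0.5], scale=[5.0, 0.3], size=(n, 2))
group_b = rng.normal(loc=[0.0, 0.5], scale=[5.0, 0.3], size=(n, 2))
data = np.vstack([group_a, group_b])
labels = np.repeat([0, 1], n)

def standardized_gap(scores, labels):
    """Difference between group means in units of pooled within-group SD."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

# PCA: eigenvectors of the total covariance matrix, ignoring groups.
centered = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]
pc1_scores = centered @ pc1

# Two-group discriminant direction: W^-1 (mean_b - mean_a),
# where W is the pooled within-group covariance matrix.
within = (np.cov(group_a, rowvar=False) + np.cov(group_b, rowvar=False)) / 2
disc = np.linalg.solve(within, group_b.mean(axis=0) - group_a.mean(axis=0))
disc_scores = centered @ disc

pc1_share = eigvals.max() / eigvals.sum()
gap_pc1 = standardized_gap(pc1_scores, labels)
gap_disc = standardized_gap(disc_scores, labels)
print(f"PC1 share of total variance: {pc1_share:.2f}")
print(f"group gap along PC1: {gap_pc1:.2f} pooled SDs")
print(f"group gap along discriminant axis: {gap_disc:.2f} pooled SDs")
```

In this kind of setup PC1 takes nearly all of the total variance and the groups overlap along it, while the discriminant axis separates them clearly.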
Be careful as MANOVA/CVA/DA tend to overfit the data. With MANOVA, compute
the % of variance explained by group differences. This is easy to do in
TPSRegr by regressing shape variables onto a dummy variable for the two
groups (code -1 the first group and 1 the second one). This % could also be
bootstrapped to get confidence intervals. With CVA/DA I'd compute
cross-validated % of correctly classified specimens according to group.
When CVA/DA overfit the data, the cross-validated % drops. This can give you
some clues about the magnitude of your differences.
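The drop in cross-validated classification can be sketched as follows. This is a simplified leave-one-out scheme on pure-noise data with many variables and few specimens, assuming equal prior probabilities and a pooled covariance matrix. The function names are hypothetical, and this is not the exact procedure used by MorphoJ or any other program.

```python
import numpy as np

def lda_predict(train_x, train_y, test_x):
    """Two-group linear discriminant rule with pooled within-group covariance."""
    a, b = train_x[train_y == 0], train_x[train_y == 1]
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    resid = np.vstack([a - mean_a, b - mean_b])
    pooled = resid.T @ resid / (len(train_x) - 2)
    # Pseudo-inverse so the sketch still runs if the matrix is singular.
    direction = np.linalg.pinv(pooled) @ (mean_b - mean_a)
    midpoint = (mean_a + mean_b) / 2  # equal priors assumed
    return ((test_x - midpoint) @ direction > 0).astype(int)

def resubstitution_accuracy(x, y):
    """Classify the same specimens that were used to build the rule."""
    return float(np.mean(lda_predict(x, y, x) == y))

def loo_accuracy(x, y):
    """Leave-one-out: each specimen is classified by a rule built without it."""
    hits = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        hits += int(lda_predict(x[keep], y[keep], x[i:i + 1])[0] == y[i])
    return hits / len(x)

rng = np.random.default_rng(1)
n_per_group, n_vars = 15, 20
x = rng.normal(size=(2 * n_per_group, n_vars))  # noise only: no true group difference
y = np.repeat([0, 1], n_per_group)

resub = resubstitution_accuracy(x, y)
loo = loo_accuracy(x, y)
print(f"correctly classified without cross-validation: {100 * resub:.0f}%")
print(f"correctly classified with leave-one-out: {100 * loo:.0f}%")
```

With no real difference between groups, the rule built and tested on the same specimens looks much better than it deserves, and the leave-one-out percentage exposes that.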
Is there any way to determine whether
the differences in the MANOVA are "biologically significant" or exactly
what the shape differences are?
It depends on how much variance these differences explain and whether
differences are biologically meaningful. This second issue depends on
whether you're able to interpret differences that you find based on your a
priori knowledge, experiments, further analyses etc. For instance, in a
sample of young human adults, same nationality, all university students, I
found significant differences in face shape between sexes. These explained
just 5% of total variance. However, differences in mean shapes (mean female
vs mean male) were exactly as expected based on what we know about sexual
dimorphism in humans (e.g., female with relatively smaller chin and less
prominent jaw, smaller nose etc.). This makes statistical significance much
more convincing and interesting, even if the effect size (magnitude of
differences) is small.
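A percentage like the 5% above can be obtained from the regression of shape variables onto a -1/1 dummy variable described earlier. The sketch below runs on simulated data and assumes a percentile bootstrap that resamples specimens within each group. The function names and the resampling scheme are my assumptions for illustration, not what TPSRegr does internally.

```python
import numpy as np

def percent_explained(shape, dummy):
    """% of total shape variance explained by regressing shape on a group dummy."""
    centered = shape - shape.mean(axis=0)
    d = dummy - dummy.mean()
    slopes = d @ centered / (d @ d)   # least-squares slope for each shape variable
    predicted = np.outer(d, slopes)   # fitted deviations from the grand mean
    return 100 * (predicted ** 2).sum() / (centered ** 2).sum()

def bootstrap_ci(shape, dummy, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling specimens within each group."""
    rng = np.random.default_rng(seed)
    groups = [np.flatnonzero(dummy == g) for g in np.unique(dummy)]
    stats = []
    for _ in range(n_boot):
        idx = np.concatenate([rng.choice(g, size=len(g)) for g in groups])
        stats.append(percent_explained(shape[idx], dummy[idx]))
    tail = 100 * (1 - level) / 2
    return np.percentile(stats, [tail, 100 - tail])

# Simulated stand-in for shape variables (hypothetical data).
rng = np.random.default_rng(2)
n = 40
dummy = np.repeat([-1.0, 1.0], n)   # -1 = first group, 1 = second group
shape = rng.normal(size=(2 * n, 6))
shape[:, 0] += 0.4 * dummy          # small group difference in one variable

estimate = percent_explained(shape, dummy)
low, high = bootstrap_ci(shape, dummy)
print(f"{estimate:.1f}% of total variance "
      f"(95% bootstrap interval {low:.1f} to {high:.1f}%)")
```

With only two groups, the explained sum of squares is the between-group sum of squares, so this percentage is the between-group share of total variance.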
To visualize mean shape differences between your samples you can, again,
use TPSRegr. After the regression on the dummy variable, you can warp
differences between the two means. Just skip everything in the output which
has to do with partial warps analysed one at a time!
You can do these same analyses and also the cross-validated CVA/DA in
MorphoJ. In many programs you can also have versions of these tests based
on resampling statistics (e.g., pairwise permutation tests for mean shape
differences). This is certainly available in MorphoJ and TPSRegr (just
click on the permutation button).
Probably many of these analyses can also be done in IMP, but I am a bit
less familiar with this series of software. Other people on the list can
advise you better.
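The idea behind a permutation test for mean shape differences can be sketched as follows. The test statistic here is the Euclidean distance between group means, which is my assumption for illustration. The statistics actually used by MorphoJ and TPSRegr are described in their help files. The data are simulated.

```python
import numpy as np

def mean_distance(x, y):
    """Euclidean distance between the two group mean vectors."""
    return np.linalg.norm(x[y == 0].mean(axis=0) - x[y == 1].mean(axis=0))

def permutation_p_value(x, y, n_perm=999, seed=0):
    """Share of label shuffles giving a distance at least as large as observed."""
    rng = np.random.default_rng(seed)
    observed = mean_distance(x, y)
    count = 1  # the observed labelling counts as one of the permutations
    for _ in range(n_perm):
        count += int(mean_distance(x, rng.permutation(y)) >= observed)
    return count / (n_perm + 1)

rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 20)
coords = rng.normal(size=(40, 6))  # hypothetical shape variables
coords[labels == 1] += 1.5         # clear difference between the group means

p_value = permutation_p_value(coords, labels)
print(f"permutation p-value: {p_value:.3f}")
```

Shuffling the group labels breaks any real association between group and shape, so the shuffled distances show what to expect by chance alone.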
Is it appropriate to do an ANOVA on one
principal component?
I also did a CVA, and found 1 significant CV, with an eigenvalue of
0.8. What does the eigenvalue mean?
You have only two groups. Thus, there is only 1 axis which maximizes their
differences. If you had 3 groups, you'd have 2 eigenvalues and 2 CV axes
(and so on for more than 3 groups), and those eigenvalues would tell you
the amount of group differences explained by each CV (say, CV1 80% and CV2
20% or CV1 60% and CV2 40% etc.). This is not the same as the % of total
variance explained by group differences that I mentioned above. For
instance, in my human sexual dimorphism example, as in your case, there's
only 1 CV and this explains 100% of group differences but this axis is only
5% of total (i.e., regardless of group) variance.
NB CVA, DA, MANOVA and multivariate regression on dummy variables for
groups are all doing the same thing (maximizing between-group relative to
within-group differences). Check the output of the corresponding
multivariate test (e.g., Wilks' lambda) and you'll see that it's identical.
These different methods belong to the same family and just focus on
slightly different
aspects (e.g., MANOVA is just about testing and DA/CVA are also about
classifying).
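The link between the CV eigenvalues and the MANOVA test can be sketched on simulated data. This assumes W and B are the within- and between-group sums-of-squares-and-cross-products matrices. Programs differ in how they scale these matrices, so the eigenvalues they report may differ from these by a constant factor. The function name is hypothetical.

```python
import numpy as np

def cva_eigenvalues(x, y):
    """Eigenvalues of W^-1 B and Wilks' lambda computed as det(W) / det(W + B)."""
    grand = x.mean(axis=0)
    p = x.shape[1]
    within = np.zeros((p, p))
    between = np.zeros((p, p))
    for g in np.unique(y):
        xg = x[y == g]
        dev = xg - xg.mean(axis=0)
        within += dev.T @ dev
        diff = (xg.mean(axis=0) - grand)[:, None]
        between += len(xg) * (diff @ diff.T)
    eigvals = np.linalg.eigvals(np.linalg.solve(within, between)).real
    eigvals = np.sort(eigvals)[::-1]  # largest first
    wilks = np.linalg.det(within) / np.linalg.det(within + between)
    return eigvals, wilks

rng = np.random.default_rng(4)
labels = np.repeat([0, 1, 2], 20)  # three hypothetical groups
means = np.array([[0, 0, 0, 0], [2, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
x = rng.normal(size=(60, 4)) + means[labels]

eigvals, wilks = cva_eigenvalues(x, labels)
n_axes = int((eigvals > 1e-8).sum())                 # number of groups minus 1
share = eigvals[:n_axes] / eigvals[:n_axes].sum()    # share of group differences per CV
print("non-zero eigenvalues:", np.round(eigvals[:n_axes], 3))
print("share of group differences per CV:", np.round(share, 3))
print("Wilks' lambda from determinants:", round(float(wilks), 4))
print("Wilks' lambda from eigenvalues: ", round(float(1 / np.prod(1 + eigvals)), 4))
```

Under this convention, a single eigenvalue of 0.8 would correspond to Wilks' lambda = 1/(1 + 0.8), about 0.56. Whether the program that reported 0.8 scales its matrices this way is an assumption.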
Is there any way to determine how
important this CV is in terms of the amount of variation it explains?
See above.
I know these are relatively basic statistics questions, but these tests
are new to me. I'd really appreciate any advice you might have or
information about resources that might be helpful.
You can find applications of these methods in a few of my own studies and
in many other ones. Most of my PDFs can be found following the links in the
electronic signature. The guenon paper in the Journal of Human Evolution,
for instance, has cross-validated DAs, % of variance explained by group
differences etc. There are also a few more recent ones, in press, that I
can send you (red colobus taxonomy, Vancouver Island marmot subfossils
etc.).
In general, there are plenty of good introductory textbooks on multivariate
statistics. A few references (there really are plenty) are below.
Good luck!
Cheers
Andrea
STATS (please note that there might be more recent editions!)
- Moore & McCabe, 1998. Introduction to the practice of statistics. Third
Edition. New York, Freeman & Company. (NB there's a few book chapters with
quite a bit of stuff on resampling etc. which are available for free on
the internet)
- Grafen & Hails, 2002. Modern Statistics for the Life Sciences, first
edition. Oxford University Press.
- Hair, Anderson, Tatham & Black, 1998. Multivariate data
analysis. Prentice Hall, Upper Saddle River.
- Stevens, 2002. Applied multivariate statistics for the social sciences.
LEA (publishers).
- Manly, 1997. Randomisation, Bootstrap and Monte Carlo Methods in
Biology. Chapman & Hall.
IMPORTANT REFS ON APPLICATIONS OF 'STANDARD' MULTIVARIATE METHODS TO
GEOMETRIC MORPHOMETRIC DATA
- Rohlf, 1998. On applications of geometric morphometrics to study of
ontogeny and phylogeny. Syst. Biol., 47: 147-158
- Zelditch et al. 2004. Geometric morphometrics for biologists: a primer.
Elsevier Academic Press.
- Klingenberg & Monteiro, 2005. Distances and directions in
multidimensional shape spaces: implications for morphometric applications.
Syst. Biol., 54:678-688
HELP FILES
Most programs (MorphoJ, TPS series, NTSYS etc.) have excellent help files
with examples.
Thank you!
Kim
[email protected]
--
Replies will be sent to the list.
For more information visit http://www.morphometrics.org
Dr. Andrea Cardini
Lecturer in Animal Biology
Museo di Paleobiologia e dell'Orto Botanico, Università di Modena e Reggio
Emilia
via Università 4, 41100, Modena, Italy
tel: 0039 059 2056532; fax: 0039 059 2056535
Honorary Fellow
Functional Morphology and Evolution Unit, Hull York Medical School
University of Hull, Cottingham Road, Hull, HU6 7RX, UK
University of York, Heslington, York YO10 5DD, UK
E-mail address: [email protected], [email protected],
[email protected]
http://hyms.fme.googlepages.com/drandreacardini
http://ads.ahds.ac.uk/catalogue/archive/cerco_lt_2007/overview.cfm#metadata
More on publications at:
http://www.cons-dev.org/marm/MARM/EMARM/framarm/framarm.html
CLICK ON THE LETTER C AND LOOK FOR "CARDINI" (p. 8-9 until March 2009)
http://hyms.fme.googlepages.com/dr.sarahelton-publications
LOOK FOR "CARDINI"