-------- Original Message --------
Subject: Canonical variates from first PCs of GPA residuals
Date: Tue, 17 Feb 2009 05:16:25 -0800 (PST)
From: Julio Rojo <[email protected]>
To: [email protected]
References: <[email protected]>
Dear morphometricians:
I am a little confused and would like to understand this topic
better.
In my case, I have measured a few skulls of a single species from
different localities, and I want to know whether there are
morphological differences and/or morphological groups related to
geographic origin. Long ago, some subspecies were described, but from
only one or two skulls per subspecies, or from a few skins. I thought
the first principal components could help me see whether the main
variation in the sample is related to the geographic origin (or any
other variable) of the specimens, or could reduce the number of
variables to be included later in a statistical test.
What should I do? Look for group differences wherever they may be (a
full-sample CVA with the old subspecies assignments, or with newly
created biogeographic variables)? Or check whether the main variation
in my sample is related to geographic origin, for example by plotting
the first PCs with scores labeled by biogeographic origin? The problem
is that I have few skulls for some of the "traditional" groups, as in
the case that began this thread.
I understand PCA (i.e., Relative Warps) as a rotation of the variables
(Partial Warps with the Uniform component) that creates new variables
ordered by the sample variance they capture, from the first component
(the highest share of sample variance) to the last (the lowest). So if
I keep only the first principal components of a PCA, those with the
highest variances, I could lose information about variation that may
or may not be important for discriminating groups, although this
"lost" variance is not the main variation in my sample.
For example, there could be a small amount of variation in one
morphological trait that is highly related to geographic origin but is
barely reflected in the first PCs. Because PCA is not a CVA, and other
variables may show higher variances, a trait that discriminates groups
may contribute little to the total variance and therefore have little
effect on the first principal components.
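The scenario in the paragraph above can be illustrated with a minimal sketch, assuming NumPy and purely synthetic data. The two variables, the group sizes, and the helper `separation` are hypothetical choices made for illustration, not values from any real skull data set. One variable has a large variance unrelated to group membership; the other has a small variance but carries the group difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # specimens per group (hypothetical)
labels = np.repeat([0, 1], n)

# Variable 0: large variance, unrelated to group membership.
# Variable 1: small variance, but its mean differs between the two groups.
x0 = rng.normal(0.0, 10.0, size=2 * n)
x1 = rng.normal(0.0, 0.5, size=2 * n) + np.where(labels == 0, -1.0, 1.0)
X = np.column_stack([x0, x1])

# Ordinary (total-sample) PCA via eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]          # sort components by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs

def separation(v, g):
    """Absolute difference of the two group means, in within-group SD units."""
    a, b = v[g == 0], v[g == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

share_pc1 = eigvals[0] / eigvals.sum()     # proportion of total variance on PC1
sep_pc1 = separation(scores[:, 0], labels)
sep_pc2 = separation(scores[:, 1], labels)
print(f"PC1 variance share: {share_pc1:.3f}")
print(f"group separation on PC1: {sep_pc1:.2f}, on PC2: {sep_pc2:.2f}")
```

In this construction PC1 carries nearly all of the variance while the group separation sits on PC2, so a rule that keeps only the first component would discard the discriminating direction.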
Regarding the selection of the first PCs from a full-sample PCA, I
understand Dennis Slice's argument that the sample is nonrandom if I
know it contains more than one group. But I do not really know how
many groups are in my sample, or whether the old subspecies
assignments are the "real" group assignments. If I choose the pooled
within-group PCA method for dimension reduction, could I lose
information if the group assignments are not correct, just as I
described above for the "all-sample" PCA? I would like to know more
about pooled within-group PCA: how it "works" and how to do it (can I
do it, for example, in PAST software?).
Another big problem appears if I want to test group differences with
any statistical test: I cannot test on the same sample in which I
found, selected, or chose the groups, because then there are no "a
priori" group definitions (for these data), which is a basic
requirement of any statistical analysis that tests group differences.
Sorry about my English, and thank you. I have learned a lot from your
comments!
Cheers
Julio Rojo
Doctorate Student in Conservation Biology
Universidad Complutense de Madrid
Spain
[email protected]
morphmet wrote:
-------- Original Message --------
Subject: Re: [Fwd: Re: Canonical variates from first PCs of GPA
residuals]
Date: Fri, 13 Feb 2009 12:50:30 -0800 (PST)
From: Dennis E. Slice <[email protected]>
To: [email protected]
References: <[email protected]>
morphmet wrote:
-------- Original Message --------
Subject: Re: [Fwd: Re: Canonical variates from first PCs of GPA
residuals]
Date: Thu, 12 Feb 2009 12:11:37 -0800 (PST)
From: Pedro Cordeiro Estrela <[email protected]>
Reply-To: [email protected]
To: [email protected]
Dear All and Dennis Slice,
what exactly could be "misleading"?
The misleading part is believing any p-values for tests of group
differences based on variables (PCs) that may, and probably do, enhance
group differences. The differences, themselves, may contribute to
variance and attract the low-numbered (first) PCs. Hence, you are in a
statistical position akin
to looking at your groups and testing those that are *most* different
using models that assume you didn't select the groups because of their
impressive differences.
...snip...
Dennis, I did not understand your sentence: "Note, the problem is
that overall PCA with grouped data can only be used for dimension
reduction for visualization - there is no statistical model. You can
do something, perhaps, with PCs from a single-group PCA, or use
within-group PCs for dimension reduction and still examine between
group differences."
Most any discussion of PCA should include an admonition along the
lines of "Their development does not require a multivariate normal
assumption. On the other hand, principal components derived for
multivariate normal populations have useful interpretations in terms
of constant density ellipsoids." (taken from Johnson and Wichern.
1982. Applied Multivariate Statistical Analysis, p 362.)
If you are examining a single multivariate population, you can think
of PCA as revealing some parametric aspect of the
correlation/covariance structure. But, PCA does not require this. You
can do it on data consisting of samples from multiple groups with any
distribution. As soon as you do this, however, you have violated the
most basic assumption of a random sample (unless you can argue the
species, say, were constructed and sampled at random, which they never
are).
So, none of the traditional tests for group differences based on
anything constructed from PCs of nonrandom samples are appropriate.
Even nonparametric tests require random sampling.
But, PCA does give you the best (in the variance maximizing sense)
lower dimensional representation of your higher dimensional nonrandom
sample.
That is fine for looking at your data through a glass less darkly.
PCA of pooled, within-group covariance is different. Here, each sample
is assumed to be a random sample and, maybe, even a multivariate normal
one. A key assumption of what is to follow is that all groups,
species, etc. have a common covariance structure. If that is the case,
then subtracting off the mean is just a variable-wise additive coding
and does not affect the covariance structure. Pooling thus gives an
even better estimate of the covariance structure shared by all groups.
The first one or two or ten PCs of this pooled matrix, then, are
linear combinations of the original variables that give the best
low-dimensional approximation of the overall variance within each
sample and were computed in a way unaffected by group mean differences.
Being linear combinations, you can just project all of your original
data into the subspace they define and examine that for, say,
differences in means beyond that expected based on within-group
sampling covariance. Yes, you have selected that subspace with the
greatest within group variance, but under the usual null model,
variation in means should follow the same pattern, and it is that null
model you are testing either by parametric statistics or randomization
or resampling tests.
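A minimal sketch of the pooled within-group PCA described above, assuming NumPy and synthetic data. The function name `pooled_within_pca`, the group sizes, and the number of retained axes `k` are hypothetical, and the sketch assumes, as the text does, that all groups share a common covariance structure. Each group is centered on its own mean, the residuals are pooled into one covariance matrix, and the original data are then projected onto the leading eigenvectors.

```python
import numpy as np

def pooled_within_pca(X, groups):
    """PCA of the pooled within-group covariance matrix.

    X      : (n, p) array of shape variables (hypothetical input)
    groups : length-n array of group labels
    Returns eigenvalues (descending), eigenvectors (as columns), and the
    pooled within-group covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    uniq = np.unique(groups)
    # Subtract each group's own mean so mean differences cannot shape the axes.
    R = X.copy()
    for g in uniq:
        idx = groups == g
        R[idx] -= X[idx].mean(axis=0)
    # Pool the within-group sums of squares; divisor is n minus number of groups.
    W = R.T @ R / (len(X) - len(uniq))
    eigvals, eigvecs = np.linalg.eigh(W)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], W

# Synthetic illustration: three groups with a shared covariance, different means.
rng = np.random.default_rng(1)
n_per, p, k = 30, 6, 2                      # k = number of retained axes (assumed)
groups = np.repeat([0, 1, 2], n_per)
cov = np.diag([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
means = rng.normal(0.0, 2.0, size=(3, p))
X = rng.multivariate_normal(np.zeros(p), cov, size=3 * n_per) + means[groups]

eigvals, eigvecs, W = pooled_within_pca(X, groups)

# Project the original data (centered on the grand mean) into the subspace.
scores = (X - X.mean(axis=0)) @ eigvecs[:, :k]

# Shifting the group means arbitrarily leaves the pooled matrix unchanged.
shift = np.array([[0.0] * p, [10.0] * p, [-7.0] * p])
_, _, W_shifted = pooled_within_pca(X + shift[groups], groups)
print(np.allclose(W, W_shifted))
```

The final check illustrates the point made in the text: the axes are computed in a way that group mean differences cannot affect, so `scores` can then be examined for differences in means. The choice of `k` should still be made without looking at group separation.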
-ds
cheers!
_______________________________________________________
Pedro Cordeiro Estrela
Dr.Sc.
Departamento de Genetica - Universidade Federal do Rio Grande do Sul
Campus do Vale - Bloco III
Av. Bento Gonçalves, 9500 - Agronomia
Porto Alegre, RS 91501-970 / Caixa Postal 15.053
Brasil.
TEL: +55 (51) 3308.6726
(cod. Porto Alegre)
________________________________________________________
--- On *Thu, 2/12/09, morphmet /<[email protected]>/*
wrote:
From: morphmet <[email protected]>
Subject: [Fwd: Re: Canonical variates from first PCs of GPA residuals]
To: "morphmet" <[email protected]>
Date: Thursday, February 12, 2009, 1:15 PM
-------- Original Message --------
Subject: Re: Canonical variates from first PCs of GPA residuals
Date: Thu, 12 Feb 2009 10:19:36 -0800 (PST)
From: Dennis E. Slice <[email protected]>
To: [email protected]
References: <[email protected]>
That would meet the minimum requirements, but you could still run
into trouble with ill conditioned covariance matrices. Ideally, you
would like many more observations than axes. That is, I think what
you describe might be satisfactory in the case of large samples of
Procrustes coordinates where almost four dimensions are invariant for
2D data (almost seven for 3D).
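The dimension counting mentioned above can be written out as a small sketch. The function names, the landmark count, and the group sizes below are hypothetical. The comparison with the within-group degrees of freedom is a standard linear-algebra necessary condition for a nonsingular pooled covariance matrix, added here as an assumption about what "minimum requirements" involves; meeting it does not guarantee a well-conditioned matrix.

```python
def shape_space_dim(n_landmarks, n_dims):
    """Approximate dimensionality of Procrustes-aligned landmark data.

    Superimposition removes translation, rotation, and scale:
    2 + 1 + 1 = 4 dimensions in 2D, 3 + 3 + 1 = 7 dimensions in 3D.
    """
    lost = {2: 4, 3: 7}
    if n_dims not in lost:
        raise ValueError("n_dims must be 2 or 3")
    return n_landmarks * n_dims - lost[n_dims]

def within_group_df(group_sizes):
    """Degrees of freedom available for the pooled within-group covariance."""
    return sum(group_sizes) - len(group_sizes)

# Hypothetical example: 20 two-dimensional landmarks, three small groups.
n_landmarks = 20
group_sizes = [12, 9, 15]
p = shape_space_dim(n_landmarks, 2)        # 20 * 2 - 4 = 36
df = within_group_df(group_sizes)          # 36 - 3 = 33
full_rank_possible = df >= p               # False: the pooled matrix is singular
print(p, df, full_rank_possible)
```

Even when the condition holds, the advice in the text still applies: many more observations than axes are desirable.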
Note, the problem is that overall PCA with grouped data can only be
used for dimension reduction for visualization - there is no
statistical model. You can do something, perhaps, with PCs from a
single-group PCA, or use within-group PCs for dimension reduction and
still examine between group differences.
Best, dslice
morphmet wrote:
-------- Original Message --------
Subject: Re: Canonical variates from first PCs of GPA residuals
Date: Wed, 11 Feb 2009 09:08:23 -0800 (PST)
From: <[email protected]>
To: [email protected]
References: <[email protected]>
With regard to using PCA to reduce dimensionality, it may be worth
noting that if one uses all PC axes (or RW axes) with non-zero
variance (i.e., non-zero eigenvalues), then there is no loss of
variance in the data. You have simply rotated all the variance in the
set onto a number of axes that matches the degrees of freedom in the
data set.
It would seem that this approach has the potential to avoid an
artificial reduction in sample variation.
What do you think? Is there something missing in the above argument?
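The rotation argument above can be checked numerically with a minimal sketch, assuming NumPy and synthetic data. The rank-deficient matrix and the relative eigenvalue tolerance `1e-10` are hypothetical choices. The data are built to occupy only 5 dimensions of an 8-variable space, loosely mimicking the dimensions lost to superimposition.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, rank = 40, 8, 5
# Data confined to a 5-dimensional subspace of 8 variables,
# so three eigenvalues of the covariance matrix are (numerically) zero.
X = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
Xc = X - X.mean(axis=0)

S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep every axis whose eigenvalue is non-zero up to a relative tolerance.
keep = eigvals > 1e-10 * eigvals[0]
scores = Xc @ eigvecs[:, keep]

total_var_original = np.trace(S)
total_var_scores = scores.var(axis=0, ddof=1).sum()

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all pairs of rows of A."""
    sq = (A ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

same_variance = np.isclose(total_var_original, total_var_scores)
same_distances = np.allclose(pairwise_sq_dists(Xc), pairwise_sq_dists(scores))
print(keep.sum(), same_variance, same_distances)
```

Total variance and all inter-specimen distances are preserved when every non-zero axis is kept, which is the sense in which nothing is lost. The sketch says nothing about whether enough specimens are available to estimate a covariance matrix on that many axes.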
H. David Sheets, PhD
Dept of Physics, Canisius College
2001 Main St
Buffalo NY 14208
--- Original message ----
Date: Wed, 11 Feb 2009 11:32:34 -0500
From: morphmet <[email protected]>
Subject: Re: Canonical variates from first PCs of GPA residuals
To: morphmet <[email protected]>
-------- Original Message --------
Subject: Re: Canonical variates from first PCs of GPA residuals
Date: Wed, 11 Feb 2009 08:28:03 -0800 (PST)
From: Dennis E. Slice <[email protected]>
To: [email protected]
References: <[email protected]>
Relevant to the current posting...
"Is it possible to use rw as variables in multivariate analysis to
differentiate groups?"
Some time ago this question was posed and I answered a simple "Yes."
This is correct since relative warps are a rotation of the partial
warp scores (including the uniform component) and completely describe
the shapes of the sample. If you use all of the relative warps, you
should get the same discrimination as if you used the partial warp
scores.
Some background discussion, however, pointed out an important, but
perhaps subtle point (thanks, Fred). That is, you should NOT use a
reduced set of RWs for your analysis. While PCA (e.g., as used to
construct relwarps) makes no reference to group membership, it is
possible that group differences could be a major contributor to
sample variation. This is, after all, the basis for the one-tailed
F-test used in ANOVA - variance among means is tested to see if it is
greater than that expected based on within-sample variation. So, if
this were the case, and you subjected a reduced set of relative warps
to MANOVA, CVA, etc. the results could be misleading. If your only
goal is to classify an unknown, then it doesn't really matter (and
may help) that you have concentrated group differences in the
retained components, but in any statistical testing (even
nonparametric testing), p-values for significance tests of group mean
differences will likely be biased, i.e., too small.
What to do if you need data reduction? Use the initial PCs from the
pooled, within-group shape variation. Their computation is not
affected by group mean differences. Even here, though, it is
inappropriate to select the number of retained PCs based on
"noticing" interesting group separation on one or more of them.
The above holds for GPA coordinates just as it does for relwarps.
-dslice
morphmet wrote:
-------- Original Message --------
Subject: Canonical variates from first PCs of GPA residuals
Date: Tue, 10 Feb 2009 05:15:05 -0800 (PST)
From: Peter Taylor <[email protected]>
To: <[email protected]>
Dear Morphometricians
I am working with data where the number of landmarks (from rodent
skulls) exceeds the smallest sample sizes of my groups. To circumvent
statistical problems with null determinants when using canonical
variates analysis (CVA) of the weights matrix from GPA, is it
permissible to conduct CVA on the first few PCs from a PCA of the
residuals, or aligned coordinates, after least-squares GPA? If so, how
does one objectively decide how many PCs to include? Should this
number be less than the smallest group sample size, or should it
depend on a certain threshold of cumulative explained variance (70%),
on the eigenvalues (>1?), or on the degree of separation of groups?
Also, is this approach equivalent, or preferable, to conducting CVA on
the first few relative warps from a relative warps analysis (PCA of
the weights matrix)? I have seen both approaches in the literature but
am not sure which is best.
Many thanks
Peter
Dr Peter John Taylor
Curator of Mammals
Durban Natural Science Museum
Ethekwini Libraries & Heritage
P O Box 4085
Durban
4000
Physical address:
First Floor, City Hall, Smith Street Entrance, 4001
&
Research Centre, 151 Old Fort Road (cnr Wyatt St)
Tel: + 27 31 3054162/4/5/7
Cell: 083 7924810
Fax: + 27 31 311 2242
Email: [email protected]
or (home): [email protected]
or: [email protected]
Internet: www.durban.gov.za/naturalscience/
-- Dennis E. Slice
Associate Professor
Dept. of Scientific Computing
Florida State University
Dirac Science Library
Tallahassee, FL 32306-4120
-
Guest Professor
Department of Anthropology
University of Vienna
========================================================
-- Replies will be sent to the list.
For more information visit http://www.morphometrics.org