Canonical variates from first PCs of GPA residuals

morphmet Wed, 18 Feb 2009 08:08:52 -0800

-------- Original Message --------
Subject: Canonical variates from first PCs of GPA residuals
Date: Tue, 17 Feb 2009 05:16:25 -0800 (PST)
From: Julio Rojo <[email protected]>
To: [email protected]
References: <[email protected]>

Dear morphometricians:

I am a little bit confused... and I would like to understand this
topic better.

In my case, I have measured a few skulls of a single species from
different places, and I want to know whether there are morphological
differences and/or morphological groups related to geographic origin.
Long ago, some subspecies were described, but from one or two skulls
per subspecies or from some skins... I thought that the first principal
components could help me see whether the main sample variation is
related to the geographical origin (or any other variable) of the
specimens, or could reduce the number of variables to be included
later in a statistical test.

What should I do?... Look for group differences wherever they may be
(full-sample CVA with the old subspecies assignments, or creating new
biogeographic variables)? Or check whether the main variation of my
sample is related to geographical origin, for example by plotting the
first PCs with scores labeled by biogeographic origin? The problem is
that I have few skulls for some "traditional" groups, as in the case
that began this topic.

I understand PCA (i.e. Relative Warps) as a rotation of variables
(Partial Warps with the Uniform component) that creates new variables
in which sample variance is maximized, from the first component (which
contains the highest amount of sample variance) to the last (the lowest
amount). So, if I select only the first principal components of a PCA,
those showing the highest variances, I could lose information about
variation that may or may not be important for discriminating groups,
but this "lost" variance is not the main variation of my sample.

For example, there could be a small variation in one morphological
trait that is highly related to geographical origin but is not strongly
reflected in the first PCs... Because PCA is not a CVA and other
variables could show higher variances, this group-discriminating trait
might not play an important role in total variance, and therefore would
not affect the first principal components.
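
The scenario described above can be illustrated with a small
simulation. This is only a sketch on made-up data, not an analysis of
real skulls: the three traits, their variances, the group offset and
the helper name separation() are all assumptions chosen for
illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group = 50

# Hypothetical data: two groups, three traits.
# Traits 0 and 1 vary a lot but do not differ between groups;
# trait 2 varies little but its mean differs between groups.
labels = np.repeat([0, 1], n_per_group)
X = np.column_stack([
    rng.normal(0.0, 5.0, 2 * n_per_group),
    rng.normal(0.0, 3.0, 2 * n_per_group),
    rng.normal(0.0, 0.2, 2 * n_per_group) + 0.8 * labels,
])

# PCA of the full sample via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T
explained = s**2 / np.sum(s**2)

def separation(score, labels):
    """Absolute difference in group means, in pooled within-group SD units."""
    a, b = score[labels == 0], score[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

for k in range(3):
    print(f"PC{k + 1}: {explained[k]:.1%} of variance, "
          f"group separation = {separation(scores[:, k], labels):.2f}")
```

In this constructed case the last PC carries a very small share of the
total variance yet holds nearly all of the group separation, so keeping
only the first PCs would discard it.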

I understand that, regarding selecting the first PCs from a full-sample
PCA, Dennis Slice argues that the sample is nonrandom if I know that it
contains more than one group, but I do not really know how many groups
are included in my sample or whether the old subspecies assignments are
the "real" group assignments. If I choose the pooled within-group PCA
method for dimension reduction, could I lose information if the group
assignments are not true, just as I described above for the
"all-sample" PCA? I would like to know more about pooled within-group
PCA: how it "works" and how to do it (can I do it, for example, in PAST
software?).

Another big problem appears if I want to test group differences with
any statistical test, because I could not use the same sample in which
I found, selected, chose... the groups; there would therefore be no "a
priori" group definitions (for these data), a basic requirement of any
statistical analysis that tests group differences.

Sorry about my English, and thank you. I have learned a lot from your
comments!!
Cheers

Julio Rojo
Doctorate Student in Conservation Biology
Universidad Complutense de Madrid
Spain
[email protected]

morphmet wrote:

-------- Original Message --------
Subject: Re: [Fwd: Re: Canonical variates from first PCs of GPA residuals]
Date: Fri, 13 Feb 2009 12:50:30 -0800 (PST)
From: Dennis E. Slice <[email protected]>
To: [email protected]
References: <[email protected]>


morphmet wrote:
-------- Original Message --------
Subject: Re: [Fwd: Re: Canonical variates from first PCs of GPA residuals]
Date:     Thu, 12 Feb 2009 12:11:37 -0800 (PST)
From:     Pedro Cordeiro Estrela <[email protected]>
Reply-To:     [email protected]
To:     [email protected]

Dear All and Dennis Slice,

what exactly could be "misleading"?
The misleading part is believing any p-values for tests of group
differences based on variables (PCs) that may/probably enhance group
differences. The differences, themselves, may contribute to variance
and attract the low PCs. Hence, you are in a statistical position akin
to looking at your groups and testing those that are *most* different
using models that assume you didn't select the groups because of their
impressive differences.
...snip...
Dennis, I did not understand your sentence: "Note, the problem is
that overall PCA with grouped data can only be used for dimension
reduction for visualization - there is no statistical model. You can
do something, perhaps, with PCs from a single-group PCA, or use
within-group PCs for dimension reduction and still examine between
group differences."
Most any discussion of PCA should include an admonition along the
lines of "Their development does not require a multivariate normal
assumption. On the other hand, principal components derived for
multivariate normal populations have useful interpretations in terms
of constant density ellipsoids." (taken from Johnson and Wichern.
1982. Applied Multivariate Statistical Analysis, p 362.)
If you are examining a single multivariate population, you can think
of PCA as revealing some parametric aspect of the
correlation/covariance structure. But, PCA does not require this. You
can do it on data consisting of samples from multiple groups with any
distribution. As soon as you do this, however, you have violated the
most basic assumption of a random sample (unless you can argue the
species, say, were constructed and sampled at random, which they never
are).
So, none of the traditional tests for group differences based on
anything constructed from PCs of nonrandom samples are appropriate.
Even nonparametric tests require random sampling.
But, PCA does give you the best (in the variance maximizing sense)
lower dimensional representation of your higher dimensional nonrandom
sample.
That is fine for looking at through a glass less darkly.
PCA of pooled, within-group covariance is different. Here, each sample
is assumed to be a random sample and, maybe, even a multivariate normal
one. A key assumption of what is to follow is that all groups,
species, etc. have a common covariance structure. If that is the case,
then subtracting off the mean is just a variable-wise additive coding
and does not affect the covariance structure. Pooling thus gives an
even better estimate of the covariance structure shared by all groups.
The first one or two or ten of these, then, are linear combinations of
the original variables that give the best low-dimensional
approximation of the overall variance within each sample and were
computed in a way unaffected by group mean differences.
Being linear combinations, you can just project all of your original
data into the subspace they define and examine that for, say,
differences in means beyond that expected based on within group
sampling covariance. Yes, you have selected that subspace with the
greatest within group variance, but under the usual null model,
variation in means should follow the same pattern, and it is that null
model you are testing either by parametric statistics or randomization
or resampling tests.
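
The pooled within-group PCA described above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
routine of any particular morphometrics package: the function name
pooled_within_pca(), the simulated data and the choice k = 2 are all
hypothetical, and the groups are assumed to be defined a priori and to
share a common covariance structure.

```python
import numpy as np

def pooled_within_pca(X, groups):
    """PCA of the pooled within-group covariance matrix.

    X      : (n, p) array, e.g. Procrustes-aligned coordinates
    groups : length-n array of a priori group labels
    Returns eigenvalues (descending) and eigenvectors (as columns).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    uniq = np.unique(groups)
    # Subtract each group's own mean so that mean differences cannot
    # contribute to the covariance estimate.
    R = np.empty_like(X)
    for g in uniq:
        m = groups == g
        R[m] = X[m] - X[m].mean(axis=0)
    W = R.T @ R / (len(X) - len(uniq))   # pooled covariance, n - g d.f.
    evals, evecs = np.linalg.eigh(W)     # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

# Hypothetical example: 3 groups sharing one covariance structure.
rng = np.random.default_rng(1)
cov = np.diag([4.0, 2.0, 1.0, 0.5, 0.25])
means = rng.normal(0.0, 3.0, size=(3, 5))
groups = np.repeat([0, 1, 2], 40)
X = means[groups] + rng.multivariate_normal(np.zeros(5), cov, size=120)

evals, evecs = pooled_within_pca(X, groups)
k = 2   # fixed in advance, not chosen by looking at group separation
scores = (X - X.mean(axis=0)) @ evecs[:, :k]   # project ALL original data

# Moving the group means apart leaves the within-group eigenvalues unchanged.
X_shifted = X + 10.0 * means[groups]
evals2, _ = pooled_within_pca(X_shifted, groups)
print(np.allclose(evals, evals2))
```

The scores could then be passed to a MANOVA, CVA or a randomization
test; the final check only shows that the axes were computed in a way
unaffected by group mean differences.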
-ds
cheers!

_______________________________________________________
Pedro Cordeiro Estrela
Dr.Sc.

Departamento de Genetica - Universidade Federal do Rio Grande do Sul
Campus do Vale - Bloco III
Av. Bento Gonçalves, 9500 - Agronomia
Porto Alegre, RS 91501-970 / Caixa Postal 15.053
Brasil.
TEL: +55 (51) 3308.6726
(cod. Porto Alegre)

|lIi___Lo¬___iIl|
________________________________________________________

--- On *Thu, 2/12/09, morphmet /<[email protected]>/*
wrote:

From: morphmet <[email protected]>
Subject: [Fwd: Re: Canonical variates from first PCs of GPA residuals]
To: "morphmet" <[email protected]>
Date: Thursday, February 12, 2009, 1:15 PM


-------- Original Message --------
Subject: Re: Canonical variates from first PCs of GPA residuals
Date: Thu, 12 Feb 2009 10:19:36 -0800 (PST)
From: Dennis E. Slice <[email protected]>
To: [email protected]
References: <[email protected]>
That would meet the minimum requirements, but you could still run
into trouble with ill conditioned covariance matrices. Ideally, you
would like many more observations than axes. That is, I think what
you describe might be satisfactory in the case of large samples of
Procrustes coordinates where almost four dimensions are invariant for
2D data (almost seven for 3D).
Note, the problem is that overall PCA with grouped data can only be
used for dimension reduction for visualization - there is no
statistical model. You can do something, perhaps, with PCs from a
single-group PCA, or use within-group PCs for dimension reduction and
still examine between group differences.
Best, dslice

morphmet wrote:
-------- Original Message --------
Subject: Re: Canonical variates from first PCs of GPA residuals
Date: Wed, 11 Feb 2009 09:08:23 -0800 (PST)
From: <[email protected]>
To: [email protected]
References: <[email protected]>
With regard to using PCA to reduce dimensionality, it may be worth
noting that if one uses all PC axes (or RW axes) with non-zero
variance (i.e. non-zero eigenvalues), then there is no loss of variance
in the data. You have simply rotated all the variance in the set
into a number of axes which matches the degrees of freedom in the
data set.
It would seem that this approach has the potential to avoid an
artificial reduction in sample variation.
What do you think?  Is there something missing in the above argument?
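
The claim that keeping every axis with a non-zero eigenvalue loses no
variance can be checked numerically. The following is a sketch on
random numbers standing in for shape variables; the sample size, the
number of variables and the tolerance used to call an eigenvalue
"non-zero" are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 30                      # fewer specimens than variables
X = rng.normal(size=(n, p))        # hypothetical stand-in for shape data
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s**2 / (n - 1)
keep = s > 1e-10 * s.max()         # axes with non-zero variance
scores = Xc @ Vt[keep].T

total_var = Xc.var(axis=0, ddof=1).sum()
print(keep.sum())                                  # number of retained axes
print(np.isclose(eigvals[keep].sum(), total_var))  # no variance lost

# Distances between specimens are unchanged by the rotation.
d_orig = np.linalg.norm(Xc[0] - Xc[1])
d_pcs = np.linalg.norm(scores[0] - scores[1])
print(np.isclose(d_orig, d_pcs))
```

With fewer specimens than variables, the number of retained axes is
limited by the sample size (here n - 1 after centering), which may
still exceed the degrees of freedom available within the smallest
groups.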

H. David Sheets, PhD
Dept of Physics, Canisius College
2001 Main St
Buffalo NY 14208


--- Original message ----
Date: Wed, 11 Feb 2009 11:32:34 -0500
From: morphmet <[email protected]>
Subject: Re: Canonical variates from first PCs of GPA residuals
To: morphmet  <[email protected]>



-------- Original Message --------
Subject: Re: Canonical variates from first PCs of GPA residuals
Date: Wed, 11 Feb 2009 08:28:03 -0800 (PST)
From: Dennis E. Slice <[email protected]>
To: [email protected]
References: <[email protected]>

Relevant to the current posting...
"Is it possible to use rw as variables in multivariate analysis to
differentiate groups?"
Some time ago this question was posed and I answered a simple "Yes."
This is correct since relative warps are a rotation of the partial
warp scores (including the uniform component) and completely describe
the shapes of the sample. If you use all of the relative warps, you
should get the same discrimination as if you used the partial warp
scores.
Some background discussion, however, pointed out an important, but
perhaps subtle point (thanks, Fred). That is, you should NOT use a
reduced set of RWs for your analysis. While PCA (e.g., as used to
construct relwarps) makes no reference to group membership, it is
possible that group differences could be a major contributor to
sample variation. This is, after all, the basis for the one-tailed
F-test used in ANOVA - variance among means is tested to see if it is
greater than that expected based on within-sample variation. So, if
this were the case, and you subjected a reduced set of relative warps
to MANOVA, CVA, etc. the results could be misleading. If your only
goal is to classify an unknown, then it doesn't really matter (and
may help) that you have concentrated group differences in the
retained components, but in any statistical testing (even
nonparametric testing), p-values for significance tests of group mean
differences will likely be biased, i.e., too small.
What to do if you need data reduction? Use the initial PCs from the
pooled, within-group shape variation. Their computation is not
affected by group mean differences. Even here, though, it is
inappropriate to select the number of retained PCs based on
"noticing" interesting group separation on one or more of them.
The above holds for GPA coordinates just as it does for relwarps.
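
The way group differences can contribute to sample variation can be
written out with the standard partition of sums of squares and
cross-products. The notation is assumed here for illustration and does
not appear in the original message: x_{ij} is specimen j of group i,
\bar{x}_i the mean of group i, \bar{x} the grand mean, n_i the size of
group i, n the total sample size and g the number of groups.

```latex
\begin{align*}
T &= \sum_{i=1}^{g} \sum_{j=1}^{n_i}
     (x_{ij} - \bar{x})(x_{ij} - \bar{x})^{\top} = W + B, \\
W &= \sum_{i=1}^{g} \sum_{j=1}^{n_i}
     (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)^{\top}, \\
B &= \sum_{i=1}^{g} n_i\,
     (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{\top}.
\end{align*}
```

A full-sample PCA extracts the eigenvectors of T/(n - 1), which
includes the between-group term B, whereas the pooled within-group PCA
uses only W/(n - g) and so cannot be drawn toward the group mean
differences.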

-dslice

morphmet wrote:

-------- Original Message --------
Subject:     Canonical variates from first PCs of GPA residuals
Date:     Tue, 10 Feb 2009 05:15:05 -0800 (PST)
From:     Peter Taylor <[email protected]>
To:     <[email protected]>

Dear Morphometricians
I am working with data where the number of landmarks (from rodent
skulls) exceeds the smallest sample sizes of my groups. To circumvent
statistical problems with null determinants when using canonical
analysis (CVA) of the weights matrix from GPA, is it permissible to
conduct CVA on the first few PCs from a PCA of the residuals, or
aligned coordinates after least squares, GPA? If so how does one
objectively decide how many PCs to include, should this number be
less than the smallest group sample size, or should it depend on a
certain threshold of cumulative explained variance (70%) or on the
eigenvalues (>1?), or on the degree of separation of groups? Also, is
this approach equivalent, or preferable, to conducting CVA on the
first few relative warps from a relative warps analysis (PCA of
weights matrix). I have seen both approaches in the literature but
not sure which is best.
Many thanks
Peter
Dr Peter John Taylor
Curator of Mammals
Durban Natural Science Museum
Ethekwini Libraries & Heritage
P O Box 4085
Durban
4000
---------------------------------------------------------------------------
Physical address:
First Floor, City Hall, Smith Street Entrance, 4001
&
Research Centre, 151 Old Fort Road (cnr Wyatt St)
---------------------------------------------------------------------------
Tel:  + 27 31 3054162/4/5/7
Cell: 083 7924810
Fax:  + 27 31 311 2242
Email: [email protected]
or (home): [email protected]
or: [email protected]
Internet: www.durban.gov.za/naturalscience/
>> -- Dennis E. Slice
>> Associate Professor
>> Dept. of Scientific Computing
>> Florida State University
>> Dirac Science Library
>> Tallahassee, FL 32306-4120
>>     -
>> Guest Professor
>> Department of Anthropology
>> University of Vienna
>> ========================================================
>>
>>
>>
>> -- Replies will be sent to the list.
>> For more information visit http://www.morphometrics.org
>>
>
