Re: [Fwd: Re: Canonical variates from first PCs of GPA residuals]

morphmet Fri, 13 Feb 2009 12:59:36 -0800


-------- Original Message --------
Subject: Re: [Fwd: Re: Canonical variates from first PCs of GPA residuals]
Date: Fri, 13 Feb 2009 12:50:30 -0800 (PST)
From: Dennis E. Slice <[email protected]>
To: [email protected]
References: <[email protected]>



morphmet wrote:



-------- Original Message --------

Subject: Re: [Fwd: Re: Canonical variates from first PCs of GPA residuals]

Date:     Thu, 12 Feb 2009 12:11:37 -0800 (PST)
From:     Pedro Cordeiro Estrela <[email protected]>
Reply-To:     [email protected]
To:     [email protected]



Dear All and Dennis Slice,

what exactly could be "misleading"?


The misleading part is believing any p-values for tests of group
differences based on variables (PCs) that may/probably enhance group
differences. The differences themselves may contribute to variance and
attract the first few PCs. Hence, you are in a statistical position akin to
looking at your groups and testing those that are *most* different using
models that assume you didn't select the groups because of their
impressive differences.
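
A minimal sketch of this effect on simulated data (Python with NumPy; the sample sizes, the shift of 6 units, and all variable names are illustrative assumptions, not values from this thread). Two groups share the same within-group scatter and differ only in their means. The first PC of the whole sample is compared with the first PC of the pooled within-group covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 6, 200                          # hypothetical: 6 variables, 200 specimens per group

# Within-group scatter is largest along variable 1 (an arbitrary choice).
scale = np.ones(p)
scale[1] = 2.0
g1 = rng.normal(size=(n, p)) * scale
g2 = rng.normal(size=(n, p)) * scale

# The groups differ in their means only along variable 0.
shift = np.zeros(p)
shift[0] = 6.0
g2 = g2 + shift

def first_pc(cov):
    """Return the unit eigenvector belonging to the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    return vecs[:, -1]

# PCA of the whole sample, ignoring group membership.
total = np.vstack([g1, g2])
pc1_total = first_pc(np.cov(total, rowvar=False))

# PCA of the pooled within-group covariance (each group's mean removed first).
centered = np.vstack([g1 - g1.mean(axis=0), g2 - g2.mean(axis=0)])
pc1_within = first_pc(centered.T @ centered / (2 * n - 2))

# Absolute cosine between each PC1 and the direction of the mean difference.
unit_shift = shift / np.linalg.norm(shift)
align_total = abs(pc1_total @ unit_shift)
align_within = abs(pc1_within @ unit_shift)
print(f"total-sample PC1 vs. mean difference: {align_total:.2f}")
print(f"within-group PC1 vs. mean difference: {align_within:.2f}")
```

In this setup the total-sample PC1 should point almost exactly along the group difference, while the within-group PC1 follows the within-group scatter instead.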


...snip...


Dennis, I did not understand your sentence: "Note, the problem is that
overall PCA with grouped data can only be used for dimension reduction
for visualization - there is no statistical model. You can do something,
perhaps, with PCs from a single-group PCA, or use within-group PCs for
dimension reduction and still examine between group differences."


Most any discussion of PCA should include an admonition along the lines
of "Their development does not require a multivariate normal assumption.
On the other hand, principal components derived for multivariate normal
populations have useful interpretations in terms of constant density
ellipsoids." (taken from Johnson and Wichern. 1982. Applied Multivariate
Statistical Analysis, p 362.)

If you are examining a single multivariate population, you can think of
PCA as revealing some parametric aspect of the correlation/covariance
structure. But, PCA does not require this. You can do it on data
consisting of samples from multiple groups with any distribution. As
soon as you do this, however, you have violated the most basic
assumption of a random sample (unless you can argue the species, say,
were constructed and sampled at random, which they never are).

So, none of the traditional tests for group differences based on
anything constructed from PCs of nonrandom samples are appropriate. Even
nonparametric tests require random sampling.

But, PCA does give you the best (in the variance maximizing sense) lower
dimensional representation of your higher dimensional nonrandom sample.
That is fine for looking at your data through a glass less darkly.

PCA of pooled, within-group covariance is different. Here, each sample
is assumed to be a random sample and, maybe, even a multivariate normal
one. A key assumption of what is to follow is that all groups, species,
etc. have a common covariance structure. If that is the case, then
subtracting off the mean is just a variable-wise additive coding and
does not affect the covariance structure. Pooling thus gives an even
better estimate of the covariance structure shared by all groups.
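
As a sketch of the pooling step (Python with NumPy; `pooled_within_cov`, the simulated data, and the offsets are hypothetical names and values chosen for illustration), the pooled within-group covariance can be computed by centering each group on its own mean and dividing the summed cross-products by N minus the number of groups. Shifting the group means then leaves the estimate unchanged, whereas the total covariance does change:

```python
import numpy as np

def pooled_within_cov(X, labels):
    """Pooled within-group covariance: summed group-centered SSCP / (N - g)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    sscp = np.zeros((X.shape[1], X.shape[1]))
    for g in groups:
        Xg = X[labels == g]
        D = Xg - Xg.mean(axis=0)       # subtract this group's own mean
        sscp += D.T @ D
    return sscp / (X.shape[0] - len(groups))

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 4))           # hypothetical: 90 specimens, 4 variables
labels = np.repeat([0, 1, 2], 30)      # three groups of 30

W = pooled_within_cov(X, labels)

# Move the group means far apart and recompute both covariance estimates.
offsets = np.array([[0, 0, 0, 0], [5, -3, 0, 2], [-4, 1, 7, 0]], dtype=float)
X_shifted = X + offsets[labels]
W_shifted = pooled_within_cov(X_shifted, labels)

within_unchanged = np.allclose(W, W_shifted)
total_unchanged = np.allclose(np.cov(X, rowvar=False),
                              np.cov(X_shifted, rowvar=False))
print("pooled within-group covariance unchanged:", within_unchanged)
print("total covariance unchanged:", total_unchanged)
```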

The first one or two or ten of these PCs, then, are linear combinations of
the original variables that give the best low-dimensional approximation
of the overall variance within each sample and were computed in a way
unaffected by group mean differences.

Being linear combinations, you can just project all of your original
data into the subspace they define and examine that for, say, differences
in means beyond that expected based on within group sampling covariance.
Yes, you have selected that subspace with the greatest within group
variance, but under the usual null model, variation in means should
follow the same pattern, and it is that null model you are testing
either by parametric statistics or randomization or resampling tests.
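
The projection and a randomization test could be sketched as follows (Python with NumPy). Everything here is an assumption made for illustration: the function names, the simulated data, the choice of k = 5, the between-group sum of squares as test statistic, and the simplification of holding the subspace fixed across permutations. It is not a prescribed procedure from this thread.

```python
import numpy as np

def within_group_pcs(X, labels, k):
    """First k eigenvectors of the pooled within-group covariance."""
    groups = np.unique(labels)
    centered = np.vstack([X[labels == g] - X[labels == g].mean(axis=0)
                          for g in groups])
    W = centered.T @ centered / (len(X) - len(groups))
    vals, vecs = np.linalg.eigh(W)     # ascending eigenvalues
    return vecs[:, ::-1][:, :k]        # columns are the leading k PCs

def between_group_ss(scores, labels):
    """Sum over groups of n_g * squared distance of group mean from grand mean."""
    grand = scores.mean(axis=0)
    return sum(
        (labels == g).sum() * np.sum((scores[labels == g].mean(axis=0) - grand) ** 2)
        for g in np.unique(labels)
    )

def permutation_p(scores, labels, n_perm=999, seed=0):
    """Permutation p-value for group mean differences in the score space."""
    rng = np.random.default_rng(seed)
    observed = between_group_ss(scores, labels)
    hits = sum(
        between_group_ss(scores, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 3 groups of 15 specimens, 20 variables, no true mean differences.
rng = np.random.default_rng(2)
X = rng.normal(size=(45, 20))
labels = np.repeat([0, 1, 2], 15)

k = 5                                  # fixed in advance, not chosen by looking at group separation
V = within_group_pcs(X, labels, k)
scores = (X - X.mean(axis=0)) @ V      # project ALL specimens into the within-group PC subspace
p_value = permutation_p(scores, labels)
print(scores.shape, p_value)
```

A parametric test (e.g., MANOVA on `scores`) could take the place of the permutation step; the point of the sketch is only that the axes are computed without reference to the group means.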

-ds


cheers!




_______________________________________________________
Pedro Cordeiro Estrela
Dr.Sc.

Departamento de Genetica - Universidade Federal do Rio Grande do Sul
Campus do Vale - Bloco III
Av. Bento Gonçalves, 9500 - Agronomia
Porto Alegre, RS 91501-970 / Caixa Postal 15.053
Brasil.
TEL: +55 (51) 3308.6726
(cod. Porto Alegre)

|lIi___Lo¬___iIl|
________________________________________________________

--- On *Thu, 2/12/09, morphmet /<[email protected]>/*
wrote:

    From: morphmet <[email protected]>
    Subject: [Fwd: Re: Canonical variates from first PCs of GPA residuals]
    To: "morphmet" <[email protected]>
    Date: Thursday, February 12, 2009, 1:15 PM


    -------- Original Message --------
    Subject: Re: Canonical variates from first PCs of GPA residuals
    Date: Thu, 12 Feb 2009 10:19:36 -0800 (PST)
    From: Dennis E. Slice <[email protected]>
    To: [email protected]
    References: <[email protected]>

    That would meet the minimum requirements, but you could still run into
    trouble with ill-conditioned covariance matrices. Ideally, you would
    like many more observations than axes. That is, I think what you
    describe might be satisfactory in the case of large samples of Procrustes
    coordinates where almost four dimensions are invariant for 2D data
    (almost seven for 3D).
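
A small sketch of the conditioning issue (Python with NumPy; the number of axes and the sample sizes are arbitrary illustrative choices). The data are simulated with an identity covariance, so the true condition number is 1. The sample covariance is nevertheless badly conditioned when the number of observations barely exceeds the number of axes:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 10                                 # hypothetical number of axes (variables)
conds = {}
for n in (12, 30, 1000):               # barely enough, modest, and large samples
    X = rng.normal(size=(n, p))        # true covariance is the identity
    S = np.cov(X, rowvar=False)        # p x p sample covariance
    conds[n] = np.linalg.cond(S)
    print(f"n = {n:4d}  condition number = {conds[n]:.1f}")
```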

    Note, the problem is that overall PCA with grouped data can only be
    used for dimension reduction for visualization - there is no
    statistical model. You can do something, perhaps, with PCs from a
    single-group PCA, or use within-group PCs for dimension reduction and
    still examine between group differences.

    Best, dslice

    morphmet wrote:
    >
    >
    > -------- Original Message --------
    > Subject: Re: Canonical variates from first PCs of GPA residuals
    > Date: Wed, 11 Feb 2009 09:08:23 -0800 (PST)
    > From: <[email protected]>
    > To: [email protected]
    > References: <[email protected]>
    >
    >
    > With regard to using PCA to reduce dimensionality, it may be worth
    > noting that if one uses all PC axes (or RW axes) with non-zero
    > variance (i.e., non-zero eigenvalues), then there is no loss of
    > variance in the data. You have simply rotated all the variance in
    > the set into a number of axes which matches the degrees of freedom
    > in the data set.
    >
    > It would seem that this approach has the potential to avoid an
    > artificial reduction in sample variation.
    >
    > What do you think? Is there something missing in the above argument?
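
The rotation argument can be checked numerically with a short sketch (Python with NumPy; the data are simulated and the dimensions are arbitrary assumptions). Keeping every PC axis with non-zero variance preserves both the total variance and all distances among specimens:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 12))           # hypothetical: 8 specimens, 12 variables (n < p)
Xc = X - X.mean(axis=0)

# PCA via SVD of the centered data; rows of Vt are the PC axes.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
keep = s > 1e-10 * s.max()             # axes with non-zero variance (at most n - 1 here)
scores = Xc @ Vt[keep].T

total_var = Xc.var(axis=0, ddof=1).sum()
pc_var = scores.var(axis=0, ddof=1).sum()
print("axes kept:", keep.sum())
print("total variance preserved:", np.isclose(total_var, pc_var))

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all pairs of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(axis=-1)

# Distances among specimens are unchanged, so nothing has been discarded.
same_dists = np.allclose(pairwise_sq_dists(Xc), pairwise_sq_dists(scores))
print("pairwise distances preserved:", same_dists)
```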

    >
    > H. David Sheets, PhD
    > Dept of Physics, Canisius College
    > 2001 Main St
    > Buffalo NY 14208
    >
    >
    > ---- Original message ----
    >> Date: Wed, 11 Feb 2009 11:32:34 -0500
    >> From: morphmet <[email protected]>
    >> Subject: Re: Canonical variates from first PCs of GPA residuals
    >> To: morphmet <[email protected]>
    >>
    >>
    >>
    >> -------- Original Message --------
    >> Subject: Re: Canonical variates from first PCs of GPA residuals
    >> Date: Wed, 11 Feb 2009 08:28:03 -0800 (PST)
    >> From: Dennis E. Slice <[email protected]>
    >> To: [email protected]
    >> References: <[email protected]>
    >>
    >> Relevant to the current posting...
    >>
    >> "Is it possible to use rw as variables in multivariate analysis to
    >> differentiate groups?"
    >>
    >> Some time ago this question was posed and I answered a simple "Yes."
    >> This is correct since relative warps are a rotation of the partial
    >> warp scores (including the uniform component) and completely describe
    >> the shapes of the sample. If you use all of the relative warps, you
    >> should get the same discrimination as if you used the partial warp
    >> scores.
    >>
    >> Some background discussion, however, pointed out an important, but
    >> perhaps subtle point (thanks, Fred). That is, you should NOT use a
    >> reduced set of RWs for your analysis. While PCA (e.g., as used to
    >> construct relwarps) makes no reference to group membership, it is
    >> possible that group differences could be a major contributor to
    >> sample variation. This is, after all, the basis for the one-tailed
    >> F-test used in ANOVA - variance among means is tested to see if it is
    >> greater than that expected based on within-sample variation. So, if
    >> this were the case, and you subjected a reduced set of relative warps
    >> to MANOVA, CVA, etc. the results could be misleading. If your only
    >> goal is to classify an unknown, then it doesn't really matter (and
    >> may help) that you have concentrated group differences in the
    >> retained components, but in any statistical testing (even
    >> nonparametric testing), p-values for significance tests of group mean
    >> differences will likely be biased, i.e., too small.
    >>
    >> What to do if you need data reduction? Use the initial PCs from the
    >> pooled, within-group shape variation. Their computation is not
    >> affected by group mean differences. Even here, though, it is
    >> inappropriate to select the number of retained PCs based on
    >> "noticing" interesting group separation on one or more of them.
    >>
    >> The above holds for GPA coordinates just as it does for relwarps.
    >>
    >> -dslice
    >>
    >> morphmet wrote:
    >>>
    >>>
    >>> -------- Original Message --------
    >>> Subject:     Canonical variates from first PCs of GPA residuals
    >>> Date:     Tue, 10 Feb 2009 05:15:05 -0800 (PST)
    >>> From:     Peter Taylor <[email protected]>
    >>> To:     <[email protected]>
    >>>
    >>>
    >>>
    >>> Dear Morphometricians
    >>> I am working with data where the number of landmarks (from rodent
    >>> skulls) exceeds the smallest sample sizes of my groups. To
    >>> circumvent statistical problems with null determinants when using
    >>> canonical analysis (CVA) of the weights matrix from GPA, is it
    >>> permissible to conduct CVA on the first few PCs from a PCA of the
    >>> residuals, or aligned coordinates after least squares, GPA? If so,
    >>> how does one objectively decide how many PCs to include: should this
    >>> number be less than the smallest group sample size, or should it
    >>> depend on a certain threshold of cumulative explained variance (70%)
    >>> or on the eigenvalues (>1?), or on the degree of separation of
    >>> groups? Also, is this approach equivalent, or preferable, to
    >>> conducting CVA on the first few relative warps from a relative warps
    >>> analysis (PCA of weights matrix)? I have seen both approaches in the
    >>> literature but am not sure which is best.
    >>> Many thanks
    >>> Peter
    >>>
    >>>
    >>> Dr Peter John Taylor
    >>> Curator of Mammals
    >>> Durban Natural Science Museum
    >>> Ethekwini Libraries & Heritage
    >>> P O Box 4085
    >>> Durban
    >>> 4000
    >>>

    >>> ---------------------------------------------------------------------------

    >>> Physical address:
    >>> First Floor, City Hall, Smith Street Entrance, 4001
    >>> &
    >>> Research Centre, 151 Old Fort Road (cnr Wyatt St)
    >>>

    >>> ---------------------------------------------------------------------------
    >>> Tel:  + 27 31 3054162/4/5/7
    >>> Cell: 083 7924810
    >>> Fax:  + 27 31 311 2242
    >>> Email: [email protected]
    >>> or (home): [email protected]
    >>> or: [email protected]
    >>>
    >>> Internet: www.durban.gov.za/naturalscience/
    >>> <http://www.durban.gov.za/naturalscience/>
    >>>
    >>>
    >>>
    >>
    >> -- Dennis E. Slice
    >> Associate Professor
    >> Dept. of Scientific Computing
    >> Florida State University
    >> Dirac Science Library
    >> Tallahassee, FL 32306-4120
    >>     -
    >> Guest Professor
    >> Department of Anthropology
    >> University of Vienna
    >> ========================================================
    >>
    >>
    >>
    >> -- Replies will be sent to the list.
    >> For more information visit http://www.morphometrics.org
    >>
    >



--
Dennis E. Slice
Associate Professor
Dept. of Scientific Computing
Florida State University
Dirac Science Library
Tallahassee, FL 32306-4120
        -
Guest Professor
Department of Anthropology
University of Vienna
========================================================



--
Replies will be sent to the list.
For more information visit http://www.morphometrics.org
