Re: CVA limitations?

morphmet Fri, 03 Apr 2009 06:35:07 -0700

-------- Original Message --------
Subject: Re: CVA limitations?
Date: Thu, 2 Apr 2009 20:55:02 -0700 (PDT)
From: Philipp Mitteröcker <[email protected]>
To: [email protected]
References: <[email protected]>

Yes, if the CVA for k groups is based on the pooled covariance matrix,
total sample size must be at least as large as the number of variables
+ k.
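
As a rough illustration of this counting argument (a minimal sketch, not taken from any morphometrics package; the function names are made up for this example): the pooled within-group covariance matrix has N - k degrees of freedom, and it can only be inverted if that number is at least the number of variables.

```python
def pooled_df(group_sizes):
    """Degrees of freedom of the pooled within-group covariance: N - k."""
    return sum(group_sizes) - len(group_sizes)


def cva_computable(group_sizes, n_variables):
    """True if the pooled covariance matrix can be full rank (N - k >= p).

    This is only the computational minimum, not a guarantee that the
    results are reliable or interpretable.
    """
    return pooled_df(group_sizes) >= n_variables


# Example from earlier in this thread: 26 variables, groups of 40, 15 and 5.
sizes = [40, 15, 5]
print(pooled_df(sizes), cva_computable(sizes, 26))
```

The check passes for that example even though one group has only 5 specimens, which is exactly the situation where the pooled estimate may be poor.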

It seems to me that if a group has rather small N the computation of
CVA is still possible, as long as total sample size is large enough.
But the error in estimating mean and covariance matrix of this group
may be considerable and would affect the pooled estimate. Maybe it is
better to exclude very small groups from the estimation of the pooled
covariance matrix.

Yes, even if CVA gives a good separation of groups with identical
means because of too many variables, cross-validation would lead to
more or less random group affiliations.
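
A small simulation can make this point concrete. The following is only a sketch under assumed settings (3 groups of 15 cases, 30 pure-noise variables, identical group means, NumPy only; all names are hypothetical): cases are assigned to the nearest group mean in Mahalanobis distance based on the pooled covariance matrix, once on the training data itself and once with leave-one-out cross-validation.

```python
import numpy as np


def pooled_within_cov(X, y):
    """Pooled within-group covariance matrix with N - k degrees of freedom."""
    groups = np.unique(y)
    W = np.zeros((X.shape[1], X.shape[1]))
    for g in groups:
        D = X[y == g] - X[y == g].mean(axis=0)
        W += D.T @ D
    return W / (len(X) - len(groups))


def classify(X_train, y_train, X_new):
    """Assign each new case to the group with the nearest mean (Mahalanobis)."""
    groups = np.unique(y_train)
    S_inv = np.linalg.inv(pooled_within_cov(X_train, y_train))
    means = np.array([X_train[y_train == g].mean(axis=0) for g in groups])
    diff = X_new[:, None, :] - means[None, :, :]
    d2 = np.einsum("ngi,ij,ngj->ng", diff, S_inv, diff)
    return groups[np.argmin(d2, axis=1)]


def hit_ratios(X, y):
    """Return (resubstitution hit ratio, leave-one-out hit ratio)."""
    resub = np.mean(classify(X, y, X) == y)
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i  # leave case i out of the training set
        hits += classify(X[keep], y[keep], X[i:i + 1])[0] == y[i]
    return resub, hits / len(X)


# Assumed toy data: all groups are drawn from the same distribution.
rng = np.random.default_rng(0)
n_per_group, k, p = 15, 3, 30
X = rng.standard_normal((n_per_group * k, p))
y = np.repeat(np.arange(k), n_per_group)

resub, loo = hit_ratios(X, y)
print(f"resubstitution: {resub:.2f}, leave-one-out: {loo:.2f}")
```

With settings like these the resubstitution hit ratio should come out far above the cross-validated one, which should stay near the chance level of 1/3.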

But in order to use CVA as a reliable ordination technique, the number of
cases should be many times the number of variables. I would hence
use dimension reduction before CVA, e.g., use only the first few PCs
as input for the CVA. But for most ordination purposes, PCA may be
sufficient anyway.
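
A minimal sketch of that dimension reduction, assuming the data are already in a cases-by-variables array (for instance Procrustes shape coordinates); the function name and the random toy data are assumptions for illustration only:

```python
import numpy as np


def pc_scores(X, n_components):
    """Scores on the first n_components PCs and their share of total variance."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # project onto the leading PCs
    explained = s**2 / np.sum(s**2)              # proportion of variance per PC
    return scores, explained[:n_components]


# Hypothetical data: 200 specimens, 40 coordinates.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 40))
scores, explained = pc_scores(X, n_components=5)
print(scores.shape, explained.round(3))
```

The `scores` array, rather than the full set of coordinates, would then be the input to the CVA. How many PCs to keep is a judgment call that this sketch does not settle.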

I did not mean to do a comparison of CVA and PCA, but rather only a
PCA, which has none of these restrictions and probably gives the same
result for data consisting of so many groups.
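
The remark in my earlier message, that a nearly spherical pooled covariance matrix makes CVA resemble PCA, can be written out as follows. This is a sketch under the idealized assumption that the pooled within-group scatter is exactly proportional to the identity; the symbols W, B, T and c are introduced here only for illustration.

```latex
% W: pooled within-group scatter, B: between-group scatter,
% T = W + B: total scatter.
% CVA axes a solve the generalized eigenproblem
B a = \lambda W a .
% Idealized assumption: W = c I for some c > 0. Then
B a = \lambda c \, a ,
\qquad
T a = (W + B) a = c (1 + \lambda) \, a ,
% so every CVA axis is also an eigenvector of B and of T, i.e. a PCA axis.
```

Because c(1 + \lambda) increases with \lambda, the axes also come out in the same order in both analyses.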

If one wants to "identify populations with unique head morphologies",
the question seems to be about between-group shape differences,
without any reference to the within-group shape distribution. Hence no
CVA is necessary. I would suggest PCA for a first overview of the
data. If you see that fish with a common lifestyle cluster in the PC
scores, you may compare differences among averages of, say, the
benthic and the limnetic fish. If your information about the fish
populations is more detailed, you may instead use a partial least
squares analysis.

Philipp

Am 02.04.2009 um 18:59 schrieb morphmet:



-------- Original Message --------
Subject:        Re: CVA limitations?
Date:   Thu, 2 Apr 2009 09:56:01 -0700 (PDT)
From:   J. Willacker <[email protected]>
To:     [email protected]
References:     <[email protected]>



Thanks everyone for your replies.

My landmark suite includes 20 points, therefore I have 40 variables. I
have been doing a minimum of 40 specimens from each population, but if I
understand correctly I should consider doing more.  How do I know what
my goal within-population N should be?  I have more than enough fish
from each population (for most I have 500+ fish) but with this many
populations time becomes an issue when my Ns get too high. A colleague is
using sample sizes as low as 20 for similar analyses (with the same 20
landmarks), but that doesn't seem valid.

Really, I am very new to these types of analyses and have some trouble
understanding how they do what they do.  I realize that no matter how
many fish I include, the CVA could not possibly separate ALL
populations. Ultimately, my goal is to identify populations with unique
head morphologies (very "benthic" or very "limnetic") for use in my
studies of trophic morphology/ecology.  Given my purpose, is there a
different analysis that would be more appropriate?

On Thu, Apr 2, 2009 at 4:27 AM, morphmet
<[email protected]
<mailto:[email protected]>> wrote:



   -------- Original Message --------
   Subject: Re: CVA limitations?
   Date: Thu, 2 Apr 2009 05:25:21 -0700 (PDT)
   From: andrea cardini <[email protected]
   <mailto:[email protected]>>
   To: [email protected] <mailto:[email protected]>

   Just a quick comment about Philipp's points.

   The rule of thumb suggested by textbooks is more restrictive than Jim's
   "minimum sample size" as it requires N of the smallest group to be
   larger than the number of variables. Meeting it implies that the
   minimum requirement mentioned by Jim is also met.
   Also, it's true that "CVA will always separate groups even if they
   share the same mean configuration" but in that case the
   cross-validation will likely produce hit-ratios which are no better
   than chance (that was about my last point in the previous message).

   Unfortunately most of the time people doing taxonomy like myself are
   in the situation exemplified by Paul's case (below) where N is very
   unequal across groups and there are at least a few groups with N
   much smaller than the number of variables. Then, one may or may not
   be computationally able to do a DA/CVA but assumptions are unlikely
   to be met, hard to verify and (even more concerning) sampling error
   may lead to inaccurate estimates of means, variances etc.
   Resampling statistics may help but won't do anything about the
   accuracy of estimates and one can only acknowledge that results (at
   least those concerning smallest samples) will have to be verified on
   larger samples.

   I'd also like to point out that besides sample size, one should carefully
   consider provenance of specimens and maybe also time of specimen
   collection. A small sample of individuals collected at the same time
   and in the same locality could make things even worse if one is
   interested in estimating means and their variation in the whole
   population. Again, this is not uncommon for rare species/subspecies
   from museum collections.

   Cheers

   Andrea




   At 07:43 02/04/2009 -0400, you wrote:



       -------- Original Message --------
       Subject: Re: CVA limitations?
       Date: Thu, 2 Apr 2009 04:26:50 -0700 (PDT)
       From: Paul Van Daele <[email protected]
       <mailto:[email protected]>>
       To: <[email protected] <mailto:[email protected]>>
       References: <[email protected]
       <mailto:[email protected]>>

       What if total sample size is larger than the number of variables
       but some groups have a lower sample size than the number of
       variables?

       Say, e.g., you have 26 variables and three groups with resp. 40, 15
       and 5 specimens


       Paul Van Daele
       Ghent University
       Evolutionary Morphology of Vertebrates
       KL Ledeganckstraat 35
       B-9000 Gent
       Belgium
       [email protected] <mailto:[email protected]>
       Tel +32 92645233
       Fax +32 92645344

       Do not go gentle into that good night (D. Thomas)
       ----- Original Message -----
       From: "morphmet" <[email protected]
       <mailto:[email protected]>>
       To: "morphmet" <[email protected]
       <mailto:[email protected]>>
       Sent: Thursday, April 02, 2009 1:08 PM
       Subject: Re: CVA limitations?




           -------- Original Message --------
           Subject: Re: CVA limitations?
           Date: Wed, 1 Apr 2009 16:04:19 -0700 (PDT)
           From: Philipp Mitteröcker <[email protected]
           <mailto:[email protected]>>
           To: [email protected]
           <mailto:[email protected]>
           References: <[email protected]
           <mailto:[email protected]>>

           Actually, the "rule of thumb" is a computational necessity. More
           correct is Jim's formulation, which requires the "degrees of
           freedom of the within-group covariance matrix to be greater than
           the number of variables". Otherwise you cannot invert the
           covariance matrix and hence cannot compute the CVA. But sample
           size should be much larger than the number of variables in order
           to produce interpretable results. If the sample size is close to
           the number of variables, CVA will always separate groups even if
           they share the same mean configuration.

           But for 65 populations no low-dimensional representation will be
           sufficient to distinguish between ALL groups. Furthermore, CVA
           assumes equal covariance matrices for all groups, which seems
           unlikely for so many populations. If the covariance structures
           vary considerably, a pooled estimate may be close to a spherical
           distribution and the resulting CVA would be very similar to a
           principal component analysis (PCA). I would thus suggest
           proceeding with a PCA, also because there are no restrictions on
           sample size and statistical artifacts are less likely.

           I hope this helps,

           Philipp




           Am 01.04.2009 um 19:33 schrieb morphmet:



               -------- Original Message --------
               Subject: Re: CVA limitations?
               Date: Wed, 1 Apr 2009 09:15:46 -0700 (PDT)
               From: andrea cardini <[email protected]
               <mailto:[email protected]>>
               To: [email protected]
               <mailto:[email protected]>

               Dear James,
               on a similar issue there was an exchange of emails in
               MORPHMET some time ago (February, I think) and a few more
               emails which were not sent to the list. Jim Rohlf suggested
               summarizing the main points in an email to MORPHMET and I
               agree with him that it's a very good idea. Unfortunately I
               am too busy right now for this but hope to do it sooner or
               later.

               Just a couple of quick comments (which greatly
               oversimplify the problem).

               First of all, have a look at the assumptions of DA/CVA.
               With many groups and small samples they're often difficult
               to test.
               Second point, from a message that Jim Rohlf sent a couple
               of years ago:
               "... in order use methods that look at difference among
               groups relative to within-group variability one needs the
               degrees of freedom of the within-group covariance matrix to
               be greater than the number of variables. With fewer
               observations the within-group covariance matrix will be
               singular. This rule gives a minimum sample size but for
               reliable results the sample size should, of course, be much
               larger". To have more reliable results, there's a rule of
               thumb which is suggested in many textbooks (and I am not
               sure if it is actually supported by studies): this is that
               within each group you should have more specimens than
               variables.

               Last comment, if you really want to do a DA/CVA when N is
               not very large, I'd carefully check if results are stable
               when you exclude small groups and I'd always cross-validate
               all analyses. If you find that despite significance,
               cross-validated hit ratios (i.e., percentages of specimens
               correctly classified according to groups) are low, I'd be
               very cautious about what those differences really mean (if
               they do mean anything at all).

               There are plenty of references on this stuff. An old one
               which I greatly like is Neff & Marcus' chapter on DA/CVA in
               their book on "Multivariate Methods for Systematics" (1980).

               Good luck with your research.
               Cheers

               Andrea

               At 09:01 01/04/2009 -0400, you wrote:



                   -------- Original Message --------
                   Subject: CVA limitations?
                   Date: Tue, 31 Mar 2009 18:20:40 -0700 (PDT)
                   From: J. Willacker <[email protected]
                   <mailto:[email protected]>>
                   To: Morphmet <[email protected]
                   <mailto:[email protected]>>



                   Hi,

                   I was wondering if there were any limits to the number
                   of groups that can be distinguished between with CVA?
                   I'm comparing facial morphology in 65 populations of
                   threespine stickleback fish, but don't know if CVA is
                   valid with so many groups.  Is there a relation between
                   number of specimens per group and how many groups can
                   be compared?  At some point does the power of the
                   analysis suffer?  Really need help with this since
                   nobody in our stats department seems to know the
                   answer.  Feel free to respond to [email protected]
                   Thanks, James

                   --
                   Replies will be sent to the list.
                   For more information visit
                   http://www.morphometrics.org
                   <http://www.morphometrics.org/>







           ____________________________________

           Dr. Philipp Mitteröcker

           Department of Theoretical Biology
           University of Vienna
           Althanstrasse 14
           A-1090 Vienna, Austria

           Tel: +43 1 4277 56705
           Fax: +43 1 4277 9544
           [email protected]
           <mailto:[email protected]>
           www.virtual-anthropology.com/Members/philippm
           <http://www.virtual-anthropology.com/Members/philippm>




   Dr. Andrea Cardini

   Lecturer in Animal Biology
   Museo di Paleobiologia e dell'Orto Botanico, Università di Modena e
   Reggio Emilia
   via Università 4, 41100, Modena, Italy
   tel: 0039 059 2056532; fax: 0039 059 2056535

   Honorary Fellow
   Functional Morphology and Evolution Unit, Hull York Medical School
   University of Hull, Cottingham Road, Hull, HU6 7RX, UK
   University of York, Heslington, York YO10 5DD, UK

   E-mail address: [email protected]
   <mailto:[email protected]>, [email protected]
   <mailto:[email protected]>,
   [email protected] <mailto:[email protected]>
   http://hyms.fme.googlepages.com/drandreacardini
   http://ads.ahds.ac.uk/catalogue/archive/cerco_lt_2007/overview.cfm#metadata

   More on publications at:
   http://www.cons-dev.org/marm/MARM/EMARM/framarm/framarm.html
   CLICK ON THE LETTER C AND LOOK FOR "CARDINI" (p. 8-9 until March 2009)

   http://hyms.fme.googlepages.com/dr.sarahelton-publications
   LOOK FOR "CARDINI"

















