The way I like to think of the generalized variance is that it is
equal to the product of the variances times the determinant of the
correlation matrix. If the correlations were all exactly zero then the
generalized variance would be just the product of the variances.  This
means that the units of the generalized variance are the products of
the squares of all the separate units. For this reason generalized
variances tend to be either very large or very small. I prefer to take
the p-th root of the generalized variance or to divide the log of the
generalized variance by p, where p is the number of variables. That
often puts the numbers in a more reasonable range unless the
determinant of the correlation matrix is very small due to the
presence of some high correlations.
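
To make the factorization concrete, here is a minimal sketch in Python with NumPy (my choice of tool for illustration, not something used in the original discussion). The function name `generalized_variance_summary` and the simulated data are hypothetical. It computes the generalized variance directly, recomputes it as the product of the variances times the determinant of the correlation matrix, and then rescales it by the p-th root and by log/p.

```python
import numpy as np

def generalized_variance_summary(X):
    """X is an (n, p) array: n specimens by p variables (hypothetical layout)."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    S = np.cov(X, rowvar=False)       # p x p covariance matrix
    R = np.corrcoef(X, rowvar=False)  # p x p correlation matrix
    variances = np.diag(S)

    gv = np.linalg.det(S)                                 # generalized variance
    gv_factored = np.prod(variances) * np.linalg.det(R)   # same quantity, factored

    # slogdet avoids overflow/underflow when the determinant is extreme
    sign, logdet = np.linalg.slogdet(S)
    log_gv_per_var = logdet / p if sign > 0 else float("-inf")

    return {
        "gv": gv,
        "gv_factored": gv_factored,
        "pth_root": np.exp(log_gv_per_var),  # p-th root of the generalized variance
        "log_gv_per_var": log_gv_per_var,    # log(generalized variance) / p
    }

# Simulated data with very different scales, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
out = generalized_variance_summary(X)
print(out)
```

Working on the log scale is just a numerical convenience; with about 20 variables the raw determinant can easily fall outside a comfortable range.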

The generalized variance is based on the following idea. Given, for
example, two bivariate samples with identical variances, the one with
a higher correlation would have less scatter about its major axis and
thus shows less variation. The same idea holds for higher dimensions.
However, if some correlation is equal to 1.0 then there is a PC axis
with no variation (an eigenvalue equal to zero) and hence the
determinant goes to zero because it is equal to the product of the
eigenvalues. That gives the perhaps paradoxical result of a measure of
variation equal to zero even though one sees lots of variation in the
sample. There is no real cutoff point for how much correlation is "ok".
However, the more variables you include in a study the more likely it
is that the last eigenvalue will be close to zero. Obvious
alternatives would be to use just the first eigenvalue (but that
leaves out lots of information about variation) or to use the sum of
the eigenvalues (but that totally ignores the correlations).
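
The bivariate case can be sketched numerically. The following is only an illustration in Python with NumPy (an assumption on my part, as are the function name `bivariate_summary` and the variance value 4.0). It builds covariance matrices with identical variances and increasing correlation and reports the eigenvalues, their product (the generalized variance), the first eigenvalue, and the sum of the eigenvalues.

```python
import numpy as np

def bivariate_summary(var, r):
    """Two variables with the same variance `var` and correlation `r`."""
    S = np.array([[var, r * var],
                  [r * var, var]])
    eig = np.linalg.eigvalsh(S)  # eigenvalues in ascending order
    return {
        "eigenvalues": eig,
        "generalized_variance": float(np.prod(eig)),  # equals det(S)
        "first_eigenvalue": float(eig[-1]),
        "sum_of_eigenvalues": float(np.sum(eig)),     # equals trace(S)
    }

for r in (0.0, 0.5, 0.9, 1.0):
    s = bivariate_summary(4.0, r)
    print(r, s["eigenvalues"], s["generalized_variance"],
          s["first_eigenvalue"], s["sum_of_eigenvalues"])
```

In this setup the eigenvalues are var*(1 - r) and var*(1 + r), so the product shrinks to zero as r approaches 1 while the sum stays at 2*var whatever the correlation, which is the sense in which the sum ignores the correlations.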

When I looked at this problem back in the 1970s, I thought a
reasonable generalization of the coefficient of variation would be the
geometric mean of the individual coefficients of variation times the
2*p-th root of the determinant of the correlation matrix (raising it
to the 1/(2p) power puts it on the same scale as the geometric mean of
the coefficients of variation). I did not publish it but I still think
it would be interesting to investigate.
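
For anyone who wants to experiment with this idea, here is a rough sketch in Python with NumPy. It is an unpublished suggestion rather than an established statistic, and the function name `multivariate_cv`, the simulated data, and the assumption that all variable means are positive are mine for illustration.

```python
import numpy as np

def multivariate_cv(X):
    """Geometric mean of the CVs times det(R) ** (1 / (2p)).

    X is an (n, p) array; every column mean is assumed to be positive.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    means = X.mean(axis=0)
    sds = X.std(axis=0, ddof=1)      # sample standard deviations
    cvs = sds / means                # individual coefficients of variation
    geo_mean_cv = np.exp(np.mean(np.log(cvs)))

    R = np.corrcoef(X, rowvar=False)
    sign, logdet = np.linalg.slogdet(R)
    # det(R) lies between 0 and 1; treat a non-positive determinant as 0
    det_root = np.exp(logdet / (2 * p)) if sign > 0 else 0.0
    return geo_mean_cv * det_root

# Simulated measurements with positive means, purely for illustration
rng = np.random.default_rng(1)
X = rng.normal(loc=[10.0, 20.0, 30.0], scale=[1.0, 2.0, 3.0], size=(40, 3))
print(multivariate_cv(X))
```

The result equals the 2*p-th root of the generalized variance divided by the geometric mean of the means, so it is dimensionless like an ordinary coefficient of variation, and it still goes to zero if any correlation is perfect.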

I hope you find these comments helpful. More work is clearly needed.

=========================
F. James Rohlf
Distinguished Professor, Stony Brook University
http://life.bio.sunysb.edu/ee/rohlf


> -----Original Message-----
> From: morphmet [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 18, 2007 12:45 PM
> To: [email protected]
> Subject: Re: Measure of variability
>
>
> James, Is the degree of the problem going to be proportional to the
> degree of correlation?  Can he ignore a small amount of correlation?
>
> Would we agree to avoid anything using a CV and all ANOVA procedures?
> There are lots of examples out there where these types of things have
> been done. Be wary, Andrew! Again, I suggest manufacturing a data set
> with known distributions in order to check that your statistic
> represents what you want to study. Correlation and non-normality are
> hard to avoid in plant measures.
>
> Soule (1971), who I referenced before, used a statistic much like
> Anderson's generalized variance, with some additional transformations.
>
> I was interested in population variability, but ultimately I became
> interested in within-plant variability as a way to weight characters.
> That is where plants, with their modular form, provide a way for us to
> recognize some kinds of environmental variation.
>
> Yrs,
> Patricia
>
> On 10/16/07, morphmet <[EMAIL PROTECTED]> wrote:
> >
> > The classical measure for this is the generalized variance, which is
> > the determinant of the covariance matrix. This would probably work
> > well unless some of the variables were highly correlated. The problem
> > then is that a perfect correlation results in a generalized variance
> > equal to zero even though there is lots of variability in each
> > variable studied.
> >
> > There are also some ad hoc measures that could be tried such as the
> > average or geometric mean of the variances.
> >
> > The variables should also be in the same units - perhaps
> > log-transformed morphological measurements.
> >
> > =========================
> > F. James Rohlf
> > Distinguished Professor, Stony Brook University
> > http://life.bio.sunysb.edu/ee/rohlf
> >
> >
> > > -----Original Message-----
> > > From: morphmet [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, October 15, 2007 3:46 PM
> > > To: morphmet
> > > Subject: Measure of variability
> > >
> > > Dear Morphometricians: I am a taxonomist working on a revision of a
> > > genus with about 80 species of plants (palms). I have a data matrix
> > > with measures of about 20 variables, taken from herbarium specimens.
> > > Some species are obviously much more variable than others. What I
> > > want is a single measure of variability of each species. What is
> > > this?
> > >
> > > Thanks. Andrew Henderson
> > >
> > >
> > > --
> > > Replies will be sent to the list.
> > > For more information visit http://www.morphometrics.org



