The way I like to think of the generalized variance is that it is equal to the product of the variances times the determinant of the correlation matrix. If the correlations were all exactly zero then the generalized variance would be just the product of the variances. This means that the units of the generalized variance are the products of the squares of all the separate units. For this reason generalized variances tend to be either very large or very small. I prefer to take the p-th root of the generalized variance or to divide the log of the generalized variance by p, where p is the number of variables. That often puts the numbers in a more reasonable range unless the determinant of the correlation matrix is very small due to the presence of some high correlations.
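As a rough illustration of the decomposition described above, here is a minimal Python sketch. NumPy is an assumption on my part, the function names are hypothetical, and the data are simulated rather than taken from any real study. It checks numerically that the determinant of the covariance matrix equals the product of the variances times the determinant of the correlation matrix, and it computes the p-th root of the generalized variance from the log-determinant.

```python
import numpy as np


def generalized_variance_parts(X):
    """Return (det of covariance, product of variances, det of correlation).

    X is an n-by-p array: rows are specimens, columns are variables.
    """
    S = np.cov(X, rowvar=False)        # p-by-p covariance matrix
    R = np.corrcoef(X, rowvar=False)   # p-by-p correlation matrix
    return np.linalg.det(S), np.prod(np.diag(S)), np.linalg.det(R)


def standardized_generalized_variance(X):
    """p-th root of the generalized variance, via the log-determinant."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        # Singular covariance matrix (e.g. a perfect correlation).
        return 0.0
    return float(np.exp(logdet / p))


# Simulated data only: 200 hypothetical specimens, 4 correlated variables.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 4))
X = rng.normal(size=(200, 4)) @ mixing

gv, prod_var, det_r = generalized_variance_parts(X)
print("generalized variance:      ", gv)
print("prod(variances) * det(R):  ", prod_var * det_r)
print("p-th root of gen. variance:", standardized_generalized_variance(X))
```

Working with the log-determinant rather than the determinant itself is a design choice, not a requirement: it sidesteps the very large or very small raw values mentioned above, and dividing it by p gives the second rescaling directly.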
The generalized variance is based on the following idea. Given, for example, two bivariate samples with identical variances, the one with the higher correlation would have less scatter about its major axis and thus show less variation. The same idea holds for higher dimensions. However, if some correlation is equal to 1.0 then there is a PC axis with no variation (an eigenvalue equal to zero) and hence the determinant goes to zero because it is equal to the product of the eigenvalues. That gives the perhaps paradoxical result of a measure of variation equal to zero even though one sees lots of variation in the sample. There is no real cutoff point for how much correlation is "ok". However, the more variables you include in a study the more likely it is that the last eigenvalue will be close to zero.

Obvious alternatives would be to use just the first eigenvalue (but that leaves out lots of information about variation) or to use the sum of the eigenvalues (but that totally ignores the correlations).

When I looked at this problem back in the 1970s, I thought a reasonable generalization of the coefficient of variation would be the geometric mean of the individual coefficients of variation times the 2p-th root of the determinant of the correlation matrix (raising it to the 1/(2p) power puts it on the same scale as the geometric mean of the coefficients of variation). I did not publish it but I still think it would be interesting to investigate.

I hope you find these comments helpful. More work is clearly needed.

=========================
F. James Rohlf
Distinguished Professor, Stony Brook University
http://life.bio.sunysb.edu/ee/rohlf

> -----Original Message-----
> From: morphmet [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 18, 2007 12:45 PM
> To: [email protected]
> Subject: Re: Measure of variability
>
> James, Is the degree of the problem going to be proportional to the
> degree of correlation? Can he ignore a small amount of correlation?
>
> Would we agree to avoid anything using a CV and all ANOVA procedures?
> There are lots of examples out there where these types of things have
> been done. Be wary, Andrew! Again, I suggest manufacturing a data set
> with known distributions in order to check that your statistic
> represents what you want to study. Correlation and non-normality are
> hard to avoid in plant measures.
>
> Soule (1971), who I referenced before, used a statistic much like
> Anderson's generalized variance, with some additional transformations.
>
> I was interested in population variability, but ultimately I became
> interested in within-plant variability as a way to weight characters.
> That is where plants, with their modular form, provide a way for us to
> recognize some kinds of environmental variation.
>
> Yrs,
> Patricia
>
> On 10/16/07, morphmet <[EMAIL PROTECTED]> wrote:
> >
> > The classical measure for this is the generalized variance which is
> > the determinant of the covariance matrix. This would probably work
> > well unless some of the variables were highly correlated. The problem
> > then is that a perfect correlation results in a generalized variance
> > equal to zero even though there is lots of variability in each
> > variable studied.
> >
> > There are also some ad hoc measures that could be tried such as the
> > average or geometric mean of the variances.
> >
> > The variables should also be in the same units - perhaps
> > log-transformed morphological measurements.
> >
> > =========================
> > F. James Rohlf
> > Distinguished Professor, Stony Brook University
> > http://life.bio.sunysb.edu/ee/rohlf
> >
> > > -----Original Message-----
> > > From: morphmet [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, October 15, 2007 3:46 PM
> > > To: morphmet
> > > Subject: Measure of variability
> > >
> > > Dear Morphometricians: I am a taxonomist working on a revision of a
> > > genus with about 80 species of plants (palms).
> > > I have a data matrix with measures of about 20 variables, taken
> > > from herbarium specimens. Some species are obviously much more
> > > variable than others. What I want is a single measure of
> > > variability of each species. What is this?
> > >
> > > Thanks. Andrew Henderson

--
Replies will be sent to the list.
For more information visit http://www.morphometrics.org
