Dear Thomas,

Thanks for that.

The core problem (as I understand it) is that spurious correlation may be generated if an individual measurement that contributes to a ratio is itself included in the analysis along with the ratio. When the original measurement and the ratio are both included in a larger set of measurements, the correlational distortions surrounding the ratios affect the overall result. At least, this is how I understand the argument.

An accessible demonstration of possible spurious correlation is given by Atkinson et al.:
http://jap.physiology.org/cgi/content/full/97/2/792

Maybe the Cauchy distribution of ratios is behind the problem. I am no mathematician.

If the same divisor is used throughout (e.g. the sum of all the measurements in morphometrics) then, as I understand it, the problem of spurious correlation is absent.

The question of ratios not being normally distributed certainly affects the probabilistic part of classification by discriminant analysis. This is perhaps a separate but compounding problem in using ratios, though on its own it should not have too severe an effect on the descriptive side of discriminant analysis (i.e. canonical variates analysis).

I would welcome any further comments you have on this topic.

Regards,

Richard

>
>Subject: Re: using ratios in MV correlational analysis
>   From: Thomas Augustin <[EMAIL PROTECTED]>
>   Date: Mon, 29 Sep 2008 20:49:55 +0200
>     To: [email protected]
>
>Dear Richard,
>
>I am not a hundred percent certain whether this contributes to your
>problem, but let me try nevertheless.
>
>One of the major problems in using ratios of variables could be the fact
>that the ratio of (independent, zero-mean) normal variables is Cauchy
>distributed, and the Cauchy distribution is the standard counterexample
>to all standard statistical optimality results. For instance, Cauchy
>distributed variables do not even have an expected value, and the
>arithmetic mean of standard Cauchy distributed variables has the same
>distribution as one single variable,
>i.e.
>we cannot learn from the data by increasing the sample size.
>
>Hope this comment is of some help, the more so as discriminant analysis
>often relies on a model where variables are taken to be normally
>distributed, so that, in my view, taking the ratio of these variables
>could lead to such problems.
>
>Best wishes
>
>Thomas
>
>==========
>
>Prof Dr Thomas Augustin
>Department of Statistics
>University of Munich
>Ludwigstr. 33/II
>D-80539 Munich
>GERMANY
>
>Tel +49 89 2180 3520
>Fax +49 89 2180 5044
>[EMAIL PROTECTED]
>www.stat.uni-muenchen.de/~thomas
>
>
>Richard Wright wrote:
>> There is a scattered literature on the dangers, or otherwise, of using
>> ratios in correlational analyses.
>>
>> I have read what looks like a non-obfuscatory paper on this topic by
>> Firebaugh and Gibbs "User's Guide to Ratio Variables" from American
>> Sociological Review, Vol.50, No.5 (1985) pp.713-722.
>>
>> On page 721 the authors state: "Avoid mixed methods (part ratio, part
>> component). If Z is controlled by division rather than by
>> residualization, all of the other variables should be divided by Z.
>> Should only some of the variables be divided by Z, the effect of Z is
>> 'controlled' for some variables and not for others, and a defensible
>> interpretation of the results is difficult."
>>
>> The reason for my interest is that I am trying to evaluate a
>> morphometric paper that does linear discriminant analysis on a mixture
>> of measurements and ratios derived from those same measurements. For
>> example the analysis includes (A) Length as well as Height/Length and
>> (B) Height and Breadth as well as Height/Breadth and Height/Length.
>>
>> This paper seems to be an example of the 'mixed method' that Firebaugh
>> and Gibbs warn against, where data are part ratio, part measurement,
>> and spurious correlations are introduced into the data.
>>
>> So my first question is whether I am correct in this interpretation.
>>
>> My second question also concerns ratios.
>>
>> In his Multivariate Statistical Methods, 2nd ed. 1994, B.F.J. Manly
>> suggests controlling for the effects of absolute size difference in a
>> PCA of pots (goblets) by expressing the measurements as "a proportion
>> of the sum of all measurements on that goblet."
>>
>> Given that each variable is divided by the same sum, this example of
>> the use of ratios seems to be a case that Firebaugh and Gibbs would
>> not frown on.
>>
>> I shall welcome any comments on these questions and any pointers to
>> relevant literature.
>>
>> Richard
>>
>> ----------------------------------------------
>> CLASS-L list.
>> Instructions: http://www.classification-society.org/csna/lists.html#class-l
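
As a rough illustration of two points raised in this thread, here is a minimal Python sketch using only the standard library. It is not taken from any of the papers cited. The first part assumes independent, uniformly distributed hypothetical "length" and "height" measurements (the thread specifies no distribution) and compares the correlation of Length with Height against the correlation of Length with Height/Length. The second part simulates Thomas's remark about the mean of standard Cauchy variables, generating each Cauchy value as the ratio of two independent standard normals. All function names, ranges, sample sizes and seeds are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mixed_ratio_correlations(n=5000, seed=1):
    """Return (corr(Length, Height), corr(Length, Height/Length)).

    Length and Height are drawn independently (an assumption made only
    for this sketch), so any correlation between Length and the ratio
    comes purely from Length appearing in the denominator.
    """
    rng = random.Random(seed)
    length = [rng.uniform(50.0, 150.0) for _ in range(n)]
    height = [rng.uniform(50.0, 150.0) for _ in range(n)]
    ratio = [h / l for h, l in zip(height, length)]
    return pearson(length, height), pearson(length, ratio)


def iqr(values):
    """Crude interquartile range based on sorted positions."""
    s = sorted(values)
    return s[(3 * len(s)) // 4] - s[len(s) // 4]


def cauchy_mean_spread(sample_size, replicates=2000, seed=2):
    """Interquartile range of the sample mean of standard Cauchy draws.

    Each draw is the ratio of two independent standard normal values.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(replicates):
        draws = [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)
                 for _ in range(sample_size)]
        means.append(sum(draws) / sample_size)
    return iqr(means)


if __name__ == "__main__":
    r_raw, r_mixed = mixed_ratio_correlations()
    print("corr(Length, Height)        =", round(r_raw, 3))
    print("corr(Length, Height/Length) =", round(r_mixed, 3))
    for size in (1, 100):
        print("IQR of Cauchy mean, n =", size, ":",
              round(cauchy_mean_spread(size), 3))
```

Under these assumptions the first correlation should be close to zero and the second clearly negative, and the spread of the Cauchy mean should not shrink as the sample size grows. The sketch says nothing about real morphometric data, where measurements are themselves correlated and far from zero-mean.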
