I am sorry, but that does not fix the problem. The problem is that Mahalanobis distance is not defined (and thus cannot even be calculated) if the within-group covariance matrix is singular, which it must be if its number of degrees of freedom is less than the number of shape variables. Even if the sample sizes were somewhat larger, there would still be a problem, as the coefficient is very sensitive to chance unless the sample sizes are much larger.
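The singularity can be checked with a small numerical sketch in Python (NumPy). It assumes hypothetical random data: 3 groups of 4 specimens and 20 shape variables. The pooled within-group covariance matrix then has only N - G = 9 degrees of freedom, so its rank cannot exceed 9 and it cannot be inverted to compute D². The helper `pooled_within_group_cov` is a hypothetical name used only for illustration.

```python
import numpy as np

def pooled_within_group_cov(X, groups):
    """Hypothetical helper: pooled within-group covariance matrix.
    Deviations are taken from each group's own mean, then the cross-products
    are summed over groups and divided by the within-group df (N - G)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    resid = np.vstack([X[groups == g] - X[groups == g].mean(axis=0) for g in labels])
    df = X.shape[0] - len(labels)
    return resid.T @ resid / df, df

# Hypothetical data: 12 specimens in 3 groups, 20 shape variables
rng = np.random.default_rng(0)
n_per_group, n_groups, p = 4, 3, 20
X = rng.normal(size=(n_per_group * n_groups, p))
groups = np.repeat(np.arange(n_groups), n_per_group)

S, df = pooled_within_group_cov(X, groups)
rank = np.linalg.matrix_rank(S)
print(f"within-group df = {df}, variables = {p}, rank of S = {rank}")
# rank <= df < p, so S is singular and the inverse needed for D^2 does not exist
```

Attempting to invert such a matrix in floating point either fails or returns numerically meaningless values, which is one plausible source of errors like the one reported from PAST further down the thread.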
Note that one uses the within-group covariance matrix, not the overall covariance matrix. This also reveals another problem: for the distance to be very meaningful, one assumes that the covariance matrices are homogeneous across groups, which is unlikely to be true in many studies. This is rather disappointing, as there are many situations in which one would like to use that coefficient. An ad hoc solution that is often used is to use just the first few PCA axes as the shape variables. Of course, one might then miss more subtle differences among groups if they do not account for a relatively large proportion of the total variance.

____________________________________________
F. James Rohlf, Distinguished Professor, Emeritus. Ecology & Evolution
Research Professor, Anthropology
Stony Brook University

From: Miguel Eduardo Delgado Burbano [mailto:mdelgadoburb...@gmail.com]
Sent: Sunday, January 31, 2016 3:35 AM
To: f.james.ro...@stonybrook.edu
Cc: Elahep <ellie.parv...@gmail.com>; MORPHMET <morphmet@morphometrics.org>; jkunk...@une.edu
Subject: Re: [MORPHMET] Mahalanobis distance in cluster analysis of shape variables

Researchers often use small sample sizes for various reasons; in my case, it is because I study archaeological and paleontological samples. The practical problem mentioned by James could be partially solved by correcting the D² distances for small sample size, that is, by calculating an unbiased Mahalanobis distance Δ² following Marcus (1993) (Marcus LF. 1993. Some aspects of multivariate statistics for morphometrics. In: Marcus LF, Bello E, García-Valdecasas A, editors. Contributions to morphometrics. Museo Nacional de Ciencias Naturales, Madrid. p 99-130).

On Sat, Jan 30, 2016 at 4:51 PM, F. James Rohlf <f.james.ro...@stonybrook.edu> wrote:

The distinction is that Mahalanobis distance should be thought of as a statistical distance. For a single variable it is like a z-score (a difference divided by a standard deviation).
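Two quantities from the messages above can be written out as a short Python sketch. `mahalanobis_1d` shows the single-variable case, where D reduces to the difference between group means divided by the pooled within-group standard deviation, i.e. a z-score-like ratio. `unbiased_d2` applies one commonly cited small-sample correction of D² for two groups, assuming multivariate normal data and a common covariance matrix: Delta^2_hat = (N - p - 3) / (N - 2) * D^2 - p * (1/n1 + 1/n2), with N = n1 + n2 and p variables. That this is the form intended by Marcus (1993) is an assumption; the exact expression there should be checked. Both function names are hypothetical.

```python
import numpy as np

def mahalanobis_1d(a, b):
    """Single-variable Mahalanobis distance between two group means:
    |mean difference| / pooled within-group standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / df
    return float(abs(a.mean() - b.mean()) / np.sqrt(pooled_var))

def unbiased_d2(d2, n1, n2, p):
    """Hypothetical helper: small-sample correction of a two-group D^2
    (assumes multivariate normality and a common covariance matrix)."""
    N = n1 + n2
    if N - p - 3 <= 0:
        raise ValueError("correction undefined: need n1 + n2 - p - 3 > 0")
    return (N - p - 3) / (N - 2) * d2 - p * (1.0 / n1 + 1.0 / n2)

print(mahalanobis_1d([1, 2, 3], [4, 5, 6]))  # 3.0
print(unbiased_d2(4.0, 20, 20, 2))           # about 3.48
```

With 8 specimens per group and 10 variables, the multiplier is only 3/14 and 2.5 is subtracted, which shows how strongly raw D² is inflated when samples are small relative to the number of shape variables. The corrected value can be negative.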
Mahalanobis distance is not a measure of the absolute amount of difference. In the multivariate case it is relative to the amount of variation in the direction of the difference (that is what taking into account within-group covariation gives you). Both Mahalanobis and Euclidean distances are valid; it depends on what you wish "distance" to mean. In morphometrics, do you want to cluster based on how similar shapes are (in terms of distance in Kendall shape space) or based on the degree of statistical overlap in population samples (e.g., the degree to which specimens from the two groups might be misidentified)? A practical problem with Mahalanobis distance in many morphometric studies is that it requires large sample sizes within groups, because landmark data are usually high-dimensional and thus very large samples are needed for reliable results.

From: Elahep [mailto:ellie.parv...@gmail.com]
Sent: Saturday, January 30, 2016 7:14 AM
To: MORPHMET <morphmet@morphometrics.org>
Cc: ellie.parv...@gmail.com; jkunk...@une.edu
Subject: Re: [MORPHMET] Mahalanobis distance in cluster analysis of shape variables

Dear Joseph,

Thanks for your detailed explanation. As recommended by Claude in "Morphometrics with R" (2008), it is better to use the Mahalanobis distance for clustering group means, because it is scaled by the within-group variance-covariance matrix. In my analysis, I calculated the mean relative warp scores for each population and then carried out a UPGMA cluster analysis based on the Euclidean distance; the results were satisfying to me and congruent with my other results.
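The point that Mahalanobis distance is relative to the within-group variation in the direction of the difference, and the comparison of Euclidean and Mahalanobis distances between group means, can be illustrated with a minimal Python sketch. The covariance matrix and population means are hypothetical: B and C are the same Euclidean distance from A, but B differs along a direction of high within-group variance and C along a direction of low variance.

```python
import numpy as np

# Hypothetical pooled within-group covariance for two shape variables
# (e.g., two relative-warp scores): variance 4 on the first, 0.25 on the second.
S = np.array([[4.0, 0.0],
              [0.0, 0.25]])
S_inv = np.linalg.inv(S)

# Hypothetical population means: B and C are both one unit from A.
means = {
    "A": np.array([0.0, 0.0]),
    "B": np.array([1.0, 0.0]),  # differs along the high-variance direction
    "C": np.array([0.0, 1.0]),  # differs along the low-variance direction
}

def euclidean(x, y):
    d = x - y
    return float(np.sqrt(d @ d))

def mahalanobis(x, y, S_inv):
    """Distance scaled by the within-group covariance (needs an invertible S)."""
    d = x - y
    return float(np.sqrt(d @ S_inv @ d))

for a, b in [("A", "B"), ("A", "C")]:
    print(a, b, euclidean(means[a], means[b]), mahalanobis(means[a], means[b], S_inv))
# A B 1.0 0.5
# A C 1.0 2.0
```

A full matrix of such distances among population means could then be clustered with average linkage (UPGMA), for example with SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` on the condensed distance vector. With Mahalanobis distance this works only if the pooled covariance matrix is invertible.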
Following the book and other articles, I ran the same analysis based on the Mahalanobis distance in the PAST software, but whenever I ran it the software reported the error "Invalid floating point operation", so I could not see the Mahalanobis cluster (I could not work out why this error happens). The Euclidean distance worked for me, but I was curious to understand whether my analysis is statistically meaningful.

Thanks again for your answer,
Elahe

On Saturday, January 30, 2016 at 5:12:46 PM UTC+3:30, Joseph Kunkel wrote:

I cannot speak directly to why it is frequently used in GM cluster analysis, but I would like to mention how I look at Mahalanobis distance based on its calculation. Mahalanobis distance is not a pure distance metric like Euclidean or Manhattan distance; as you have stated, it is 'standardized'. What does that really mean? It sounds superficially good. One way of computing it is to rotate the k-landmark data set to simplest form, treating the landmarks as factors. This approach would consider all landmarks to have a common covariance structure in XY (or XYZ in three dimensions). That is already a stretch, since not all landmarks can be assumed to have the same covariance structure. In addition, the landmarks have all already been centered about their centroid and rotated to coincide, which has eliminated a degree of freedom of variability that can have consequences. Furthermore, the landmarks of different species cannot all be expected to have the same covariance structure, which is an assumption made in the ordinary application of Mahalanobis distance to strut analysis between populations or species. The assumption of similar data structure of course applies to the null hypothesis, where there is no difference. The typical statistical test explodes when the null hypothesis is falsified, so just when you want the Mahalanobis distance metric to be accurate, it starts misbehaving.
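The rotation to "simplest form" can be read as an eigen-decomposition (PCA) of the pooled within-group covariance matrix. In those rotated coordinates D² is a sum, over principal axes, of the squared difference between group means divided by the within-group variance on that axis, and truncating the sum to the leading axes corresponds to the ad hoc PCA reduction mentioned earlier in the thread. The Python sketch below uses hypothetical function names and a hypothetical diagonal covariance matrix. Each term is a standardized squared difference; a per-axis two-sample t² (or 1-df F) would additionally be scaled by n1*n2/(n1 + n2).

```python
import numpy as np

def d2_by_principal_axes(diff, S):
    """Hypothetical helper: per-axis contributions to D^2 = diff' S^-1 diff.
    diff: difference between two group means (length p)
    S:    pooled within-group covariance (p x p), assumed positive definite
    Returns (contributions, eigenvalues), largest-variance axis first."""
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = eigvecs.T @ diff                  # difference rotated onto principal axes
    return scores**2 / eigvals, eigvals        # squared difference / variance per axis

def truncated_d2(diff, S, var_fraction=0.95):
    """Hypothetical ad hoc variant: sum only over the leading axes that together
    account for var_fraction of the total within-group variance."""
    contrib, eigvals = d2_by_principal_axes(diff, S)
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, var_fraction)) + 1, len(eigvals))
    return float(contrib[:k].sum()), k

# Hypothetical example: one unit of standardized difference on each of four axes
S = np.diag([9.0, 4.0, 1.0, 0.01])
diff = np.array([3.0, 2.0, 1.0, 0.1])

contrib, eigvals = d2_by_principal_axes(diff, S)
full_d2 = float(contrib.sum())                       # roughly 4.0
direct_d2 = float(diff @ np.linalg.solve(S, diff))   # same value, computed directly
part_d2, k = truncated_d2(diff, S, 0.95)             # roughly 3.0, keeping k = 3 axes
```

Here the last axis carries less than 0.1% of the within-group variance but contributes a quarter of D², so truncating at 95% keeps three axes and drops that part of the difference. Small-eigenvalue axes are also what make D² unstable or undefined, which is why truncation is tempting.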
After rotation to the simplest axes, one does a 1-df F-test between each of the landmarks. These tests are all independent, so they can be summed to produce a k-df F-test, which is Mahalanobis D squared. So Mahalanobis D is the square root of the sum of independent F-tests, but those F-tests are based on all sorts of assumptions about the variance of the landmarks. I imagine one could modify the calculation of D by limiting the sum to the principal components that account for the top 95% or 99% of the variance.

Many times, applications of analytical techniques are judged by whether they 'work' or not. If a clustering method works for you, use it(?). I am of the opinion that I use statistics to convince myself rather than the audience. A confluence of many arguments is used to make a case.

Joe

Joseph G. Kunkel, Research Professor
UNE Biddeford ME 04005
http://www.bio.umass.edu/biology/kunkel/

> On Jan 30, 2016, at 7:11 AM, Elahep <ellie....@gmail.com> wrote:
>
> Hello all,
>
> I have seen in many GM articles that people use Mahalanobis distance for cluster analysis. What is the advantage of using Mahalanobis distance over Euclidean distance as a similarity measure in cluster analysis of shape variables?
>
> As far as I know, Mahalanobis distance is the standardized form of Euclidean distance, which standardizes the data with adjustments made for correlation between variables and weights all variables equally.
>
> Why is this distance measure frequently used in GM cluster analysis?
>
> Thanks in advance,
> Elahe
>
> --
> MORPHMET may be accessed via its webpage at http://www.morphometrics.org
> ---
> You received this message because you are subscribed to the Google Groups "MORPHMET" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to morphmet+u...@morphometrics.org.
--
Miguel Delgado PhD
CONICET-División Antropología. Facultad de Ciencias Naturales y Museo. Universidad Nacional de La Plata
Paseo del Bosque s/n. La Plata 1900. Argentina
Cel: 5492216795916. Fax: 54 221 4257527
https://unlp.academia.edu/DelgadoMiguel
http://www.cearqueologia.com.ar/
E-mail: medelg...@fcnym.unlp.edu.ar