RE: [MORPHMET] Mahalanobis distance in cluster analysis of shape variables

F. James Rohlf Sun, 31 Jan 2016 10:15:02 -0800

I am sorry but that does not fix the problem. The problem is that Mahalanobis 
distance is not defined (and thus cannot even be calculated) if the 
within-group covariance matrix is singular – which it must be if its number of 
degrees of freedom is less than the number of shape variables. Even if the 
sample sizes were somewhat larger there would still be a problem as the 
coefficient is very sensitive to chance unless the sample sizes are much larger.
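
To illustrate the singularity point, here is a minimal Python/NumPy sketch on simulated data. The number of shape variables and the group sizes are made-up values, not taken from any study discussed in this thread. It pools the within-group sums of squares and cross-products and checks the rank of the resulting covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 20                     # number of shape variables (hypothetical)
group_sizes = [5, 6, 4]    # specimens per group (hypothetical)

# Accumulate the within-group sums of squares and cross-products.
W = np.zeros((p, p))
for n in group_sizes:
    X = rng.normal(size=(n, p))
    Xc = X - X.mean(axis=0)        # centre each group on its own mean
    W += Xc.T @ Xc

df = sum(group_sizes) - len(group_sizes)   # within-group degrees of freedom
S_pooled = W / df                          # pooled within-group covariance

rank = np.linalg.matrix_rank(S_pooled)
print(f"variables = {p}, within-group df = {df}, rank = {rank}")
```

With 15 specimens in 3 groups there are only 12 within-group degrees of freedom, so the 20 x 20 matrix has rank 12 and has no inverse.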

Note that one uses the within-group covariance matrix, not the overall 
covariance matrix. This also reveals a further problem: for the distance to be 
meaningful one must assume that the covariance matrices are homogeneous 
across groups, which is unlikely to be true in many studies.

This is rather disappointing, as there are many situations in which one would 
like to use that coefficient. An ad hoc solution that is often used is to use 
only the first few PCA axes as the shape variables. Of course one might then 
miss more subtle differences among groups if those differences do not account 
for a relatively large proportion of the total variance.
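
The ad hoc approach can be sketched in Python/NumPy as follows. This is only an illustration under stated assumptions: the function name, the input layout (one array per group), the choice of k, and the simulated data are all hypothetical, and the PCA is done on the pooled data.

```python
import numpy as np

def mahalanobis_on_pcs(groups, k):
    """Squared Mahalanobis distances between group means on the first k PC scores.

    groups: list of (n_i, p) arrays, one per group (hypothetical input layout).
    """
    X = np.vstack(groups)
    grand_mean = X.mean(axis=0)
    # Principal axes of the pooled data via SVD; rows of Vt are the axes.
    _, _, Vt = np.linalg.svd(X - grand_mean, full_matrices=False)
    scores = [(g - grand_mean) @ Vt[:k].T for g in groups]

    # Pooled within-group covariance in the reduced k-dimensional space.
    df = sum(len(s) for s in scores) - len(scores)
    W = sum((s - s.mean(axis=0)).T @ (s - s.mean(axis=0)) for s in scores)
    S_inv = np.linalg.inv(W / df)

    # D^2 for every pair of group means.
    means = np.array([s.mean(axis=0) for s in scores])
    diff = means[:, None, :] - means[None, :, :]
    return np.einsum("ijk,kl,ijl->ij", diff, S_inv, diff)


# Simulated example: 3 groups of 8 specimens, 30 variables, 3 PCs retained.
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, size=(8, 30)) for m in (0.0, 0.5, 1.0)]
D2 = mahalanobis_on_pcs(groups, k=3)
print(np.round(D2, 2))
```

Here the within-group degrees of freedom (21) exceed the number of retained axes (3), so the reduced covariance matrix can be inverted. Any group differences lying outside the first k axes are ignored, which is the limitation noted above.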

____________________________________________

F. James Rohlf, Distinguished Professor, Emeritus. Ecology & Evolution

Research Professor, Anthropology

Stony Brook University

From: Miguel Eduardo Delgado Burbano [mailto:[email protected]] 
Sent: Sunday, January 31, 2016 3:35 AM
To: [email protected]
Cc: Elahep <[email protected]>; MORPHMET <[email protected]>; 
[email protected]
Subject: Re: [MORPHMET] Mahalanobis distance in cluster analysis of shape 
variables

Researchers often have small samples for various reasons; in my case it is 
because I study samples from archaeological and paleontological contexts. The 
practical problem mentioned by James could be partially solved by correcting 
the D² distances for small sample size, that is, by calculating an unbiased 
Mahalanobis distance Δ² following Marcus L. 1993 (Some aspects of multivariate 
statistics for morphometrics. In: Marcus LF, Bello E, García-Valdecasas A, 
editors. Contributions to morphometrics. Museo Nacional de Ciencias Naturales, 
Madrid. p 99-130).
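
For reference, the small-sample correction usually given for D² with a pooled within-group covariance matrix can be sketched as below. The formula is the standard bias correction under multivariate normality with a common covariance matrix; that it matches the notation of the cited chapter exactly is an assumption, and the function and argument names are hypothetical.

```python
def unbiased_d2(d2, n1, n2, p, n_total=None, n_groups=2):
    """Bias-corrected squared Mahalanobis distance between two group means.

    d2       : sample D^2 computed with the pooled within-group covariance
    n1, n2   : sample sizes of the two groups being compared
    p        : number of variables
    n_total  : total specimens in all groups used for pooling (default n1 + n2)
    n_groups : number of groups used for pooling (default 2)
    """
    if n_total is None:
        n_total = n1 + n2
    nu = n_total - n_groups              # within-group degrees of freedom
    if nu - p - 1 <= 0:
        raise ValueError("need within-group df > p + 1")
    # E[D^2] = nu / (nu - p - 1) * (Delta^2 + p * (n1 + n2) / (n1 * n2)),
    # so solve that expression for Delta^2.
    return (nu - p - 1) / nu * d2 - p * (n1 + n2) / (n1 * n2)


print(unbiased_d2(10.0, n1=10, n2=10, p=5))
```

The estimate can come out negative when the true distance is small, and the correction is undefined unless the within-group degrees of freedom exceed p + 1.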

On Sat, Jan 30, 2016 at 4:51 PM, F. James Rohlf wrote:

The distinction is that Mahalanobis distance should be thought of as a 
statistical distance. For a single variable it is like a z-score (a difference 
divided by a standard deviation). It is not a measure of the absolute amount of 
difference. In the multivariate case Mahalanobis distance is relative to the 
amount of variation in the direction of the difference (that is what taking 
into account within-group covariation gives you).
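
A small Python/NumPy sketch of this idea, with made-up numbers: the helper name `mahalanobis` and the covariance values are assumptions chosen for illustration. The same Euclidean difference gives a different Mahalanobis distance depending on how much within-group variation lies in its direction.

```python
import numpy as np

def mahalanobis(d, S):
    """Mahalanobis distance for a mean difference d given covariance matrix S."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    S = np.atleast_2d(np.asarray(S, dtype=float))
    return float(np.sqrt(d @ np.linalg.solve(S, d)))

# One variable: the distance is |difference| / standard deviation (a z-score).
z = mahalanobis(3.0, 4.0)             # variance 4 -> sd 2 -> about 1.5
print(z)

# Two variables: within-group variation is large along x, small along y.
S = np.array([[9.0, 0.0],
              [0.0, 1.0]])
along_x = mahalanobis([3.0, 0.0], S)  # difference in the high-variance direction
along_y = mahalanobis([0.0, 3.0], S)  # same Euclidean length, low-variance direction
print(along_x, along_y)               # about 1.0 and 3.0
```

Both differences have Euclidean length 3, but the one along the low-variance axis is three times larger in Mahalanobis terms.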

Both Mahalanobis and Euclidean distances are valid. It depends on what you wish 
“distance” to mean. In morphometrics, do you want to cluster based on how 
similar shapes are (in terms of distance in Kendall shape space) or based on 
the degree of statistical overlap in population samples (e.g., the degree to 
which specimens from the two groups might be misidentified)?

A practical problem with Mahalanobis distance in many morphometric studies is 
that landmark data are usually high dimensional, so very large samples within 
groups are needed for reliable results.

From: Elahep [mailto:[email protected]]
Sent: Saturday, January 30, 2016 7:14 AM
To: MORPHMET <[email protected]>
Cc: [email protected]; [email protected]
Subject: Re: [MORPHMET] Mahalanobis distance in cluster analysis of shape 
variables

Dear Joseph,

Thanks for your detailed explanation. As Claude recommends in 
"Morphometrics with R" (2008), it is better to use the Mahalanobis distance for 
clustering group means, because it is scaled by the within-group 
variance-covariance matrix. In my analysis, I calculated the mean relative 
warp scores for each population and then carried out a UPGMA cluster analysis 
based on the Euclidean distance. The results were satisfactory and congruent 
with my other results. Following the book and other articles, I ran the same 
analysis based on the Mahalanobis distance in the PAST software, but every time 
I ran it the software reported the error "Invalid floating point operation", 
so I could not see the Mahalanobis-based clustering. (I could not work out why 
this error occurs.)

The Euclidean distance worked for me, but I was curious to understand whether 
my analysis is statistically meaningful.

Thanks again for your answer,

Elahe

On Saturday, January 30, 2016 at 5:12:46 PM UTC+3:30, Joseph Kunkel wrote:

I cannot speak directly to why it is frequently used in GM cluster analysis, 
but I would like to mention how I look at Mahalanobis distance based on its 
calculation. 

Mahalanobis distance is not a pure distance metric like Euclidean or Manhattan 
distance; as you have stated, it is ‘standardized’.  What does that really mean?  
It sounds superficially good. 

One way of computing it is to rotate the k-landmark data set to simplest form, 
treating the landmarks as factors.  This approach considers all landmarks to 
have a common covariance structure in XY, or XYZ in three dimensions.  That is 
already a stretch, since not all landmarks can be assumed to have the same 
covariance structure.  In addition, the landmarks have all already been centered 
about their centroid and rotated to coincide, which has eliminated a degree of 
freedom of variability that can have consequences.   

Furthermore, not all species' landmarks can be expected to have the same 
covariance structure, which is an assumption made in the ordinary Mahalanobis 
distance application to strut analysis between populations or species.  The 
assumption of similar data structure of course applies to the null hypothesis 
where there is no difference.  The typical statistical test explodes when the 
null hypothesis is falsified, so just when you want the Mahalanobis distance 
metric to be accurate it starts misbehaving. 

After rotation to simplest axes one does a 1 df F-test between each of the 
landmarks.  These tests are all independent so they can be summed together to 
produce a k df F-test, which is Mahalanobis D squared.  So Mahalanobis D is 
the square root of the sum of independent F-tests, but those F-tests are based 
on all sorts of assumptions about the variance of the landmarks.  I imagine one 
could modify the calculation of D by limiting the sum to the principal 
components that account for the top 95 or 99% of the variance. 
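
One way to read the description above, sketched in Python/NumPy: after rotating to the principal axes of the within-group covariance matrix, D² is the sum over axes of the squared mean difference divided by the variance along that axis. The per-axis terms here are squared standardized differences rather than F-statistics in the strict sense. That reading, the simulated covariance matrix `S`, the mean difference `d`, and the 95% cutoff are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
A = rng.normal(size=(p, p))
S = A @ A.T + np.eye(p)          # hypothetical full-rank within-group covariance
d = rng.normal(size=p)           # hypothetical difference between two group means

# Direct definition: D^2 = d' S^{-1} d
d2_direct = d @ np.linalg.solve(S, d)

# Rotate to the principal axes of S, then sum the squared differences,
# each divided by the variance (eigenvalue) along its axis.
eigvals, eigvecs = np.linalg.eigh(S)
scores = eigvecs.T @ d
terms = scores**2 / eigvals
d2_rotated = terms.sum()

# Truncated variant: keep only the leading axes that together reach
# 95% of the total within-group variance.
order = np.argsort(eigvals)[::-1]
cum = np.cumsum(eigvals[order]) / eigvals.sum()
keep = order[: np.searchsorted(cum, 0.95) + 1]
d2_truncated = terms[keep].sum()

print(d2_direct, d2_rotated, d2_truncated)
```

The first two values agree up to rounding. The truncated value can only be smaller or equal, since it drops non-negative terms; the dropped low-variance axes are also the ones whose terms are most inflated by estimation error.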

Many times applications of analytical techniques are judged by whether they 
‘work’ or not.   If a clustering method works for you, use it(?).  I am of the 
opinion that I use statistics to convince myself rather than the audience.   A 
confluence of many arguments is used to make a case. 

Joe 

-·.  .· ·.  .><((((º>·.  .· ·.  .><((((º>·.  .· ·.  .><((((º> .··.· >=-       
=º}}}}}>< 
Joseph G. Kunkel, Research Professor 
UNE Biddeford ME 04005 
http://www.bio.umass.edu/biology/kunkel/ 

> On Jan 30, 2016, at 7:11 AM, Elahep wrote: 
> 
> Hello all, 
> 
> I have seen in many GM articles that people use Mahalanobis distance for 
> cluster analysis. What is the advantage of using Mahalanobis distance over 
> Euclidean distance as a similarity measure in cluster analysis of shape 
> variables? 
> 
> As far as I know, Mahalanobis distance is the standardized form of Euclidean 
> distance: it standardizes the data, adjusting for correlation between 
> variables, and weights all variables equally. 
> 
> Why is this distance measure so frequently used in GM cluster analysis? 
> 
> 
> 
> Thanks in advance 
> 
> Elahe 
> 
> 
> -- 
> MORPHMET may be accessed via its webpage at http://www.morphometrics.org 
> --- 
> You received this message because you are subscribed to the Google Groups 
> "MORPHMET" group. 
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected]. 

-- 

*************************************************

Miguel Delgado PhD

CONICET-División Antropología.

Facultad de Ciencias Naturales y Museo.

Universidad Nacional de La Plata

Paseo del Bosque s/n. La Plata 1900. Argentina

Cel: 5492216795916. Fax: 54 221 4257527

https://unlp.academia.edu/DelgadoMiguel

http://www.cearqueologia.com.ar/

E-mail: [email protected]

*************************************************
