-------- Original Message --------
Subject: RE: Proper use and meaning of Mahalanobis distances
Date: Sat, 23 Feb 2008 09:50:56 +0100
From: Elsa et Stéphane BOUEE <[EMAIL PROTECTED]>
To: 'morphmet' <[EMAIL PROTECTED]>

Speaking about Mahalanobis distance (D), I have a question/remark.

Because of random fluctuations in a finite number of observations, D is never exactly zero, and it tends to increase with the number of variables. Markus has proposed a formula that takes this into account (I could not find the mathematical derivation of this formula):

Corrected(D) = [(n1+n2-p-3)*D/(n1+n2-2)] - [(n1+n2)*p/(n1*n2)]
With:   D = squared Mahalanobis distance (D²)
        n1 and n2: number of observations in the 2 groups
        p: number of variables

I applied this formula to a dataset and obtained negative values (even with a small number of variables, 5), which is embarrassing for a distance…
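A minimal Python sketch of the correction as written above (reading the denominator of the second term as n1*n2), with hypothetical group sizes, shows where the negative values come from: the term p(1/n1 + 1/n2) is subtracted whatever the observed D² is, so any observed D² below (n1+n2-2)/(n1+n2-p-3) × p(1/n1 + 1/n2) gives a negative result. Under the usual assumptions (multivariate normal groups sharing one covariance matrix) the correction is built to be unbiased on average, which is exactly why individual estimates are allowed to fall below zero. The function name is hypothetical.

```python
def corrected_d2(d2: float, n1: int, n2: int, p: int) -> float:
    """Small-sample correction of a squared Mahalanobis distance (formula in the message above).

    d2     -- sample squared Mahalanobis distance between the two group means
    n1, n2 -- number of observations in each group
    p      -- number of variables
    """
    n = n1 + n2
    if n - p - 3 <= 0:
        raise ValueError("the correction requires n1 + n2 > p + 3")
    shrink = (n - p - 3) / (n - 2)      # scales down the upward bias of the sample D^2
    offset = p * (n1 + n2) / (n1 * n2)  # equals p * (1/n1 + 1/n2); does not depend on d2
    return shrink * d2 - offset

# Hypothetical design: two groups of 10 specimens, 5 variables
for d2 in (0.8, 1.5, 3.0):
    print(d2, round(corrected_d2(d2, 10, 10, 5), 3))
```

With these sizes the threshold is 1.5: an observed D² of 0.8 is corrected to about -0.47, while 3.0 is corrected to 1.0.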

Therefore, I used another method to account for this bias. I randomly permuted the variables with respect to the observations (I cannot use my hands either, but I hope everyone can understand) and calculated 10000 random values of D using this method. Then I subtracted the mean of those random values of D from the true D calculated on my dataset.
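A minimal sketch of the permutation approach described above, in Python with NumPy. It assumes (the message does not say exactly) that "permuting" means randomly reassigning specimens to the two groups while keeping n1 and n2 fixed; the function names and the simulated data are hypothetical. It works with the squared distance throughout. Note that the resulting difference is still negative whenever the observed D² falls below the permutation mean, so this adjustment, like the formula, does not guarantee non-negative values.

```python
import numpy as np

def mahalanobis_d2(x1, x2):
    """Squared Mahalanobis distance between two group means (rows = specimens,
    columns = variables), computed with the pooled within-group covariance matrix."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    c1 = np.atleast_2d(np.cov(x1, rowvar=False))
    c2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * c1 + (n2 - 1) * c2) / (n1 + n2 - 2)
    # Solve pooled @ w = diff rather than forming the inverse explicitly
    return float(diff @ np.linalg.solve(pooled, diff))

def permutation_corrected_d2(x1, x2, n_perm=10000, seed=0):
    """Observed D^2 minus the mean D^2 over random reassignments of specimens to groups.

    Assumption: each permutation shuffles group labels across all specimens,
    keeping the group sizes fixed.
    """
    rng = np.random.default_rng(seed)
    data = np.vstack([np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)])
    n1 = len(x1)
    observed = mahalanobis_d2(x1, x2)
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(data))
        null[i] = mahalanobis_d2(data[idx[:n1]], data[idx[n1:]])
    return observed - null.mean(), observed, null

# Hypothetical data: 15 and 12 specimens measured on 5 variables
rng = np.random.default_rng(1)
g1 = rng.normal(size=(15, 5))
g2 = rng.normal(loc=0.5, size=(12, 5))
corrected, observed, null = permutation_corrected_d2(g1, g2, n_perm=2000)
print(f"observed D^2 = {observed:.3f}, mean permuted D^2 = {null.mean():.3f}, difference = {corrected:.3f}")
```

The full `null` array is returned as well, so the spread of the permuted values can be inspected alongside their mean.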

Am I correct in doing so?
Does anyone know a better (exact, mathematical) way to correct D without obtaining negative values?

Thank you for your answers

Stéphane BOUEE

-----Original Message-----
From: morphmet [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 20, 2008 23:10
To: morphmet
Subject: RE: Proper use and meaning of Mahalanobis distances

-------- Original Message --------
Subject: RE: Proper use and meaning of Mahalanobis distances
Date: Wed, 20 Feb 2008 10:49:42 -0800 (PST)
From: F. James Rohlf <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Organization: Stony Brook University
To: [email protected]
References: <[EMAIL PROTECTED]>

The formula for Mahalanobis distance ensures that it _always_ gives a distance between points relative to variation in the pooled within-groups covariance matrix used in its computation. It has the same meaning in a CVA as when one just compares two groups. It is not a measure of absolute morphological difference, but it is a measure of how easy it is to distinguish two groups. Better to think of it as a statistical distance.

Perhaps it is not useful to think of it as "correcting" for correlations among variables; it simply gives one the distance between two groups relative to the amount of within-group variation in the direction of the difference between the two groups being compared. (Easier to explain if I could wave my hands around in this message.)
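To illustrate the point about within-group variation in the direction of the difference, here is a small Python/NumPy sketch with a hypothetical pooled covariance matrix S (two variables with unit variance and a within-group correlation of 0.9). Two mean differences of identical Euclidean length receive very different Mahalanobis distances depending on whether they run along or across the correlation. Because these two directions happen to be the principal axes of S, D² here is exactly the squared Euclidean length divided by the within-group variance along that direction (1.9 and 0.1), giving D² ≈ 1.05 and D² = 20. The helper `d2` returns the squared distance; D is its square root.

```python
import math
import numpy as np

# Hypothetical pooled within-group covariance matrix S:
# two variables with unit variance and a within-group correlation of 0.9
S = np.array([[1.0, 0.9],
              [0.9, 1.0]])

def d2(diff, cov):
    """Squared Mahalanobis length of a mean-difference vector: diff' cov^-1 diff."""
    diff = np.asarray(diff, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff))

along = np.array([1.0, 1.0])    # difference along the correlation (within-group variance 1.9)
across = np.array([1.0, -1.0])  # same Euclidean length, across it (within-group variance 0.1)

for name, diff in (("along", along), ("across", across)):
    md2 = d2(diff, S)
    print(f"{name:>6}: Euclidean = {np.linalg.norm(diff):.3f}, D^2 = {md2:.3f}, D = {math.sqrt(md2):.3f}")
```

For a difference that does not lie on a principal axis, D² still depends on the whole of S rather than on a single variance, which is why this clean ratio is only an illustration.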

Note also that often the formulas for Mahalanobis distance are actually formulas for the square of that distance, so a square root is often needed.

------------------------
F. James Rohlf, Distinguished Professor
Ecology & Evolution, Stony Brook University
www: http://life.bio.sunysb.edu/ee/rohlf

 -----Original Message-----
 From: morphmet [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, February 19, 2008 3:40 PM
 To: morphmet
 Subject: Proper use and meaning of Mahalanobis distances

 Misdirected post 7 of 7. -mod

 -------- Original Message --------
 Subject: Proper use and meaning of Mahalanobis distances
 Date: Mon, 11 Feb 2008 20:40:19 -0500
 From: morphmet <[EMAIL PROTECTED]>
 To: morphmet <[EMAIL PROTECTED]>

 -------- Original Message --------
 Subject: Proper use and meaning of Mahalanobis distances
 Date:     Mon, 11 Feb 2008 12:17:11 -0800 (PST)
 From:     [EMAIL PROTECTED]
 To:       [email protected]

 Dear colleagues,

 Recently I received comments from a manuscript reviewer regarding the use of Mahalanobis distance vs. Euclidean distance. The reviewer argues that Euclidean distance measures the morphological difference between means, whereas Mahalanobis distance scales that difference by within-group variance.

 The problem I find with this remark is that Mahalanobis distance can only be interpreted in such a way in the context of a discriminant function or a similar method (e.g., CVA). Mahalanobis distance does not in all cases express itself as maximizing group differentiation. It does correct for correlation among variables, but by no means will it by itself scale distances relative to within-group variance.

 The reviewer is of course anonymous, but I hope he/she can read this.

 This issue arose after my comments on the proper use and interpretation of discriminant analysis (or similar methods such as CVA) as evidence for group separation. I argue that there is no point in using such methods as evidence for the existence of groups, since they require the existence of such groups at the start. This is a good example of tautology, and a misleading one, since distances between groups will always tend to be large and should not be interpreted as one interprets Euclidean distances in a PCA.
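For the two-group case, the tendency of sample distances to be large even when no groups exist can be put into numbers. Assuming both samples come from the same multivariate normal population, Hotelling's T² = (n1·n2/(n1+n2))·D² follows a scaled F distribution, and its mean gives the expected sample D² computed below. This is a sketch for two groups only (not a full CVA with many groups); the function name and the 10 + 10 design are hypothetical.

```python
def expected_null_d2(n1: int, n2: int, p: int) -> float:
    """Expected sample squared Mahalanobis distance between two groups drawn from
    the same multivariate normal population (true distance zero).

    Derived from T^2 = (n1*n2/(n1+n2)) * D^2 and the mean of the F(p, n1+n2-p-1) distribution.
    """
    n = n1 + n2
    if n - p - 3 <= 0:
        raise ValueError("the expectation is finite only when n1 + n2 > p + 3")
    return (n - 2) / (n - p - 3) * p * (1 / n1 + 1 / n2)

# Hypothetical design: 10 + 10 specimens, increasing number of variables
for p in (2, 5, 10, 15):
    print(p, round(expected_null_d2(10, 10, p), 2))
```

With these sizes the expected D² rises from about 0.48 at p = 2 to 27 at p = 15, although the true distance between the populations is zero.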



 I would appreciate any comments regarding this subject.

 Thanks,

 Pablo

 Pablo Jarrin
 Dept. of Biology
 Boston University

 --
 Replies will be sent to the list.
 For more information visit http://www.morphometrics.org