[R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Frank Hedler
Dear all,
We have a data frame x with n people as rows and k variables as columns.
Now, for each person (i.e., each row) we want to calculate a distance
between him/her and EACH other person in x. In other words, we want to
create an n x n matrix of distances (with zeros on the diagonal).

However, we do not want to calculate Euclidean distances. We want to calculate
Mahalanobis distances, which take into account the covariance among
variables.

Below is the piece of code we wrote (covmat in the function below is the
variance-covariance matrix among the variables in x that has to be fed into
the mahalanobis function we are using).
mahadist = function(x, covmat) {
  dismat = matrix(0, ncol=nrow(x), nrow=nrow(x))
  for (i in 1:nrow(x)) {
    dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
  }
  return(dismat)
}


This piece of code works, but it is very slow. We were wondering if it's at
all possible to somehow vectorize this function. Any help would be greatly
appreciated.
Thanks,
Frank
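
One way the loop above might be vectorized, offered as a minimal sketch
rather than a tested replacement: if covmat is symmetric and positive
definite, its Cholesky factor can be used to transform the rows of x so that
ordinary Euclidean distances between the transformed rows equal the
Mahalanobis distances between the original rows. The name mahadist_vec is
hypothetical and not from the thread; only base R functions (chol, solve,
dist) are used.

```r
# Hypothetical helper (not from the thread): all pairwise Mahalanobis
# distances without an explicit loop over rows.
# Assumes covmat is a symmetric positive-definite k x k matrix.
mahadist_vec <- function(x, covmat) {
  x <- as.matrix(x)
  R <- chol(covmat)          # upper triangular, t(R) %*% R equals covmat
  z <- x %*% solve(R)        # rows of z are the "whitened" observations
  as.matrix(dist(z))         # Euclidean distance on z = Mahalanobis on x
}

# Small usage example with simulated data
set.seed(1)
x <- as.data.frame(matrix(rnorm(50 * 4), ncol = 4))
dismat <- mahadist_vec(x, cov(x))
dim(dismat)
```

The result is on the same square-root scale as mahadist above, so the two
can be compared directly on a small data set before switching.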

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Sarah Goslee
distance() from the ecodist package will calculate Mahalanobis distances.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Sarah Goslee
Hi Frank,

If the way distance() calculates the Mahalanobis distance meets your
needs other than the covariance specification, you can tweak that
_very_ easily. If you use fix(distance) at the command line, you can
edit the source.
Change the first line to:
function (x, method = "euclidean", icov)
and under method 4, change the icov calculation to:
if(missing(icov)) {
   icov <- solve(cov(x))
}

Alternatively, here's a simplified distanceM function with everything but the
relevant bits deleted. You'll still need to have ecodist loaded.

distanceM <- function (x, method = "mahalanobis", icov)
{
    # all pairwise differences between rows, computed by ecodist's C routine
    paireddiff <- function(x) {
        N <- nrow(x)
        P <- ncol(x)
        A <- numeric(N * N * P)
        A <- .C("pdiff", as.double(as.vector(t(x))), as.integer(N),
            as.integer(P), A = as.double(A), PACKAGE = "ecodist")$A
        A <- array(A, dim = c(N, N, P))
        A
    }
    x <- as.matrix(x)
    N <- nrow(x)
    P <- ncol(x)

    # use the observed covariance only if no inverse covariance is supplied
    if(missing(icov)) {
        icov <- solve(cov(x))
    }
    A <- paireddiff(x)
    A1 <- apply(A, 1, function(z) (z %*% icov %*% t(z)))
    D <- A1[seq(1, N * N, by = (N + 1)), ]

    # keep the lower triangle and return a dist object
    D <- D[col(D) < row(D)]
    attr(D, "Size") <- N
    attr(D, "Labels") <- rownames(x)
    attr(D, "Diag") <- FALSE
    attr(D, "Upper") <- FALSE
    # METHODS is defined only inside ecodist's distance(); store the name directly
    attr(D, "method") <- method
    class(D) <- "dist"
    D
}
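
For readers without the compiled pdiff routine, the same idea (a quadratic
form in a user-supplied inverse covariance matrix) can be sketched in base R
alone. This is an illustration under stated assumptions, not ecodist code:
the name distanceM_base is hypothetical, icov is assumed symmetric, and the
sketch takes the square root of the quadratic form to match the loop in the
original request, whereas the distanceM code as written above applies no
square root.

```r
# Hypothetical base-R sketch (not part of ecodist).
# Uses d2[i, j] = q[i] + q[j] - 2 * x_i' icov x_j, where q[i] = x_i' icov x_i.
distanceM_base <- function(x, icov) {
  x <- as.matrix(x)
  if (missing(icov)) {
    icov <- solve(cov(x))            # default: observed covariance
  }
  xi <- x %*% icov                   # N x P
  q <- rowSums(xi * x)               # quadratic form of each row with itself
  d2 <- outer(q, q, "+") - 2 * tcrossprod(xi, x)
  d2[d2 < 0] <- 0                    # guard against tiny negative rounding
  as.dist(sqrt(d2))                  # lower triangle as a dist object
}

# Usage with a covariance matrix chosen by the user (here simply rescaled)
set.seed(2)
x <- matrix(rnorm(30 * 3), ncol = 3)
S <- 2 * cov(x)
D <- distanceM_base(x, icov = solve(S))
```

The expanded form avoids building the N x N x P array of differences, at the
cost of some rounding error when two rows are nearly identical.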

Sarah

On Tue, Oct 7, 2008 at 1:05 PM, Frank Hedler [EMAIL PROTECTED] wrote:
> Dear all,
> we just realized something. Sarah's distance function - indeed - calculates
> mahalanobis distance very well. However, it uses the
> observed variance-covariance matrix by default.
> What we actually need (sorry for not stating it clearly) is to be able to
> specify which variance-covariance matrix goes into that calculation.
>
> On Tue, Oct 7, 2008 at 12:44 PM, Sarah Goslee [EMAIL PROTECTED] wrote:
>
>> distance() from the ecodist package will calculate Mahalanobis distances.
>>
>> Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Frank Hedler
Dear all,
we just realized something. Sarah's distance function - indeed -
calculates mahalanobis distance very well. However, it uses the
observed variance-covariance matrix by default.
What we actually need (sorry for not stating it clearly) is to be able to
specify which variance-covariance matrix goes into that calculation.

On Tue, Oct 7, 2008 at 12:44 PM, Sarah Goslee [EMAIL PROTECTED] wrote:

> distance() from the ecodist package will calculate Mahalanobis distances.
>
> Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org

