[R] vectorization of a loop for mahalanobis distance calculation
Dear all,

We have a data frame x with n people as rows and k variables as columns. For each person (i.e., each row) we want to calculate the distance between him/her and EACH other person in x. In other words, we want to create an n x n matrix of distances (with zeros on the diagonal).

However, we do not want Euclidean distances. We want Mahalanobis distances, which take into account the covariance among the variables.

Below is the piece of code we wrote (covmat is the variance-covariance matrix among the variables in x that has to be fed into the mahalanobis() function we are using).

mahadist = function(x, covmat) {
  # n x n result matrix, one row per person
  dismat = matrix(0, ncol = nrow(x), nrow = nrow(x))
  for (i in 1:nrow(x)) {
    # mahalanobis() returns squared distances, hence the square root
    dismat[i, ] = mahalanobis(as.matrix(x), as.matrix(x[i, ]), covmat)^.5
  }
  return(dismat)
}

This piece of code works, but it is very slow. We were wondering whether it is possible to vectorize this function. Any help would be greatly appreciated.

Thanks,
Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
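One way the loop in the question might be vectorized, offered as a sketch and not as a reply from the thread: the Mahalanobis distance under covmat equals the ordinary Euclidean distance after the data have been multiplied by the inverse of the Cholesky factor of covmat, so base R's dist() can do all pairwise work at once. The function name mahadist_vec and the toy data are assumptions made for illustration; covmat is assumed to be symmetric positive definite.

```r
# Sketch: Mahalanobis distances as Euclidean distances on whitened data.
# mahadist_vec is a hypothetical name, not a function from the thread.
mahadist_vec <- function(x, covmat) {
  x <- as.matrix(x)
  # Upper-triangular R with t(R) %*% R == covmat; chol() stops with an
  # error if covmat is not symmetric positive definite
  R <- chol(covmat)
  # z = x %*% solve(R); Euclidean distances between rows of z equal the
  # Mahalanobis distances between the corresponding rows of x
  z <- x %*% backsolve(R, diag(ncol(x)))
  # Full n x n matrix with zeros on the diagonal
  as.matrix(dist(z))
}

# Toy data (hypothetical): 200 people, 4 variables
set.seed(42)
x <- as.data.frame(matrix(rnorm(200 * 4), ncol = 4))
covmat <- cov(x)
d <- mahadist_vec(x, covmat)
dim(d)
```

Any other positive definite matrix can be passed as covmat in place of cov(x); only the chol() step depends on it.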
Re: [R] vectorization of a loop for mahalanobis distance calculation
distance() from the ecodist package will calculate Mahalanobis distances.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] vectorization of a loop for mahalanobis distance calculation
Hi Frank,

If the way distance() calculates the Mahalanobis distance meets your needs other than the covariance specification, you can tweak that _very_ easily.

If you use fix(distance) at the command line, you can edit the source. Change the first line to:

function (x, method = "euclidean", icov)

and under method 4, change the icov calculation to:

if(missing(icov)) {
  icov <- solve(cov(x))
}

Alternatively, here's a simplified distanceM function with everything but the relevant bits deleted. You'll still need to have ecodist loaded.

distanceM <- function (x, method = "mahalanobis", icov)
{
  # All pairwise row differences as an N x N x P array,
  # computed by ecodist's compiled pdiff routine
  paireddiff <- function(x) {
    N <- nrow(x)
    P <- ncol(x)
    A <- numeric(N * N * P)
    A <- .C("pdiff", as.double(as.vector(t(x))), as.integer(N),
            as.integer(P), A = as.double(A), PACKAGE = "ecodist")$A
    A <- array(A, dim = c(N, N, P))
    A
  }

  x <- as.matrix(x)
  N <- nrow(x)
  P <- ncol(x)

  # Use the observed covariance only when no inverse covariance is supplied
  if(missing(icov)) {
    icov <- solve(cov(x))
  }

  A <- paireddiff(x)
  A1 <- apply(A, 1, function(z) (z %*% icov %*% t(z)))
  D <- A1[seq(1, N * N, by = (N + 1)), ]
  D <- D[col(D) < row(D)]   # keep the lower triangle only

  attr(D, "Size") <- N
  attr(D, "Labels") <- rownames(x)
  attr(D, "Diag") <- FALSE
  attr(D, "Upper") <- FALSE
  # METHODS is not defined in this stripped-down version,
  # so store the method name directly
  attr(D, "method") <- method
  class(D) <- "dist"
  D
}

Sarah

On Tue, Oct 7, 2008 at 1:05 PM, Frank Hedler [EMAIL PROTECTED] wrote:
> Dear all,
> we just realized something. Sarah's distance function does indeed
> calculate Mahalanobis distances very well. However, it uses the observed
> variance-covariance matrix by default. What we actually need (sorry for
> not stating it clearly) is to be able to specify which
> variance-covariance matrix goes into that calculation.
>
> On Tue, Oct 7, 2008 at 12:44 PM, Sarah Goslee [EMAIL PROTECTED] wrote:
>> distance() from the ecodist package will calculate Mahalanobis distances.
>>
>> Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
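The same idea (a caller-supplied inverse covariance, falling back to the observed one, and a dist object as the result) can also be sketched in base R alone, without the compiled pdiff routine. This is an illustrative alternative, not ecodist's implementation: the name mahadist_icov is hypothetical, icov is assumed to be symmetric, and the result holds distances on the square-root scale as in Frank's original loop.

```r
# Sketch: pairwise Mahalanobis distances from a supplied inverse covariance.
# mahadist_icov is a hypothetical name used only for this illustration.
mahadist_icov <- function(x, icov) {
  x <- as.matrix(x)
  # Fall back to the observed covariance when icov is not supplied
  if (missing(icov)) icov <- solve(cov(x))
  # G[i, j] = x_i' icov x_j for every pair of rows
  G <- x %*% icov %*% t(x)
  g <- diag(G)
  # For symmetric icov: d^2(i, j) = G[i, i] + G[j, j] - 2 * G[i, j]
  D2 <- outer(g, g, "+") - 2 * G
  D2[D2 < 0] <- 0   # clip tiny negative values caused by rounding
  D <- as.dist(sqrt(D2))
  attr(D, "method") <- "mahalanobis"
  D
}

# Toy data (hypothetical): 50 people, 3 variables
set.seed(7)
x <- matrix(rnorm(50 * 3), ncol = 3)

d_default <- mahadist_icov(x)                           # observed covariance
d_custom  <- mahadist_icov(x, solve(diag(c(1, 4, 9))))  # user-chosen covariance
```

Because every distance comes out of two matrix products, no loop over rows is needed; the price is an n x n intermediate matrix, which may matter for very large n.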
Re: [R] vectorization of a loop for mahalanobis distance calculation
Dear all,

we just realized something. Sarah's distance function does indeed calculate Mahalanobis distances very well. However, it uses the observed variance-covariance matrix by default. What we actually need (sorry for not stating it clearly) is to be able to specify which variance-covariance matrix goes into that calculation.

On Tue, Oct 7, 2008 at 12:44 PM, Sarah Goslee [EMAIL PROTECTED] wrote:
> distance() from the ecodist package will calculate Mahalanobis distances.
>
> Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
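To illustrate why the choice of covariance matrix matters, here is a small check using only base R's mahalanobis() and dist(). With the identity matrix as the covariance, the Mahalanobis distance reduces to the Euclidean distance, while the observed covariance generally gives different values. The helper pairwise_maha and the toy data are assumptions made for this illustration.

```r
# Toy data (hypothetical): 5 people, 3 variables
set.seed(1)
x <- matrix(rnorm(15), nrow = 5)

# Distances from each row to every row, for a caller-supplied covariance.
# pairwise_maha is a hypothetical helper, not part of any package.
pairwise_maha <- function(x, covmat) {
  t(sapply(seq_len(nrow(x)), function(i) sqrt(mahalanobis(x, x[i, ], covmat))))
}

d_obs <- pairwise_maha(x, cov(x))         # observed covariance
d_id  <- pairwise_maha(x, diag(ncol(x)))  # identity: plain Euclidean

# The identity-covariance result should agree with dist()
all.equal(d_id, as.matrix(dist(x)), check.attributes = FALSE)
```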