[Monty B. ] >I have to handle a large matrix (1000 x 10001) where in the >last column i have a value that all the preceding values in the same row >has to be compared to. I have made the following code :
># generate a (1000 x 10001) matrix, testm ># generate statistics matrix 1000 x 4: >qnt <- c(0.01, 0.05) >cmp_fun <- function(x) >{ > LAST <- length(x) > smpls <- x[1:(LAST-1)] > real <- x[LAST] > ret <- vector(length=length(qnt)*2) > for (i in 1:length(qnt)) > { > q_i <- quantile(smpls, qnt[i]) # the quantile i > m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i > ret[i] <- ifelse(real < q_i, 1, 0) > ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0) > } > ret >} >hcvx <- apply(testm, 1, cmp_fun) >Can anyone advise as to how I can optimize the runtime of this problem? >All suggestions are welcome! You may speed it up a bit, not so much, with the following: stats.testm <- function (testm, qnt=c(0.01, 0.05)) { quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt) smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)] reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)] keeps <- smpls < rep(quants, ncol(smpls)) means <- rowSums(smpls * keeps) / rowSums(keeps) matrix(rbind((reals < quants) + 0, (reals < quants) * (reals - means)), length(qnt) * 2) } Try it with something like: gen.testm <- function (n, m) { matrix(sample(0:99, n * (m + 1), TRUE), n) } testm <- gen.testm(100, 100) stats.testm(testm) Without checking, I would suspect that quantile is the big consumer. If you could make it without quantile interpolation, maybe some more vectorisation could be possible, but in any case, I do not think you can avoid sorting each row separately, in one way or another (currently done within quantile). -- François Pinard http://pinard.progiciels-bpi.ca ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html