set.seed(123)
N = 30000
K = 400
theData = matrix(rnorm(N*K), ncol=K)
theData = as.data.frame(theData)
theData = cbind(indicator = sample(0:1, N, replace=TRUE), theData)
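
For a quick sanity check on something smaller than the full data, here is a minimal sketch comparing the loop from the original post with a vectorized version. It assumes the data can be held as a plain numeric matrix with the indicator in column 1. The names `small` and `meanprob_vec` and the sizes are made up for this illustration only.

```r
# Small illustrative data; the sizes are arbitrary assumptions.
set.seed(1)
n <- 200
k <- 5
small <- cbind(indicator = sample(0:1, n, replace = TRUE),
               matrix(runif(n * k), ncol = k))

# Loop version from the original post, reformatted.
meanprob <- function(Robj) {
  NLINE <- dim(Robj)[1]
  NCOLUMN <- dim(Robj)[2]
  mprob <- rep(0, NCOLUMN - 1)
  for (i in 2:NCOLUMN) {
    sumprob <- 0
    pa <- 0
    for (j in 1:NLINE) {
      if (Robj[j, 1] != 0) {
        pa <- pa + 1
        sumprob <- Robj[j, i] + sumprob
      }
    }
    mprob[i - 1] <- sumprob / pa
  }
  mprob
}

# Vectorized sketch (hypothetical helper): keep the flagged rows,
# drop the indicator column, and let colMeans do the arithmetic.
meanprob_vec <- function(Robj) {
  keep <- Robj[, 1] != 0
  colMeans(Robj[keep, -1, drop = FALSE])
}

loop_result <- meanprob(small)
vec_result <- unname(meanprob_vec(small))

# The two should agree up to floating-point tolerance.
print(all.equal(loop_result, vec_result))
```

Dropping column 1 before calling colMeans keeps the indicator's own mean out of the result. The subset() call on the full data frame keeps that column, so its first element is the mean of the indicator (always 1 for the selected rows) and can be ignored.
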

> system.time(results <- colMeans(subset(theData, indicator == 1)))
   user  system elapsed
  2.309   1.319   3.853

b

On Jul 20, 2007, at 6:17 PM, Diogo Alagador wrote:

> Hi all,
>
> I'm handling massive data.frames and matrices in R (30000 x 400).
> In the 1st column, say, I have 0s and 1s indicating rows that
> matter; other columns have probability values.
> One simple task I would like to do would be to get the column mean
> values for signaled rows (the ones with 1).
> As a very fresh "programmer" I have built a simple function in R
> which should not be very efficient indeed! It works well for
> current-dimension matrices, but it just does not go so well with huge ones.
>
> meanprob<-function(Robj){
>   NLINE<-dim(Robj)[1];
>   NCOLUMN<-dim(Robj)[2];
>   mprob<-c(rep(0,(NCOLUMN-1)));
>   for (i in 2:NCOLUMN){
>     sumprob<-0;
>     pa<-0;
>     for (j in 1:NLINE){
>       if(Robj[j,1]!=0){
>         pa<-pa+1;
>         sumprob<-Robj[j,i]+sumprob;
>       }
>     }
>     mprob[i-1]<-sumprob/pa;
>   }
>   return(mprob);
> }
>
> So I "only" see 3 ways to get through the problem:
>
> - to reformulate the function to gain efficiency;
> - to establish a C-routine (for example), where loops are more
>   "speedy", and then interface it with R;
> - to find some function/package that already does that.
>
> Can anybody illuminate my way here?
>
> Much thanks,
>
> Diogo Andre' Alagador
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.