First thing to do is to use Rprof to determine where the time is being spent and then you can pinpoint what section of code is taking the time. A quick look says to do all your subsetting at once. You might look into using 'split' to create the subsets and then access the subsets with "[[...]]". So use Rprof and see if most of the time is being spent the the data.frame subsetting functions.
On 8/1/07, Sébastien <[EMAIL PROTECTED]> wrote: > Dear R-users, > > I have written the following code to generate some trellis plots. It > works perfectly fine except that it is quite slow when it is apply to my > typical datasets (over several thousands of lines). I believe the > problem comes from the loops I am using to subset my data.frame. I read > in the archives that the tapply function is often more efficient than a > loop in R. Unfortunately , it seems that I am not enough familiar with > the "philosophy" of this function to implement it in my code. Would you > have some suggestions to speed up the whole thing? > > Thanks in advance > > Sebastien > > PS: the rationale behind these loops is to split the trellis plots on > different pages, all plots on a page (or a group of pages) having a > given combination of values for the PLOT, DVID, PER and GRP parameters. > > ############################################### > > library(lattice) > rm(list=ls(all=TRUE)) > > # Generate a dummy dataset with > # - 20 individuals (ID) > # - individuals 1 to 10 belong to group (GRP) 1, 11 to 20 belong to group 2 > # - measurements (DV) done at 10 time points (TIME) per individuals on 2 > occassions (OCC) > # - modelisation of the DV versus TIME relationships with 4 different > models (MODEL) > # - predicted values (Y) > # - the PLOT column serves as a flag to plot together the models (A and > B) and (C and D) > > PLOT<-rep(1:2,each=40,times=20) > ID<-rep(1:20,each=80) > OCC<-rep(1:2,each=10,times=80) > GRP<-as.numeric(rep(gl(2,80),times=10)) > MODEL<-as.vector(rep(gl(4,20,label=c("A","B","C","D")),times=20)) > TIME<-rep.int(1:10,160) > DV<-OCC*(1:10)*rep(rnorm(20,50,10),each=80)+rep(rnorm(20,10,1),each=80) > Y<-jitter(DV) > > mydata<-data.frame(PLOT,ID,OCC,GRP,MODEL,TIME,DV,Y) > > mydata$DVID<-rep.int(1,1600) #in a real dataset, DVID could have > typically 2 to 3 levels > > ############# > # Plotting routine > ############# > > myPath<-"C:/" #TO BE MODIFIED > > nTrellisCol<-2 #number of columns per > Trellis plot > nTrellisRow<-3 #number of lines per > Trellis plot > > nDVID<-nlevels(factor(mydata$DVID)) #number of > DVID=observations types > nidPlot<-nlevels(factor(mydata$PLOT)) #number of items in the > PLOT column > nPer<-nlevels(factor(mydata$OCC)) #number of occassions > (OCC, PER, etc...) > nGRP<-nlevels(factor(mydata$GRP)) #number of groups > > pdf(file=paste(myPath,"test.pdf",sep="")) > > trellis.par.set(par.main.text=list(cex=1)) > trellis.par.set(par.ylab.text=list(font=2)) > trellis.par.set(par.xlab.text=list(font=2)) > > for (i in 1:nidPlot) { > #loop on PLOT id > # i=1 > idata<-subset(mydata,mydata$PLOT==i) > > for (j in 1:nDVID) { > #loop on DVID > # j=1 > ijdata<-subset(idata,idata$DVID==j) > > for (k in 1:nPer) { > #loop on Period > # k=1 > > ijkdata<-subset(ijdata,ijdata$OCC==k) > > for (l in 1:nGRP) { #loop on Group > # l=1 > > subdata<-subset(ijkdata,ijkdata$GRP==l) > > nModel<-nlevels(factor(subdata$MODEL)) > #number of models to be plotted in this loop > mylegend<-c("Raw data",levels(factor(subdata$MODEL))) > subID<-nlevels(factor(subdata$ID)) #number of > ID in the new dataset > > myplot<-xyplot(Y ~ TIME | ID, #creates plot > data = subdata, type = "l", > groups = MODEL, > observed = subdata$DV, > as.table=TRUE, > panel = function(x, y, ..., subscripts, observed) { > panel.points(x, pch=3,col=1,observed[subscripts]) > panel.xyplot(x, y, ..., > col=2:nlevels(subdata$MODEL),subscripts = subscripts)}, > > strip=function (which.panel,...){ > col<-rep("Black",subID) > llines(c(0,1,1,0,0),c(0,0,1,1,0),col.line=1) > ltext(rep(0.5,subID),rep(0.5,subID), > paste("Subject > ",levels(factor(subdata$ID))[which.panel],sep=""),cex=trellis.par.get("axis.text")[2])}, > > key=list(space="bottom", > lines = list(pch = as.integer(c(3,rep("",nModel))), type > = c("p", gl(1,nModel,label="l")), > col = > 1:(nModel+1),"cex"=trellis.par.get("axis.text")[2]), > text=list(mylegend, > "cex"=trellis.par.get("axis.text")[2])), > > xlab="Time (hr)", > ylab="Concentration (ng/mL)", > layout=c(nTrellisCol,nTrellisRow), > > main=paste(paste(paste("Plot ",i,sep=""), > paste(paste(", DVID ",j,sep=""), > paste(paste(", Occasion ",k,sep=""), > paste(", Group > ",l,sep="")))),sep="")) > > > trellis.par.set(par.xlab.text=list(cex=trellis.par.get("axis.text")[2])) > trellis.par.set(par.ylab.text=list(cex=trellis.par.get("axis.text")[2])) > > > print(myplot,panel.width=list(x=(0.75/nTrellisCol),units="npc"),panel.height=list(x=(0.50/nTrellisRow),units="npc")) > > }}}} > dev.off() > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.