Hi William,

Thanks for the comments and explanation. It is really good to know the details of rowMeans. I modified Peter's code from length(x[x=="02"]) to sum(x=="02"), though it saved only a few seconds. :)
Best,
Mike

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Friday, May 15, 2009 10:09 AM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

rowMeans(dataMatrix=="02") must
(a) make a logical matrix the dimensions of dataMatrix in which to put the result of dataMatrix=="02" (4 bytes/logical element)
(b) make a double precision matrix (8 bytes/element) the size of that logical matrix, because rowMeans uses some C code that only works on doubles.

apply(dataMatrix,1,function(x)length(x[x=="02"])/ncol(dataMatrix)) never has to make any copies of the entire matrix. It extracts one row at a time, and when it is done with the row, the memory used for working on that row is available for other uses.

Note that it would probably be a tad faster if it were changed to
apply(dataMatrix,1,function(x)sum(x=="02")) / ncol(dataMatrix)
as sum(logicalVector) is the same as length(x[logicalVector]) and there is no need to compute ncol(dataMatrix) more than once.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: Ping-Hsun Hsieh [mailto:hsi...@ohsu.edu]
> Sent: Friday, May 15, 2009 9:58 AM
> To: Peter Alspach; William Dunlap; hadley wickham
> Cc: r-help@r-project.org
> Subject: RE: [R] memory usage grows too fast
>
> Thanks to Peter, William, and Hadley for your help.
> Your code is much more concise than mine. :P
>
> William's and Hadley's suggestions are the same. Here is their code.
>
> f <- function(dataMatrix) rowMeans(dataMatrix=="02")
>
> And Peter's code is the following.
>
> apply(yourMatrix, 1, function(x)
> length(x[x==yourPattern]))/ncol(yourMatrix)
>
> In terms of running time, the first one ran faster than
> the latter on my dataset (2.5 mins vs. 6.4 mins).
> The memory consumption of the first one, however, is much
> higher than that of the latter (>8G vs. ~3G).
>
> Any thoughts? My guess is that rowMeans creates extra copies
> to perform its calculation, but I am not sure.
> And I am also interested in understanding ways to handle
> memory issues. I hope someone can shed light on this for me. :)
>
> Best,
> Mike
>
> -----Original Message-----
> From: Peter Alspach [mailto:palsp...@hortresearch.co.nz]
> Sent: Thursday, May 14, 2009 4:47 PM
> To: Ping-Hsun Hsieh
> Subject: RE: [R] memory usage grows too fast
>
> Tena koe Mike
>
> If I understand you correctly, you should be able to use
> something like:
>
> apply(yourMatrix, 1, function(x)
> length(x[x==yourPattern]))/ncol(yourMatrix)
>
> I see you've divided by nrow(yourMatrix) so perhaps I am missing
> something.
>
> HTH ...
>
> Peter Alspach
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of Ping-Hsun Hsieh
> > Sent: Friday, 15 May 2009 11:22 a.m.
> > To: r-help@r-project.org
> > Subject: [R] memory usage grows too fast
> >
> > Hi All,
> >
> > I have a 1000x1000000 matrix.
> > The calculation I would like to do is actually very simple:
> > for each row, calculate the frequency of a given pattern. For
> > example, a toy dataset is as follows.
> >
> > Col1 Col2 Col3 Col4
> >  01   02   02   00   => Freq of "02" is 0.5
> >  02   02   02   01   => Freq of "02" is 0.75
> >  00   02   01   01   ...
> >
> > My code to find the pattern "02" is quite simple:
> >
> > OccurrenceRate_Fun<-function(dataMatrix)
> > {
> >   tmp<-NULL
> >   # apply() returns the transposed result: one column per row of dataMatrix
> >   tmpMatrix<-apply(dataMatrix,1,match,"02")
> >   for ( i in 1: ncol(tmpMatrix))
> >   {
> >     tmpRate<-table(tmpMatrix[,i])[[1]]/ nrow(tmpMatrix)
> >     tmp<-c(tmp,tmpRate)
> >   }
> >   rm(tmpMatrix)
> >   rm(tmpRate)
> >   gc()
> >   return(tmp)
> > }
> >
> > The problem is that memory usage grows very fast and is hard
> > to handle on machines with less RAM.
> > Could anyone please give me some comments on how to reduce
> > the space complexity of this calculation?
> >
> > Thanks,
> > Mike
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
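Bill's explanation can be checked against Mike's numbers with some back-of-the-envelope arithmetic. The R sketch below uses the 1000 x 1000000 dimensions from the original post. It takes the 4- and 8-byte element sizes from Bill's message and counts only the two full-matrix temporaries he describes, not dataMatrix itself.

```r
# Rough size of the two full-matrix temporaries Bill describes for
# rowMeans(dataMatrix == "02"), using the dimensions from the original post.
# Counts only these temporaries, not dataMatrix itself.
n_row   <- 1000
n_col   <- 1e6
n_cells <- n_row * n_col                # 1e9 elements

logical_bytes <- n_cells * 4            # (a) logical result of dataMatrix == "02"
double_bytes  <- n_cells * 8            # (b) double copy made for rowMeans

# Convert to GiB: about 3.73 (logical) and 7.45 (double)
round(c(logical = logical_bytes, double = double_bytes) / 2^30, 2)
```

Together these temporaries come to roughly 11 GiB, which is of the same order as the >8G Mike reported. The apply() version only ever holds temporaries for one row (1e6 elements) at a time.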
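The toy dataset from the original post gives a quick sanity check that the row-at-a-time approach (Peter's, with Bill's sum() tweak) and the vectorized rowMeans() approach (William's and Hadley's) compute the same frequencies. The variable names below are illustrative and do not come from the thread.

```r
# Toy dataset from the original post (3 rows x 4 columns of character codes)
toy <- matrix(c("01", "02", "02", "00",
                "02", "02", "02", "01",
                "00", "02", "01", "01"),
              nrow = 3, byrow = TRUE)

# Row-at-a-time: only one row's temporaries exist at any moment
freq_apply <- apply(toy, 1, function(x) sum(x == "02")) / ncol(toy)

# Vectorized: first builds a logical matrix the full size of toy
freq_rowmeans <- rowMeans(toy == "02")

freq_apply     # 0.50 0.75 0.25
freq_rowmeans  # 0.50 0.75 0.25
```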
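On Mike's question about handling memory, a common middle ground is to call rowMeans() on blocks of rows. This approach was not proposed in the thread. It keeps most of the vectorized speed while limiting the logical and double temporaries Bill describes to block_size x ncol(dataMatrix) rather than the full matrix. The sketch below assumes that dataMatrix itself fits in memory and that only the temporaries are the problem. row_freq_chunked and block_size are hypothetical names.

```r
# Hypothetical helper (not from the thread): frequency of `pattern` in each
# row, computed with rowMeans() on blocks of rows so that the temporaries
# are block_size x ncol(dataMatrix) instead of the full matrix.
row_freq_chunked <- function(dataMatrix, pattern = "02", block_size = 100L) {
  n <- nrow(dataMatrix)
  out <- numeric(n)
  if (n == 0L) return(out)
  for (start in seq.int(1L, n, by = block_size)) {
    rows <- start:min(start + block_size - 1L, n)
    # Subsetting copies just this block; drop = FALSE keeps a
    # single-row block as a matrix so rowMeans() still accepts it
    block <- dataMatrix[rows, , drop = FALSE]
    out[rows] <- rowMeans(block == pattern)
  }
  out
}

# Usage on the toy dataset from the original post
toy <- matrix(c("01", "02", "02", "00",
                "02", "02", "02", "01",
                "00", "02", "01", "01"),
              nrow = 3, byrow = TRUE)
row_freq_chunked(toy, block_size = 2L)   # 0.50 0.75 0.25
```

With 1000 rows and block_size = 100, each block's temporaries are about a tenth the size of the full-matrix ones. block_size trades peak memory against the number of rowMeans() calls.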