On Wed, Nov 26, 2008 at 1:16 PM, Stavros Macrakis <[EMAIL PROTECTED]> wrote:
> I routinely compute with a 2,500,000-row dataset with 16 columns,
> which takes 410MB of storage; my Windows box has 4GB, which avoids
> thrashing.  As long as I'm careful not to compute and save multiple
> copies of the entire data frame (because 32-bit Windows R is limited
> to about 1.5GB address space total, including any intermediate
> results), R works impressively well and fast with this dataset for
> selections, calculations, cross-tabs, plotting, etc.  For example,
> simple single-column statistics and cross-tabs take << 1 sec., summary
> of the whole thing takes 16 sec.  A linear regression between two
> numeric columns takes < 20 sec.  Plotting of all 2.5M points takes a
> while, but that is no surprise (and is usually pointless [sic]
> anyway).  I have not tried to do any compute-intensive statistical
> calculations on the whole data set.
>
> The main (but minor) annoyance with it is that it takes about 90 secs
> to load into memory using R's native binary "save" format, so I tend
> to keep the process lying around rather than re-starting and
> re-loading for each analysis.  Fortunately, garbage collection is very
> effective in reclaiming unused storage as long as I'm careful to
> remove unnecessary objects.
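The memory bookkeeping described above can be sketched roughly as follows. This is only an illustration on a small synthetic data frame, not the actual dataset; the names `df`, `tmp` and `size_mb` and the row count are made up for the example.

```r
# Synthetic stand-in for the real data: names and sizes are illustrative only.
n <- 100000
df <- data.frame(matrix(rnorm(n * 16), ncol = 16))

# Approximate in-memory size of the data frame, in megabytes.
size_mb <- as.numeric(object.size(df)) / 2^20
print(size_mb)

# An intermediate result that is as large as the data frame itself.
tmp <- df * 2

# Remove it once it is no longer needed, then let the garbage collector
# reclaim the storage.
rm(tmp)
invisible(gc())
```

Checking object.size() before making another full-size copy gives a rough idea of how close a session is to the address-space limit mentioned above.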
FYI, objects saved with save(..., compress=FALSE) are notably faster
to read back.

/Henrik

>
> -s
>
>
> On Wed, Nov 26, 2008 at 7:42 AM, iwalters <[EMAIL PROTECTED]> wrote:
>>
>> I'm currently working with very large datasets that consist out of 1,000,000
>> + rows.  Is it at all possible to use R for datasets this size or should I
>> rather consider C++/Java.
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
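A rough way to try the compress=FALSE suggestion on your own data is sketched below. It uses a small synthetic data frame and temporary files, so the names (`df`, `f_gz`, `f_raw`) are placeholders, and the timings will depend heavily on the data, the disk and the R version.

```r
# Synthetic data; substitute your own object to get meaningful timings.
df <- data.frame(matrix(rnorm(100000 * 16), ncol = 16))

f_gz  <- tempfile(fileext = ".RData")
f_raw <- tempfile(fileext = ".RData")

save(df, file = f_gz)                     # default in current R: compressed
save(df, file = f_raw, compress = FALSE)  # uncompressed, larger on disk

# Time reading each file back; load() restores `df` into a scratch environment.
t_gz  <- system.time(load(f_gz,  envir = new.env()))
t_raw <- system.time(load(f_raw, envir = new.env()))

# File sizes in bytes, then elapsed load times in seconds.
print(file.info(c(f_gz, f_raw))$size)
print(c(compressed = t_gz[["elapsed"]], uncompressed = t_raw[["elapsed"]]))

# Check that the uncompressed file round-trips to the same object.
e <- new.env()
load(f_raw, envir = e)
stopifnot(identical(e$df, df))
```

The trade-off is disk space for load time: the uncompressed file is usually larger, so it is worth comparing both numbers before switching.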