Re: [R] R Memory Usage Concerns

2009-09-15 Thread Evan Klitzke
On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson h...@stat.berkeley.edu wrote: As already suggested, you're (much) better off if you specify colClasses, e.g. tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double")); Otherwise, R has to load all the data, make a best guess
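The colClasses advice above can be sketched as a small self-contained example; the file here is a synthetic temp file standing in for the real log data, and "numeric" is used because it is the class name documented for colClasses in ?read.table:

```r
## Minimal sketch of the colClasses suggestion, on synthetic data.
f <- tempfile(fileext = ".tab")
writeLines(c("alpha 1.5 2.5",
             "beta 3.5 4.5",
             "alpha 5.5 6.5"), f)

## Declaring the classes up front lets read.table skip the
## guess-then-coerce pass that otherwise copies every column.
tab <- read.table(f, colClasses = c("factor", "numeric", "numeric"))

str(tab)   # V1 is a factor, V2 and V3 are numeric
unlink(f)
```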

Re: [R] R Memory Usage Concerns

2009-09-15 Thread Thomas Lumley
On Tue, 15 Sep 2009, Evan Klitzke wrote: On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson h...@stat.berkeley.edu wrote: As already suggested, you're (much) better off if you specify colClasses, e.g. tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double")); Otherwise, R has

Re: [R] R Memory Usage Concerns

2009-09-15 Thread Carlos J. Gil Bellosta
Hello, I do not know whether my package colbycol may help you. It can help you read files that would not otherwise fit into memory. Internally, as the name indicates, data is read into R in a column-by-column fashion. IO times increase, but you need just a fraction of the intermediate memory
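colbycol's own API is not shown in this thread, but the column-at-a-time idea it describes can be sketched in base R: re-read the file once per column, using colClasses = "NULL" to drop every other column, so only one column's worth of intermediate memory is needed at a time (the helper name here is hypothetical):

```r
## Column-by-column reading sketch using only base R.
f <- tempfile()
writeLines(c("alpha 1.5 2.5",
             "beta 3.5 4.5"), f)

## Read only column j of an ncols-column file; "NULL" in colClasses
## tells read.table to skip a column entirely.
read_one_col <- function(file, ncols, j, class) {
  cc <- rep("NULL", ncols)
  cc[j] <- class
  read.table(file, colClasses = cc)[[1]]
}

v2 <- read_one_col(f, ncols = 3, j = 2, class = "numeric")
v2   # the second column: 1.5 3.5
unlink(f)
```

The trade-off matches the one described above: the file is scanned once per column, so IO cost goes up, but peak memory stays near the size of a single column.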

[R] R Memory Usage Concerns

2009-09-14 Thread Evan Klitzke
Hello all, To start with, these measurements are on Linux with R 2.9.2 (64-bit build) and Python 2.6 (also 64-bit). I've been investigating R for some log file analysis that I've been doing. I'm coming at this from the angle of a programmer who's primarily worked in Python. As I've been playing

Re: [R] R Memory Usage Concerns

2009-09-14 Thread jim holtman
When you read your file into R, show the structure of the object: str(tab) and also the size of the object: object.size(tab) This will tell you what your data looks like and the size it takes in R. Also, in read.table, use colClasses to define what the format of the data is; it may make it faster. You
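The two inspection calls suggested above look like this on a small synthetic data frame standing in for the real log data:

```r
## str() and object.size() on a toy data frame.
tab <- data.frame(V1 = factor(c("a", "b", "a")),
                  V2 = c(1.5, 2.5, 3.5))

str(tab)           # compact summary: dimensions, per-column class, first values
object.size(tab)   # approximate bytes the object occupies in R

## print method accepts a units argument for readability
print(object.size(tab), units = "Kb")
```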

Re: [R] R Memory Usage Concerns

2009-09-14 Thread Eduardo Leoni
And, by the way, factors take up _more_ memory than character vectors. object.size(sample(c("a","b"), 1000, replace=TRUE)) 4088 bytes object.size(factor(sample(c("a","b"), 1000, replace=TRUE))) 4296 bytes On Mon, Sep 14, 2009 at 11:35 PM, jim holtman jholt...@gmail.com wrote: When you read your file

Re: [R] R Memory Usage Concerns

2009-09-14 Thread Evan Klitzke
On Mon, Sep 14, 2009 at 8:35 PM, jim holtman jholt...@gmail.com wrote: When you read your file into R, show the structure of the object: ... Here's the data I get: tab <- read.table("~/20090708.tab") str(tab) 'data.frame': 1797601 obs. of 3 variables: $ V1: Factor w/ 6 levels "biz_details",..:

Re: [R] R Memory Usage Concerns

2009-09-14 Thread Evan Klitzke
On Mon, Sep 14, 2009 at 8:58 PM, Eduardo Leoni leoni...@msu.edu wrote: And, by the way, factors take up _more_ memory than character vectors. object.size(sample(c("a","b"), 1000, replace=TRUE)) 4088 bytes object.size(factor(sample(c("a","b"), 1000, replace=TRUE))) 4296 bytes I think this is just

Re: [R] R Memory Usage Concerns

2009-09-14 Thread hadley wickham
I think this is just because you picked short strings. If the factor is mapping the string to a native integer type, the strings would have to be larger for you to notice: object.size(sample(c("a pretty long string", "another pretty long string"), 1000, replace=TRUE)) 8184 bytes
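The comparison above can be re-run side by side; exact byte counts depend on R version and platform (the thread's 4088/4296/8184 figures are from one 64-bit build), so no outputs are asserted here:

```r
## Factor vs character memory, with longer strings.
chars <- sample(c("a pretty long string", "another pretty long string"),
                1000, replace = TRUE)
fact  <- factor(chars)

## A character vector stores one string pointer per element (the string
## contents themselves are shared in R's string cache); a factor stores
## one integer code per element plus a small levels attribute.
object.size(chars)
object.size(fact)
```

With tiny strings the factor's levels and class attributes outweigh the savings, which is why the short-string example earlier in the thread shows the factor as larger; with longer or more numerous strings the integer codes win.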

Re: [R] R Memory Usage Concerns

2009-09-14 Thread hadley wickham
its 32-bit representation. This seems like it might be too conservative to me, since it implies that R allocated exactly as much memory for the lists as there were numbers in the list (e.g. typically in an interpreter like this you'd be allocating on order-of-two boundaries, i.e. sizeof(obj)

Re: [R] R Memory Usage Concerns

2009-09-14 Thread Henrik Bengtsson
As already suggested, you're (much) better off if you specify colClasses, e.g. tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double")); Otherwise, R has to load all the data, make a best guess of the column classes, and then coerce (which requires a copy). /Henrik On Mon, Sep