Hello,
I do not know whether my package colbycol may help you. It can help
you read files that would otherwise not fit into memory.
Internally, as the name indicates, data is read into R in a
column-by-column fashion.
IO times increase, but you need only a fraction of the intermediate memory.
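For a rough illustration of the column-by-column idea in base R (this
is not colbycol's actual interface, just a sketch of the principle),
you can read one column at a time by marking the others as "NULL" in
colClasses:

## "NULL" in colClasses skips a column entirely, so each pass holds
## roughly one column's worth of data in memory.
col1 <- read.table("~/20090708.tab", colClasses=c("factor", "NULL", "NULL"))
col2 <- read.table("~/20090708.tab", colClasses=c("NULL", "double", "NULL"))
## Each pass re-reads the file: higher IO, lower peak memory.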
Hello all,
To start with, these measurements are on Linux with R 2.9.2 (64-bit
build) and Python 2.6 (also 64-bit).
I've been investigating R for some log file analysis that I've been
doing. I'm coming at this from the angle of a programmer who has
primarily worked in Python. As I've been playing ...
When you read your file into R, show the structure of the object:
str(tab)
and also the size of the object:
object.size(tab)
This will tell you what your data looks like and how much memory it
takes in R. Also, in read.table, use colClasses to define the format
of the data; it may make it faster. You ...
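As a minimal sketch of that diagnostic step (on a made-up stand-in
data frame, not the original file):

## Small stand-in data frame, just to show the two diagnostics
d <- data.frame(V1 = factor(c("biz_details", "home", "home")),
                V2 = c(0.1, 0.2, 0.3),
                V3 = c(10.5, 11.0, 11.5))
str(d)          # column classes and a preview of the values
object.size(d)  # memory the object occupies in R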
And, by the way, factors take up _more_ memory than character vectors.

object.size(sample(c("a", "b"), 1000, replace=TRUE))
4088 bytes
object.size(factor(sample(c("a", "b"), 1000, replace=TRUE)))
4296 bytes
On Mon, Sep 14, 2009 at 8:35 PM, jim holtman jholt...@gmail.com wrote:
When you read your file into R, show the structure of the object:
...
Here's the data I get:
tab <- read.table("~/20090708.tab")
str(tab)
'data.frame': 1797601 obs. of 3 variables:
 $ V1: Factor w/ 6 levels "biz_details",..: ...
On Mon, Sep 14, 2009 at 8:58 PM, Eduardo Leoni leoni...@msu.edu wrote:
And, by the way, factors take up _more_ memory than character vectors.

object.size(sample(c("a", "b"), 1000, replace=TRUE))
4088 bytes
object.size(factor(sample(c("a", "b"), 1000, replace=TRUE)))
4296 bytes
I think this is just because you picked short strings. If the factor
is mapping each string to a native integer type, the strings would
have to be longer for you to notice:

object.size(sample(c("a pretty long string", "another pretty long string"),
1000, replace=TRUE))
8184 bytes
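To make the comparison explicit (a sketch, not output from the thread;
exact byte counts vary by build), the factor version of the same
long-string vector can be measured the same way:

x <- sample(c("a pretty long string", "another pretty long string"),
            1000, replace=TRUE)
## A factor stores each unique string once (as levels) plus one native
## integer code per element. On a 64-bit build, where each character
## element costs an 8-byte pointer, the 4-byte integer codes should
## make the factor come out smaller.
object.size(x)
object.size(factor(x))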
... its 32-bit representation. This seems like it might be too
conservative to me, since it implies that R allocated exactly as much
memory for the lists as there were numbers in the list (typically, in
an interpreter like this, you'd be allocating on power-of-two
boundaries, i.e. sizeof(obj) ...
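One way to check this from the R side (a sketch; exact sizes depend on
the build) is to measure vectors of increasing length and confirm that
growth is linear rather than jumping at power-of-two boundaries:

## Each double costs 8 bytes plus a small fixed header, so these
## sizes should grow linearly with length, with no power-of-two
## over-allocation visible.
sapply(c(10, 100, 1000, 10000),
       function(n) object.size(numeric(n)))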
As already suggested, you're (much) better off if you specify colClasses, e.g.

tab <- read.table("~/20090708.tab",
                  colClasses=c("factor", "double", "double"))

Otherwise, R has to load all the data, make a best guess of the column
classes, and then coerce (which requires a copy).
/Henrik
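A rough way to see the difference on the same file (a sketch, not a
benchmark from the thread):

## With colClasses, R parses each field directly into its final type;
## without it, R reads everything, guesses the classes, and then
## coerces, which costs an extra copy.
system.time(tab1 <- read.table("~/20090708.tab"))
system.time(tab2 <- read.table("~/20090708.tab",
                               colClasses=c("factor", "double", "double")))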