read.table() is not particularly inefficient IF you specify the colClasses= argument. scan() (with the what= argument) is probably a little more efficient still. In either case, save the data using save() once you have it in the right structure and it will be much faster to read the next time. (In fact I often exit R at this stage and restart it from the .RData file before I start the analysis, to clear out the memory.)
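Something along these lines, for example -- the file name, column layout and types below are purely illustrative, so substitute your own:

    ## read.table(): tell R the column types up front so it doesn't have to guess
    dat <- read.table("bigfile.txt", header = TRUE,
                      colClasses = c("integer", "numeric", "numeric", "factor"))

    ## scan(): give the column layout via what= (skip = 1 skips the header line)
    dat2 <- scan("bigfile.txt",
                 what = list(id = integer(), x = numeric(),
                             y = numeric(), grp = character()),
                 skip = 1)

    ## Save the parsed object; the next session only needs load("bigdata.RData")
    save(dat, file = "bigdata.RData")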

I did a lot of testing on the types of (large) data structures I normally work with and found that options("save.defaults" = list(compress="bzip2", compression_level=6, ascii=FALSE)) gave me the best trade-off between size and speed. Your mileage will undoubtedly vary, but if you do this a lot it may be worth getting hard data for your setup.
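Spelled out (object and file names are just placeholders):

    ## Set the defaults that save() will pick up for the rest of the session
    options(save.defaults = list(compress = "bzip2",
                                 compression_level = 6,
                                 ascii = FALSE))
    save(dat, file = "bigdata.RData")

    ## Or pass the settings directly to a single call instead
    save(dat, file = "bigdata.RData", compress = "bzip2", compression_level = 6)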

Hope this helps a little.

Allan

On 23/07/10 17:10, babyfoxlo...@sina.com wrote:
 Hi there,

Sorry to bother those who are not interested in this problem.

I'm dealing with a large data set (a file of more than 6 GB) and running
regression tests on those data. I was wondering whether there are more
efficient ways to read the data than just using read.table(). BTW, I'm using
a 64-bit desktop and a 64-bit version of R, and the desktop has enough memory
for me to use.
Thanks.


--Gin


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

