read.table() is not particularly inefficient IF you specify the colClasses= argument. scan() (with the what= argument) is probably a little more efficient still. In either case, save the data using save() once you have it in the right structure and it will be much faster to read the next time. (In fact I often exit R at this stage and restart it from the .RData file before I start the analysis, to clear out the memory.)
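Something along these lines, for example -- the file name, column layout and types below are purely illustrative, so substitute your own:

    ## read.table(): tell R the column types up front so it doesn't have to guess
    dat <- read.table("bigfile.txt", header = TRUE,
                      colClasses = c("integer", "numeric", "numeric", "factor"))

    ## scan(): give the column layout via what= (skip = 1 skips the header line)
    dat2 <- scan("bigfile.txt",
                 what = list(id = integer(), x = numeric(),
                             y = numeric(), grp = character()),
                 skip = 1)

    ## Save the parsed object; the next session only needs load("bigdata.RData")
    save(dat, file = "bigdata.RData")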

I did a lot of testing on the types of (large) data structures I normally work with and found that options("save.defaults" = list(compress="bzip2", compression_level=6, ascii=FALSE)) gave me the best trade-off between size and speed. Your mileage will undoubtedly vary, but if you do this a lot it may be worth getting hard data for your setup.
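Spelled out (object and file names are just placeholders):

    ## Set the defaults that save() will pick up for the rest of the session
    options(save.defaults = list(compress = "bzip2",
                                 compression_level = 6,
                                 ascii = FALSE))
    save(dat, file = "bigdata.RData")

    ## Or pass the settings directly to a single call instead
    save(dat, file = "bigdata.RData", compress = "bzip2", compression_level = 6)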

Hope this helps a little.

Allan

On 23/07/10 17:10, babyfoxlo...@sina.com wrote:
 Hi there,

Sorry to bother those who are not interested in this problem.

I'm dealing with a large data set (a file of more than 6 GB) and running
regression tests on those data. I was wondering whether there are more
efficient ways to read the data than just using read.table(). BTW, I'm using
a 64-bit desktop and a 64-bit version of R, and the desktop has enough memory
for me to use.
Thanks.


--Gin


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

