Hi,

On Jul 14, 2009, at 1:53 PM, giusto wrote:


Hi all,

I am having problems importing a VERY large dataset into R. I have looked into the package ff, and it seems to suit me, but in all the examples I have seen it either requires creating the database manually or needs a read.table kind of step. Since this is survey data, the file is big (about 20,000 by 50,000, for a total of about 1.2 GB in plain text). The memory I have isn't enough to do a read.table, and my computer freezes every time :(

Look at the documentation near the end of ?read.table:

"""Note that unless colClasses is specified, all columns are read as character columns and then converted. This means that quotes are interpreted in all fields and that a column of values like "42" will result in an integer column."""

So all the data is first read in as character strings, and then R tries to convert each column to the appropriate data type, which uses mucho memory.

Perhaps specifying the types of each column in the colClasses param can get you where you need to be.
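To make that concrete, here is a minimal sketch of what I mean. The tiny temporary file and its column names (id, age, weight) are made up for illustration and stand in for your real survey file. The idea is to read only a few rows to find out the column types, then re-read with colClasses set; a class of "NULL" skips a column entirely. Note that types guessed from the first few rows can be wrong further down the file (e.g. a column that looks like integer at first but has decimals later), so check them before trusting them.

```r
# Hypothetical small file standing in for the real survey data.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5,
                     age = c(34L, 51L, 27L, 45L, 62L),
                     weight = c(1.5, 0.8, 1.1, 2.0, 0.9)),
          path, row.names = FALSE)

# Read only a few rows to learn the column types cheaply.
sample_rows <- read.csv(path, nrows = 3)
classes <- sapply(sample_rows, class)

# Assumption for illustration: the "weight" column is not needed,
# so mark it "NULL" and read.csv will drop it while reading.
classes[names(classes) == "weight"] <- "NULL"

# Re-read with the types declared up front (one entry per column, in order).
survey <- read.csv(path, colClasses = unname(classes))
str(survey)
```

Dropping the columns you don't need this way is probably the cheapest memory saving of all.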

So far I have managed to import the required subset of the data by using a "cheat": I used GRETL to read an equivalent Stata file (released by the same source that offered the csv file), manipulate it, and export it in a format that R can read into memory.

I'm not sure if you're suggesting that R can read in the whole data file when it is stored in some binary format such as Stata's. If so, perhaps the colClasses trick above might work.

If read.table w/ colClasses doesn't work (and you know you can load the entire dataset into R via some binary format), perhaps you'll have to parse the file piece by piece: open it with a "file(.., 'r')" command and use "scan" (or readLines) to run through the file w/o having to load it all into memory at once.
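A rough sketch of that chunked approach, again on a made-up temporary file (the columns id and score and the chunk size are assumptions for illustration). Because the connection is opened once and stays open, each call to scan picks up where the previous one stopped, so only one chunk is in memory at a time. Here each chunk is just summarised; in practice you would keep whatever subset or aggregate you need from it.

```r
# Hypothetical small file standing in for the real data.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, score = (1:10) * 1.5),
          path, row.names = FALSE, quote = FALSE)

con <- file(path, open = "r")
invisible(readLines(con, n = 1))   # consume the header line

chunk_size <- 4   # lines per chunk; use something much larger for real data
total <- 0
n_rows <- 0
repeat {
  # 'what' declares the type of each column, so nothing is guessed.
  chunk <- scan(con, what = list(id = integer(), score = numeric()),
                sep = ",", nlines = chunk_size, quiet = TRUE)
  if (length(chunk$id) == 0) break   # nothing left to read
  total <- total + sum(chunk$score)
  n_rows <- n_rows + length(chunk$id)
}
close(con)
```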

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
