Re: [R] Running out of memory when importing SPSS files
On Wed, Feb 18, 2009 at 9:21 PM, dobomode <dobom...@gmail.com> wrote:

> I am trying to import a large dataset from SPSS into R. The SPSS file is
> in .SAV format and is about 1GB in size. I use read.spss to import the
> file and get an error saying that I have run out of memory. I am on a
> Mac OS X 10.5 system with 4GB of RAM. Monitoring the R process tells me
> that R runs out of memory when reaching about 3GB of RAM, so I suppose
> the remaining 1GB is used up by the OS.
>
> Why would a 1GB SPSS file take up more than 3GB of memory in R? Is it
> perhaps because R is converting each SPSS column to a less
> memory-efficient data type? In general, what is the best strategy to
> load large datasets in R?
>
> Thanks!
>
> P.S. I exported the SPSS .SAV file to .CSV and tried importing the
> comma-delimited file. Same results: the import was much slower, but
> eventually I ran out of memory again...

(An obviously late and thus unhelpful answer, but maybe someone else has a
similar problem.)

I managed to read in a 300 MB file using the following simple function:

read.big.spss.file <- function(file) {
    require(foreign)  # foreign's compiled code must be loaded for .Call
    .Call("do_read_SPSS", file, PACKAGE = "foreign")
}

The result is a list with some attributes (not a data frame). The idea is
that in read.spss, a lot happens after the data are actually read in:
difficulties may be introduced at the step when the list is converted to a
data frame, and/or when the value labels are attached to the values, etc.
Once you have the list (though of course I can't guarantee it will work
with a 1GB file), you can manipulate the data (e.g., keep only a few
variables or aggregate some of the cases) before doing further statistics.

Maybe it would make sense to add this option to read.spss (i.e., with an
extra argument set to TRUE, it would just return whatever it gets from
do_read_SPSS).

Best regards,
Kenn
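To illustrate Kenn's point about trimming the list before it ever becomes a
data frame, here is a minimal usage sketch; the file and variable names are
invented, and it assumes the list returned by do_read_SPSS is named by SPSS
variable:

## Invented file and variable names; assumes the list is named by variable.
raw <- read.big.spss.file("big-survey.sav")

## Keep only the needed columns, then pay the data-frame conversion cost
## on that small subset rather than on the full set of variables.
df <- as.data.frame(raw[c("AGE", "INCOME", "REGION")])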
Re: [R] Running out of memory when importing SPSS files
2009/2/19 Thomas Lumley <tlum...@u.washington.edu>:

> On Wed, 18 Feb 2009, Uwe Ligges wrote:
>
>> dobomode wrote:
>>
>>> Hello R-help,
>>>
>>> I am trying to import a large dataset from SPSS into R. The SPSS file
>>> is in .SAV format and is about 1GB in size. I use read.spss to import
>>> the file and get an error saying that I have run out of memory. I am
>>> on a Mac OS X 10.5 system with 4GB of RAM. Monitoring the R process
>>> tells me that R runs out of memory when reaching about 3GB of RAM, so
>>> I suppose the remaining 1GB is used up by the OS.
>>>
>>> Why would a 1GB SPSS file take up more than 3GB of memory in R?
>>
>> Because SPSS stores data in a compressed way?
>
> Or because R uses quite a lot more memory to read a data set than to
> store it. Either way, even if the data set eventually took up only 1GB
> in R, you still would probably not be able to work usefully with it on
> a 32-bit machine. You need to either use a 64-bit system or avoid
> loading the whole data set.
>
> Unfortunately read.spss can't read the data selectively [something I'd
> like to fix, sometime], but if you had a .csv file you could read a
> subset of columns or rows using read.table.
>
> A better bet is likely to be putting the data set into a database
> (SQLite is easiest) and reading subsets of the data that way. That's
> how I handle data sets of a few GB (on a laptop with 1GB of memory).
>
>    -thomas
>
> Thomas Lumley            Assoc. Professor, Biostatistics
> tlum...@u.washington.edu    University of Washington, Seattle

You could try using package memisc and only bring in the variables you need
to analyse. See spss.system.file() and the additional subset() methods in
memisc.

Paul Bivand

--
Paul Bivand
Head of Analysis and Statistics
Inclusion

Inclusion has launched a new website; please visit: www.cesi.org.uk
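For readers who want to try the approaches above, two hedged sketches
follow. First, Thomas's read.table and SQLite suggestions; the file, table,
and column names are invented, the CSV layout is assumed, and the database
route needs the DBI and RSQLite packages:

## Reading only a subset of columns from the CSV export: "NULL" entries in
## colClasses make read.csv skip those columns entirely (layout invented).
df <- read.csv("survey.csv",
               colClasses = c("integer", "numeric", "NULL", "NULL", "factor"))

## Loading the CSV into SQLite in chunks, so the full file is never in
## memory at once, then querying back only the needed rows and columns.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), dbname = "survey.db")
csv <- file("survey.csv", open = "r")

first <- read.csv(csv, nrows = 1)          # header line plus one data row
dbWriteTable(con, "survey", first)         # creates the table
repeat {
    block <- tryCatch(
        read.csv(csv, nrows = 10000, header = FALSE,
                 col.names = names(first)),
        error = function(e) NULL)          # an error here means end of file
    if (is.null(block)) break
    dbWriteTable(con, "survey", block, append = TRUE)
}
close(csv)

small <- dbGetQuery(con, "SELECT age, income FROM survey WHERE region = 1")
dbDisconnect(con)

Paul's memisc route, sketched the same way: spss.system.file() builds an
"importer" that leaves the data on disk, and subset() brings in only the
selected variables (names again invented):

library(memisc)

imp <- spss.system.file("survey.sav")      # importer object; data stay on disk
small <- subset(imp, select = c(age, income, region))
df <- as.data.frame(small)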
[R] Running out of memory when importing SPSS files
Hello R-help,

I am trying to import a large dataset from SPSS into R. The SPSS file is in
.SAV format and is about 1GB in size. I use read.spss to import the file and
get an error saying that I have run out of memory. I am on a Mac OS X 10.5
system with 4GB of RAM. Monitoring the R process tells me that R runs out of
memory when reaching about 3GB of RAM, so I suppose the remaining 1GB is
used up by the OS.

Why would a 1GB SPSS file take up more than 3GB of memory in R? Is it
perhaps because R is converting each SPSS column to a less memory-efficient
data type? In general, what is the best strategy to load large datasets in
R?

Thanks!

P.S. I exported the SPSS .SAV file to .CSV and tried importing the
comma-delimited file. Same results: the import was much slower, but
eventually I ran out of memory again...
Re: [R] Running out of memory when importing SPSS files
dobomode wrote:

> Hello R-help,
>
> I am trying to import a large dataset from SPSS into R. The SPSS file is
> in .SAV format and is about 1GB in size. I use read.spss to import the
> file and get an error saying that I have run out of memory. I am on a
> Mac OS X 10.5 system with 4GB of RAM. Monitoring the R process tells me
> that R runs out of memory when reaching about 3GB of RAM, so I suppose
> the remaining 1GB is used up by the OS.
>
> Why would a 1GB SPSS file take up more than 3GB of memory in R?

Because SPSS stores data in a compressed way?

> Is it perhaps because R is converting each SPSS column to a less
> memory-efficient data type? In general, what is the best strategy to
> load large datasets in R?

Use a 64-bit version of R and have a sufficient amount of RAM in your
system.

Uwe Ligges

> Thanks!
>
> P.S. I exported the SPSS .SAV file to .CSV and tried importing the
> comma-delimited file. Same results: the import was much slower, but
> eventually I ran out of memory again...
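As a footnote to Uwe's advice, a quick way to confirm which build of R is
running, using only base R:

.Machine$sizeof.pointer   # 8 on a 64-bit build of R, 4 on 32-bit
R.version$arch            # architecture string, e.g. "x86_64"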