Re: [R] Running out of memory when importing SPSS files

2011-02-10 Thread Kenn Konstabel
On Wed, Feb 18, 2009 at 9:21 PM, dobomode dobom...@gmail.com wrote:

 I am trying to import a large dataset from SPSS into R. The SPSS file
 is in .SAV format and is about 1GB in size. I use read.spss to import
 the file and get an error saying that I have run out of memory. I am
 on a MAC OS X 10.5 system with 4GB of RAM. Monitoring the R process
 tells me that R runs out of memory when reaching about 3GB of RAM so I
 suppose the remaining 1GB is used up by the OS.


(An obviously late and thus unhelpful answer but maybe someone else has a
similar problem)

I managed to read in a 300 MB file using the following simple function:

read.big.spss.file <- function(file){
  require(foreign)  # the compiled routine do_read_SPSS is part of the foreign package
  .Call("do_read_SPSS", file, PACKAGE = "foreign")
}

The result is a list with some attributes (not a data frame).

The idea is that read.spss does a lot of work after the data are actually
read in: the difficulties may arise at the step when the list is converted to
a data frame, and/or when the value labels are attached to the values, etc.
Once you have the list (though of course I can't guarantee it will work with a
1GB file), you can manipulate the data (e.g., keep only a few variables or
aggregate some of the cases) before doing further statistics.
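
For example (a sketch only; the file and variable names below are made up),
you could keep just a few variables from the returned list before paying the
cost of building a data frame:

raw <- read.big.spss.file("survey.sav")

# Subset the list first, so the data frame is built from a much smaller object.
keep  <- c("AGE", "INCOME", "REGION")
small <- as.data.frame(raw[keep])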

Maybe it would make sense to add this option to read.spss (i.e., with an
extra argument set to TRUE, it would simply return whatever it gets from
do_read_SPSS).

Best regards,

Kenn


 Why would a 1GB SPSS file take up more than 3GB of memory in R? Is it
 perhaps because R is converting each SPSS column to a less memory-
 efficient data type? In general, what is the best strategy to load
 large datasets in R?

 Thanks!

 P.S.

 I exported the SPSS .SAV file to .CSV and tried importing the comma
 delimited file. Same results – the import was much slower but
 eventually I ran out of memory again...



Re: [R] Running out of memory when importing SPSS files

2009-02-19 Thread Paul Bivand
2009/2/19 Thomas Lumley tlum...@u.washington.edu:
 On Wed, 18 Feb 2009, Uwe Ligges wrote:

 dobomode wrote:

 Hello R-help,

 I am trying to import a large dataset from SPSS into R. The SPSS file
 is in .SAV format and is about 1GB in size. I use read.spss to import
 the file and get an error saying that I have run out of memory. I am
 on a MAC OS X 10.5 system with 4GB of RAM. Monitoring the R process
 tells me that R runs out of memory when reaching about 3GB of RAM so I
 suppose the remaining 1GB is used up by the OS.

 Why would a 1GB SPSS file take up more than 3GB of memory in R?

 Because SPSS stores data in a compressed way?

 Or because R uses quite a lot more memory to read a data set than to store
 it. Either way, even if the data set eventually took up only 1Gb in R you
 still would probably not be able to work usefully with it on a 32-bit
 machine.

 You need to either use a 64-bit system or avoid loading the whole data set.
  Unfortunately read.spss can't read the data selectively [something I'd like
 to fix, sometime], but if you had a .csv file you could read a subset of
 columns or rows using read.table.
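
For instance (a sketch only; the file name and column types are placeholders),
setting an entry of colClasses to "NULL" drops that column at read time, and
nrows limits how many rows are read:

dat <- read.table("survey.csv", header = TRUE, sep = ",",
                  colClasses = c("integer", "NULL", "numeric", "NULL", "factor"),
                  nrows = 100000)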

 A better bet is likely to be putting the data set into a database (SQLite is
 easiest) and reading subsets of the data that way.  That's how I handle data
 sets of a few Gb (on a laptop with 1Gb memory).
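
One convenient way to do this (a sketch, not necessarily the exact workflow
Thomas uses; file and column names are placeholders) is the sqldf package,
which stages the CSV in an on-disk SQLite database and returns only the query
result to R:

library(sqldf)

# In read.csv.sql the CSV is referred to as the table "file" in the SQL statement.
small <- read.csv.sql("survey.csv",
                      sql = "select age, income from file where region = 1")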


  -thomas

 Thomas Lumley   Assoc. Professor, Biostatistics
 tlum...@u.washington.eduUniversity of Washington, Seattle



You could try using package memisc and only bring in the variables you
need to analyse.

See spss.system.file() and the additional subset() methods in memisc.
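
Something along these lines (a sketch; the file and variable names are
placeholders):

library(memisc)
dat   <- spss.system.file("survey.sav")          # importer object; data stay on disk
small <- subset(dat, select = c(age, income))    # import only the selected variables
small.df <- as.data.frame(small)                 # ordinary data frame for analysis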

Paul Bivand

-
Paul Bivand
Head of Analysis and Statistics
Inclusion

Inclusion has launched a new website; please visit: www.cesi.org.uk



[R] Running out of memory when importing SPSS files

2009-02-18 Thread dobomode
Hello R-help,

I am trying to import a large dataset from SPSS into R. The SPSS file
is in .SAV format and is about 1GB in size. I use read.spss to import
the file and get an error saying that I have run out of memory. I am
on a MAC OS X 10.5 system with 4GB of RAM. Monitoring the R process
tells me that R runs out of memory when reaching about 3GB of RAM so I
suppose the remaining 1GB is used up by the OS.

Why would a 1GB SPSS file take up more than 3GB of memory in R? Is it
perhaps because R is converting each SPSS column to a less memory-
efficient data type? In general, what is the best strategy to load
large datasets in R?

Thanks!

P.S.

I exported the SPSS .SAV file to .CSV and tried importing the comma
delimited file. Same results – the import was much slower but
eventually I ran out of memory again...



Re: [R] Running out of memory when importing SPSS files

2009-02-18 Thread Uwe Ligges



dobomode wrote:

Hello R-help,

I am trying to import a large dataset from SPSS into R. The SPSS file
is in .SAV format and is about 1GB in size. I use read.spss to import
the file and get an error saying that I have run out of memory. I am
on a MAC OS X 10.5 system with 4GB of RAM. Monitoring the R process
tells me that R runs out of memory when reaching about 3GB of RAM so I
suppose the remaining 1GB is used up by the OS.

Why would a 1GB SPSS file take up more than 3GB of memory in R? 


Because SPSS stores data in a compressed way?

Is it perhaps because R is converting each SPSS column to a less
memory-efficient data type? In general, what is the best strategy to load
large datasets in R?


Use a 64-bit version of R and have a sufficient amount of RAM in your system.
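
To check which build you are running, e.g.:

8 * .Machine$sizeof.pointer   # 64 on a 64-bit build of R, 32 on a 32-bit build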

Uwe Ligges


Thanks!

P.S.

I exported the SPSS .SAV file to .CSV and tried importing the comma
delimited file. Same results – the import was much slower but
eventually I ran out of memory again...
