Re: [R] the large dataset problem

2007-08-04 Thread Adrian Dragulescu
Take a look at the package filehash. It allows you to work with large objects in R (bigger than your RAM) by storing them on the disk. The objects are represented as pointers in R and have a small footprint in memory. You can load all of them in an environment and access them with the $

Re: [R] the large dataset problem

2007-07-31 Thread Peter Dalgaard
(Ted Harding) wrote: On 30-Jul-07 11:40:47, Eric Doviak wrote: [...] Sympathies for the constraints you are operating in! The Introduction to R manual suggests modifying input files with Perl. Any tips on how to get started? Would Perl Data Language (PDL) be a good choice?

Re: [R] the large dataset problem

2007-07-31 Thread Eric Doviak
Just a note of thanks for all the help I have received. I haven't gotten a chance to implement any of your suggestions because I'm still trying to catalog all of them! Thank you so much! Just to recap (for my own benefit and to create a summary for others): Bruce Bernzweig suggested using the

[R] the large dataset problem

2007-07-30 Thread Eric Doviak
Dear useRs, I recently began a job at a very large and heavily bureaucratic organization. We're setting up a research office and statistical analysis will form the backbone of our work. We'll be working with large datasets such as the SIPP as well as our own administrative data. Due to the

Re: [R] the large dataset problem

2007-07-30 Thread Bernzweig, Bruce (Consultant)
@stat.math.ethz.ch Subject: [R] the large dataset problem Dear useRs, I recently began a job at a very large and heavily bureaucratic organization. We're setting up a research office and statistical analysis will form the backbone of our work. We'll be working with large datasets such as the SIPP as well as our

Re: [R] the large dataset problem

2007-07-30 Thread Ben Bolker
Eric Doviak <edoviak at earthlink.net> writes: Dear useRs, I recently began a job at a very large and heavily bureaucratic organization. We're setting up a research office and statistical analysis will form the backbone of our work. We'll be working with large datasets such as the SIPP as well

Re: [R] the large dataset problem

2007-07-30 Thread Ted Harding
On 30-Jul-07 11:40:47, Eric Doviak wrote: [...] Sympathies for the constraints you are operating in! The Introduction to R manual suggests modifying input files with Perl. Any tips on how to get started? Would Perl Data Language (PDL) be a good choice? http://pdl.perl.org/index_en.html

Re: [R] the large dataset problem

2007-07-30 Thread Roland Rau
Eric Doviak wrote: I need to find some way to overcome these constraints and work with large datasets. Does anyone have any suggestions? I might not be the most authoritative person on this subject but I put all my large datasets[1] into an SQLite database and extract/summarize data from it

Re: [R] the large dataset problem

2007-07-30 Thread Greg Snow
Check out the biglm package for some tools that may be useful. -----Original Message----- From: Eric Doviak [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: 7/30/07 9:54 AM Subject: [R] the large dataset problem Dear useRs, I recently began a job at a very large

Re: [R] the large dataset problem

2007-07-30 Thread jim holtman
FYI, I ran your script on a 1.5 GHz Windows machine using Cygwin, which provides the UNIX utilities. The file has 1000 lines with 10,000 fields on each line. Here is what it reported: gawk 'BEGIN{FS=","}{print $(1) "," $(1000) "," $(1275) "," $(5678)}' tempxx.txt > newdata.csv real
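
For anyone who wants to try the column-extraction idea from the message above before pointing it at real data, here is a minimal sketch. It is not the script that was timed: the file names wide_demo.csv and subset_demo.csv, the 5 x 6000 test file and its "row.column" cell values are all made up for illustration, and plain POSIX awk is assumed in place of gawk. Only the field numbers (1, 1000, 1275, 5678) are taken from the thread.

```shell
#!/bin/sh
# Sketch only: file names and test-file dimensions are illustrative assumptions.
set -eu

infile=wide_demo.csv      # hypothetical wide input file
outfile=subset_demo.csv   # hypothetical narrow output file

# Build a small test file: 5 lines of 6000 comma-separated fields,
# where the field in row r, column c holds the text "r.c".
awk 'BEGIN {
    for (r = 1; r <= 5; r++) {
        line = r ".1"
        for (c = 2; c <= 6000; c++) line = line "," r "." c
        print line
    }
}' > "$infile"

# Keep only fields 1, 1000, 1275 and 5678 of every line.
# FS splits each input line on commas; OFS puts commas back
# between the fields that print emits.
awk 'BEGIN { FS = ","; OFS = "," } { print $1, $1000, $1275, $5678 }' \
    "$infile" > "$outfile"

# Show the first two lines of the reduced file.
head -n 2 "$outfile"
```

The reduced file can then be read into R in the usual way (for example with read.csv), which is the point of doing the cut outside R: awk streams the file one line at a time, so only the four columns you actually need ever have to fit in memory.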