Take a look at the package filehash. It allows you to work with large
objects in R (bigger than your RAM) by storing them on disk. The
objects are represented as pointers in R and have a small footprint in
memory. You can load all of them into an environment and access them
with the $ operator.
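A minimal sketch of that approach, assuming filehash's documented dbCreate/dbInit/db2env interface (the database name "bigdata" and the stored object are placeholders, not from the thread):

```r
library(filehash)  # install.packages("filehash") if needed

# Create and initialize an on-disk database
if (!file.exists("bigdata")) dbCreate("bigdata")
db <- dbInit("bigdata")

# Store a large object; it lives on disk, not in RAM
dbInsert(db, "x", seq_len(1e6))

# Expose the database as an environment: objects are fetched lazily
env <- db2env(db)
m <- mean(env$x)  # accessed with $, loaded from disk on demand
print(m)
```

The point of db2env is that `env$x` looks like an ordinary in-memory object but is backed by the disk database.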
(Ted Harding) wrote:
On 30-Jul-07 11:40:47, Eric Doviak wrote:
> [...]

Sympathies for the constraints you are operating in!

> The "Introduction to R" manual suggests modifying input files with
> Perl. Any tips on how to get started? Would the Perl Data Language (PDL)
> be a good choice? http://pdl.perl.org/index_en.html
Just a note of thanks for all the help I have received. I haven't gotten a
chance to implement any of your suggestions yet because I'm still trying to
catalog all of them! Thank you so much!
Just to recap (for my own benefit and to create a summary for others):
Bruce Bernzweig suggested using the [...]
Dear useRs,
I recently began a job at a very large and heavily bureaucratic organization.
We're setting up a research office and statistical analysis will form the
backbone of our work. We'll be working with large datasets such as the SIPP
as well as our own administrative data.
Due to the [...]
Eric Doviak edoviak at earthlink.net writes:
[...]
Eric Doviak wrote:
> I need to find some way to overcome these constraints and work with
> large datasets. Does anyone have any suggestions?

I might not be the most authoritative person on this subject, but I put
all my large datasets [1] into an SQLite database and extract/summarize
data from it.
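That workflow can be sketched with the DBI/RSQLite packages (the file name "big.db", the table name "sipp", and the columns are placeholders for illustration, not details from the thread):

```r
library(DBI)
library(RSQLite)  # install.packages("RSQLite") if needed

# Open (or create) an on-disk database; the data never has to fit in RAM
con <- dbConnect(SQLite(), "big.db")

# Load a table once (in practice, append chunks with dbWriteTable(..., append = TRUE))
dbWriteTable(con, "sipp",
             data.frame(id = 1:5, income = c(10, 20, 30, 40, 50)),
             overwrite = TRUE)

# Extract/summarize with SQL instead of loading the whole dataset into R
res <- dbGetQuery(con, "SELECT COUNT(*) AS n, AVG(income) AS mean_income FROM sipp")
print(res)

dbDisconnect(con)
```

Only the query result (here, one row) comes back into R; the bulk of the data stays on disk in SQLite.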
Check out the biglm package for some tools that may be useful.
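biglm fits a linear model in bounded memory by updating the fit one chunk of rows at a time; a minimal sketch (the data below are simulated for illustration, not from the thread):

```r
library(biglm)  # install.packages("biglm") if needed

# Simulate a dataset that we pretend is too big to process at once
set.seed(1)
dat <- data.frame(x = 1:100)
dat$y <- 2 * dat$x + 5 + rnorm(100)

# Fit on the first chunk, then update with later chunks:
# memory use depends on the number of coefficients, not the number of rows
fit <- biglm(y ~ x, data = dat[1:50, ])
fit <- update(fit, dat[51:100, ])

print(coef(fit))  # close to the true intercept 5 and slope 2
```

In real use each chunk would be read from disk (or a database) just before the update call, so the full dataset is never held in memory.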
-----Original Message-----
From: Eric Doviak [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: 7/30/07 9:54 AM
Subject: [R] the large dataset problem
[...]
FYI. I used your script on a Windows machine (1.5 GHz) with the CYGWIN
software that provides the UNIX utilities. The file has 1,000 lines with
10,000 fields on each line. Here is what it reported:
gawk 'BEGIN{FS=","}{print $(1) "," $(1000) "," $(1275) "," $(5678)}' tempxx.txt > newdata.csv
real