One option for processing very large files with R is to split them first with the Unix split utility:

## split a large file into pieces
#--parameters: the folder, file and number of parts
FLD=/home/user/data
F=very_large_file.dat
parts=50
#---split
cd "$FLD" || exit 1
fn=$(echo "$F" | awk -F. '{print $1}')   # file name without extension
ln=$(wc -l < "$F")                        # number of lines in the file
forsplit=$((ln / parts + 1))              # number of lines in each part
echo "====== $F will be split in $parts parts of $forsplit lines each."
split -l "$forsplit" "$F" "$fn"

You could also load the entire file into a DBMS and then pull parts of it into R, or read specific lines through a pipe, e.g. readLines(pipe("sed, grep, python... command")).
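As a rough illustration of the pipe idea, here is the kind of command that could be handed to pipe(). This is only a sketch: the sample file, the line range and the pattern are made up for the example and are not from the original post.

```shell
# Hypothetical sample file standing in for the very large one.
tmp=$(mktemp -d)
seq 1 1000 > "$tmp/sample.dat"

# Pull out lines 101-110 only; the trailing 110q makes sed stop
# at line 110, so the rest of the file is never read.
chunk=$(sed -n '101,110p;110q' "$tmp/sample.dat")

# Count only the lines matching a pattern (here: values ending in 00).
matches=$(grep -c '00$' "$tmp/sample.dat")

echo "$chunk" | head -n 1   # first extracted line: 101
echo "$matches"             # 10 matching lines (100, 200, ..., 1000)

rm -rf "$tmp"
```

From R, the same sed command would go inside the pipe, along the lines of readLines(pipe("sed -n '101,110p;110q' very_large_file.dat")), so that only the requested lines ever reach R.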
Don't try to replicate the SAS processing in R. Exact translations of the SAS DATA STEP usage of _N_, first., last., retain, etc. into R would be inefficient, ugly, retrogressive, wrong, rigid, complicated, silly and so on.

For a start, read up on indexing: this seemingly simple and innocuous R feature is in fact far more powerful than the entire DATA STEP with its whole bag of tricks. Then search the list for similar questions, for example
http://thread.gmane.org/gmane.comp.lang.r.general/44332/focus=44343

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Gerard Smits
> Sent: Sunday, January 21, 2007 2:22 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] sequential processing
>
> Like many others, I am new to R but old to SAS.
>
> Am I correct in understanding that R processes a data frame
> sequentially? This would imply that large input files could be
> read, without the need to load the entire file into memory.
> Related to the manner of reading a frame, I have been looking for the
> equivalent of SAS _n_ (I realize that I can use a variant of which to
> identify an index value) as well as useful SAS features such as
> first., last., retain, etc. Any help with this conversion
> appreciated.
>
> Thanks,
>
> Gerard Smits
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.