Re: [R] Fast way to determine number of lines in a file
Hi Hadley,

Hope this is what you are looking for. This approach counts the lines in a large bzip2-compressed file by reading it in chunks through a connection.

    testconn <- file("xyzxyz.csv.bz2", open = "r")
    csize <- 1      # number of lines read per call
    nolines <- 0
    while ((readnlines <- length(readLines(testconn, csize))) > 0)
      nolines <- nolines + readnlines
    close(testconn)
    nolines

Regards,
Indian_R_Analyst.

On Feb 8, 7:16 pm, Hadley Wickham <had...@rice.edu> wrote:
> Hi all,
>
> Is there a fast way to determine the number of lines in a file? I'm looking for something like count.lines, analogous to count.fields.
>
> Hadley
> --
> http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fast way to determine number of lines in a file
> I was looking for a fast line counter as well a while ago and ended up writing a small function in R: countLines() in the R.utils package.
>
> At least at the time, it was faster than readLines() [for unknown reasons]. It is also more memory efficient, and it supports connections. I don't think it beats a system call to 'wc', though. When a faster solution becomes available, it will call that instead. It does not avoid reading the file twice.

Perfect - exactly what I was looking for. Thanks!

Hadley
--
http://had.co.nz/
Re: [R] Fast way to determine number of lines in a file
It depends on the type of file and your system. 'count.fields()' is impractical for large files because it returns a vector with one element per line of the file. It would be easier to use scan() with the separator argument ('sep') set to the end-of-line marker, "\n" I believe, and the 'what' argument set to a null list, so nothing is actually stored. scan() will still report the number of lines read.

For flat files, and on Windows, the additional utilities installed with Rtools (you just need the tools and Cygwin DLLs installed) are the fastest that I know of.

    if (.Platform$OS.type == "windows") {
      system.time({
        # 'wc -l' prints "<count> <file name>"; keep only the count
        cmd <- system(paste("/RTools/bin/wc -l", "much_data.bin"), intern = TRUE)
        cmd <- strsplit(cmd, " ")[[1]][1]
      })
    }

Sincerely,
KeithC.

-----Original Message-----
From: Hadley Wickham [mailto:had...@rice.edu]
Sent: Monday, February 08, 2010 7:16 AM
To: R-help
Subject: [R] Fast way to determine number of lines in a file

Hi all,

Is there a fast way to determine the number of lines in a file? I'm looking for something like count.lines, analogous to count.fields.

Hadley
--
http://had.co.nz/
Re: [R] Fast way to determine number of lines in a file
Hi,

parser::nlines does it in C.

Romain

On 02/08/2010 03:16 PM, Hadley Wickham wrote:
> Hi all,
>
> Is there a fast way to determine the number of lines in a file? I'm looking for something like count.lines, analogous to count.fields.
>
> Hadley

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/MPYc : RProtoBuf: protocol buffers for R
|- http://tr.im/KfKn : Rcpp 0.7.2
`- http://tr.im/JOlc : External pointers with Rcpp
Re: [R] Fast way to determine number of lines in a file
Hadley Wickham <hadley at rice.edu> writes:
> Hi all,
>
> Is there a fast way to determine the number of lines in a file? I'm looking for something like count.lines, analogous to count.fields.
>
> Hadley

How about something like

    length(readLines(fname))

Ken
Re: [R] Fast way to determine number of lines in a file
If you are willing to use an external program, parse the result of:

    system("wc -l small.dat")
    10 small.dat

On Windows there is a wc.exe program in the Rtools distribution.

On Mon, Feb 8, 2010 at 9:16 AM, Hadley Wickham <had...@rice.edu> wrote:
> Hi all,
>
> Is there a fast way to determine the number of lines in a file? I'm looking for something like count.lines, analogous to count.fields.
>
> Hadley
> --
> http://had.co.nz/
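To make the parsing step above concrete, here is a minimal sketch. The helper name count_lines_wc is hypothetical (it is not from the thread or from any package), and it assumes a 'wc' executable is on the PATH. It uses system2() to capture the output and keeps only the leading count.

```r
# Hypothetical helper (not from the thread): count lines by calling 'wc -l'.
# Assumes a 'wc' executable can be found on the PATH.
count_lines_wc <- function(path) {
  # stdout = TRUE captures the output as a character vector, e.g. "10 small.dat"
  out <- system2("wc", c("-l", shQuote(path)), stdout = TRUE)
  # Some platforms pad the count with leading spaces, so trim before splitting
  fields <- strsplit(trimws(out), "[[:space:]]+")[[1]]
  as.integer(fields[1])
}

# Usage on a small temporary file
tmp <- tempfile(fileext = ".dat")
writeLines(as.character(1:10), tmp)
n <- count_lines_wc(tmp)
unlink(tmp)
```

Note that 'wc -l' counts newline characters, so a final line without a trailing newline is not included in the count.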
Re: [R] Fast way to determine number of lines in a file
> parser::nlines does it in C.

Looks promising, but I need something that uses connections because I'm working with big bzipped files.

Hadley
--
http://had.co.nz/
Re: [R] Fast way to determine number of lines in a file
Hi Ken,

> How about something like
>
>     length(readLines(fname))

I'm trying to avoid the overhead of reading the file in twice. (I'm trying to preallocate a data structure for a chunked read.)

Hadley
--
http://had.co.nz/
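For readers wondering why the line count matters here, the following is a rough sketch of a chunked read into a preallocated vector. It is only an illustration under assumptions: the function name read_chunked and its arguments are hypothetical, and the result is a plain character vector rather than whatever data structure Hadley had in mind.

```r
# Hypothetical sketch: fill a preallocated character vector chunk by chunk.
# 'n_lines' is the line count obtained beforehand by some other means.
read_chunked <- function(path, n_lines, chunk_size = 1000L) {
  out <- character(n_lines)          # allocate once, up front
  con <- file(path, open = "r")
  on.exit(close(con))
  pos <- 0L                          # number of lines stored so far
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break   # end of file
    out[pos + seq_along(chunk)] <- chunk
    pos <- pos + length(chunk)
  }
  out[seq_len(pos)]                  # drop unused slots if n_lines was too large
}

# Usage on a small temporary file
tmp <- tempfile()
lines_in <- sprintf("row %d", 1:5)
writeLines(lines_in, tmp)
lines_out <- read_chunked(tmp, n_lines = 5L, chunk_size = 2L)
unlink(tmp)
```

If n_lines is too small, R still grows the vector on assignment, but that reintroduces the repeated reallocation that preallocating is meant to avoid.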
Re: [R] Fast way to determine number of lines in a file
On 02/08/2010 04:16 PM, Hadley Wickham wrote:
>> parser::nlines does it in C.
>
> Looks promising, but I need something that uses connections because I'm working with big bzipped files.
>
> Hadley

Ah... the lack of a C-level API for connections again ;-)

--
Romain Francois
Professional R Enthusiast
http://romainfrancois.blog.free.fr
Re: [R] Fast way to determine number of lines in a file
I was looking for a fast line counter as well a while ago and ended up writing a small function in R: countLines() in the R.utils package.

At least at the time, it was faster than readLines() [for unknown reasons]. It is also more memory efficient, and it supports connections. I don't think it beats a system call to 'wc', though. When a faster solution becomes available, it will call that instead. It does not avoid reading the file twice.

/Henrik

On Mon, Feb 8, 2010 at 4:17 PM, hadley wickham <h.wick...@gmail.com> wrote:
> Hi Ken,
>
>> How about something like
>>
>>     length(readLines(fname))
>
> I'm trying to avoid the overhead of reading the file in twice. (I'm trying to preallocate a data structure for a chunked read.)
>
> Hadley
> --
> http://had.co.nz/
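One way to count lines over a connection without building a character vector is to read raw bytes in chunks and count the newline bytes. The sketch below illustrates that general technique only. It is not taken from R.utils and makes no claim about how countLines() is implemented. The name count_newlines and the chunk_bytes default are assumptions.

```r
# Hypothetical sketch: count newline bytes by reading raw chunks.
# 'con' is either a file path or an already open binary-mode connection.
count_newlines <- function(con, chunk_bytes = 1e6) {
  if (is.character(con)) {
    con <- file(con, open = "rb")
    on.exit(close(con))              # only close connections opened here
  }
  newline <- as.raw(10L)             # the LF byte
  n <- 0                             # double, to avoid integer overflow
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break   # end of input
    n <- n + sum(chunk == newline)
  }
  n
}

# Usage on a small temporary file
tmp <- tempfile()
writeLines(c("a", "b", "c"), tmp)
n_plain <- count_newlines(tmp)
unlink(tmp)
```

For a bzip2-compressed file, the same function can be given a connection opened with bzfile(path, open = "rb"), which the caller then closes. As with 'wc -l', a final line that lacks a trailing newline is not counted.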