Re: [R] Fast way to determine number of lines in a file

2010-02-10 Thread Indian_R_Analyst
Hi Hadley,

I hope this is what you are looking for. This approach counts the
lines in a large bzip2-compressed file by reading it in chunks.

testconn <- file("xyzxyz.csv.bz2", open = "r")
csize <- 1  # lines per chunk; a larger value means fewer readLines() calls
nolines <- 0
while ((readnlines <- length(readLines(testconn, csize))) > 0) nolines <- nolines + readnlines
close(testconn)
nolines

Regards,
Indian_R_Analyst.


On Feb 8, 7:16 pm, Hadley Wickham had...@rice.edu wrote:
> Hi all,
>
> Is there a fast way to determine the number of lines in a file?  I'm
> looking for something like count.lines analogous to count.fields.
>
> Hadley
>
> --
> http://had.co.nz/


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast way to determine number of lines in a file

2010-02-09 Thread hadley wickham
> I was looking for a fast line counter as well a while ago and ended up
> writing a small function in R:
>
>   countLines() in the R.utils package
>
> At least at the time, it was faster than readLines() [for unknown
> reasons].  It is also more memory efficient.  It supports connections.
> I don't think it beats a system call to 'wc', though.  When a faster
> solution becomes available, it will call that instead.
> It does not avoid reading the file twice.

Perfect - exactly what I was looking for.

Thanks!

Hadley

-- 
http://had.co.nz/



Re: [R] Fast way to determine number of lines in a file

2010-02-09 Thread kMan
It depends on the type of file and your system. 'count.fields()' is
impractical for large files because it returns a vector with one element
per line of the file. It would be easier to use scan() with the 'sep'
argument set to the end-of-line marker, \n I believe, and the 'what'
argument set to a list containing NULL, so that nothing is actually stored.
scan() will still report the number of records read.
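
A minimal sketch of the scan() idea above, assuming a plain-text file. The
file and its contents are made up for illustration. Instead of relying on
the message scan() prints, it takes the length of the returned vector, so
the lines are held in memory. Note that scan() skips blank lines by default
(blank.lines.skip = TRUE), so this counts non-blank lines only.

```r
# Hypothetical sample file, created only for this illustration
tmp <- tempfile(fileext = ".txt")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8,9", "10,11,12"), tmp)

# sep = "\n" makes every line a single field; what = "" reads fields as strings
lines_read <- scan(tmp, what = "", sep = "\n", quiet = TRUE)
n_lines <- length(lines_read)

unlink(tmp)
```

Because every line is materialised as a string, this is unlikely to be
cheaper than length(readLines()) on very large files.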

For flat files on Windows, the additional utilities installed with Rtools
(only the tools and Cygwin DLLs need to be installed) are the fastest that I know of.

if (.Platform$OS.type == "windows") {
  system.time({
    cmd <- system(paste("/RTools/bin/wc -l", "much_data.bin"), intern = TRUE)
    cmd <- strsplit(cmd, " ")[[1]][1]
  })
}

Sincerely,
KeithC.

-Original Message-
From: Hadley Wickham [mailto:had...@rice.edu] 
Sent: Monday, February 08, 2010 7:16 AM
To: R-help
Subject: [R] Fast way to determine number of lines in a file

Hi all,

Is there a fast way to determine the number of lines in a file?  I'm looking
for something like count.lines analogous to count.fields.

Hadley

--
http://had.co.nz/



Re: [R] Fast way to determine number of lines in a file

2010-02-08 Thread Romain Francois

Hi,

parser::nlines does it in C.

Romain

On 02/08/2010 03:16 PM, Hadley Wickham wrote:


> Hi all,
>
> Is there a fast way to determine the number of lines in a file?  I'm
> looking for something like count.lines analogous to count.fields.
>
> Hadley


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/MPYc : RProtoBuf: protocol buffers for R
|- http://tr.im/KfKn : Rcpp 0.7.2
`- http://tr.im/JOlc : External pointers with Rcpp



Re: [R] Fast way to determine number of lines in a file

2010-02-08 Thread Ken Knoblauch
Hadley Wickham <hadley at rice.edu> writes:

> Hi all,
>
> Is there a fast way to determine the number of lines in a file?  I'm
> looking for something like count.lines analogous to count.fields.
>
> Hadley

How about something like

length(readLines(fname))

Ken



Re: [R] Fast way to determine number of lines in a file

2010-02-08 Thread Gabor Grothendieck
If you are willing to use an external program, parse the result of:

> system("wc -l small.dat")
10 small.dat

On Windows there is a wc.exe program in the Rtools distribution.
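
A hedged sketch of the parsing step, assuming a 'wc' executable is on the
PATH. The names parse_wc and count_lines_wc are made up for illustration.
Keep in mind that 'wc -l' counts newline characters, so a final line
without a trailing newline is not included.

```r
# Extract the leading count from a line of `wc -l` output such as "10 small.dat".
# `parse_wc` and `count_lines_wc` are illustrative names, not existing functions.
parse_wc <- function(out) {
  as.integer(strsplit(trimws(out), "[[:space:]]+")[[1]][1])
}

count_lines_wc <- function(path) {
  # system2() returns the command's stdout as a character vector when stdout = TRUE
  out <- system2("wc", c("-l", shQuote(path)), stdout = TRUE)
  parse_wc(out)
}

parse_wc("10 small.dat")
parse_wc("      10 small.dat")   # some wc implementations pad the count
```

Splitting on any run of whitespace after trimming handles both padded and
unpadded output; count_lines_wc("small.dat") would then return the count
as an integer.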

On Mon, Feb 8, 2010 at 9:16 AM, Hadley Wickham had...@rice.edu wrote:
> Hi all,
>
> Is there a fast way to determine the number of lines in a file?  I'm
> looking for something like count.lines analogous to count.fields.
>
> Hadley
>
> --
> http://had.co.nz/





Re: [R] Fast way to determine number of lines in a file

2010-02-08 Thread Hadley Wickham
> parser::nlines does it in C.

Looks promising, but I need something that uses connections because
I'm working with big bzipped files.

Hadley

-- 
http://had.co.nz/



Re: [R] Fast way to determine number of lines in a file

2010-02-08 Thread hadley wickham
Hi Ken,

> How about something like
> length(readLines(fname))

I'm trying to avoid the overhead of reading the file in twice.  (I'm
trying to preallocate a data structure for a chunked read)

Hadley


-- 
http://had.co.nz/
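
For readers following the thread, the use case described above (count
first, then preallocate for a chunked read) might look roughly like the
sketch below. The sample file, the chunk size and all variable names are
assumptions made for illustration; this version still makes two passes
over the file.

```r
# Hypothetical input: a small bzip2 file written only for this illustration
tmp <- tempfile(fileext = ".txt.bz2")
out <- bzfile(tmp, open = "w")
writeLines(paste("line", 1:23), out)
close(out)

chunk_size <- 5L   # illustrative; real chunk sizes would be much larger

# Pass 1: count the lines so the result can be allocated once
con <- bzfile(tmp, open = "r")
n <- 0L
while ((k <- length(readLines(con, n = chunk_size))) > 0L) n <- n + k
close(con)

# Pass 2: fill the preallocated vector chunk by chunk
result <- character(n)
con <- bzfile(tmp, open = "r")
pos <- 0L
repeat {
  chunk <- readLines(con, n = chunk_size)
  if (length(chunk) == 0L) break
  result[pos + seq_along(chunk)] <- chunk
  pos <- pos + length(chunk)
}
close(con)
unlink(tmp)
```

Allocating the vector once avoids repeatedly growing it inside the loop,
which is why a cheap line count is useful here.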



Re: [R] Fast way to determine number of lines in a file

2010-02-08 Thread Romain Francois

On 02/08/2010 04:16 PM, Hadley Wickham wrote:



>> parser::nlines does it in C.
>
> Looks promising, but I need something that uses connections because
> I'm working with big bzipped files.
>
> Hadley

Ah... the lack of a C-level API for connections again ;-)

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/MPYc : RProtoBuf: protocol buffers for R
|- http://tr.im/KfKn : Rcpp 0.7.2
`- http://tr.im/JOlc : External pointers with Rcpp



Re: [R] Fast way to determine number of lines in a file

2010-02-08 Thread Henrik Bengtsson
I was looking for a fast line counter as well a while ago and ended up
writing a small function in R:

  countLines() in the R.utils package

At least at the time, it was faster than readLines() [for unknown
reasons].  It is also more memory efficient.  It supports connections.
I don't think it beats a system call to 'wc', though.  When a faster
solution becomes available, it will call that instead.
It does not avoid reading the file twice.

/Henrik
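
One way a line counter can avoid building character strings at all is to
read raw bytes from a connection and count newline bytes. The sketch below
only illustrates that idea with made-up names (count_newlines,
chunk_bytes); it is not the R.utils implementation. It counts "\n" bytes,
so a final line without a trailing newline is not counted.

```r
count_newlines <- function(con, chunk_bytes = 65536L) {
  newline <- as.raw(10L)            # byte value of "\n"
  total <- 0
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(bytes) == 0L) break  # end of input
    total <- total + sum(bytes == newline)
  }
  total
}

# Illustration on a temporary bzip2 file
tmp <- tempfile(fileext = ".csv.bz2")
out <- bzfile(tmp, open = "w")
writeLines(sprintf("%d,%d", 1:1000, 1:1000), out)
close(out)

con <- bzfile(tmp, open = "rb")     # binary mode so readBin() sees decompressed bytes
n_lines <- count_newlines(con)
close(con)
unlink(tmp)
```

Since the function takes an already opened connection, the same code
should work for plain, gzip or bzip2 files, depending on how the caller
opens them.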

On Mon, Feb 8, 2010 at 4:17 PM, hadley wickham h.wick...@gmail.com wrote:
> Hi Ken,
>
>> How about something like
>> length(readLines(fname))
>
> I'm trying to avoid the overhead of reading the file in twice.  (I'm
> trying to preallocate a data structure for a chunked read)
>
> Hadley
>
> --
> http://had.co.nz/


