Hi Hadley,
Hope this is what you are looking for. This approach gives the number
of lines in a large bzip2 file by reading it in chunks.
testconn <- file("xyzxyz.csv.bz2", open = "r")
csize <- 1   # lines read per chunk
nolines <- 0
while ((readnlines <- length(readLines(testconn, csize))) > 0)
  nolines <- nolines + readnlines
close(testconn)
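A variation on the same chunked idea, as a minimal sketch rather than a tested recipe: read raw bytes from the connection and count newline bytes, so no character vectors are built. The function name count_lines_raw and the chunk_bytes default are made up for illustration. It assumes a bzip2 file with "\n" line endings, and a final line without a trailing newline is not counted.

```r
# Hypothetical helper: count newline bytes in a bzip2 file, chunk by chunk.
count_lines_raw <- function(path, chunk_bytes = 1e6) {
  con <- bzfile(path, open = "rb")      # binary read of the decompressed stream
  on.exit(close(con))
  n <- 0
  repeat {
    buf <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(buf) == 0L) break        # end of file
    n <- n + sum(buf == as.raw(10L))    # 10 is the byte value of "\n"
  }
  n
}
```

Usage would be count_lines_raw("xyzxyz.csv.bz2"); a larger chunk_bytes means fewer readBin() calls at the cost of a bigger buffer.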
I was looking for a fast line counter as well a while ago and ended up
writing a small function in R:
countLines() in the R.utils package
At least at the time, it was faster than readLines() [for unknown
reasons]. It is also more memory efficient. It supports connections.
I don't think
It depends on the type of file and your system. 'count.fields()' is
impractical for large files because it returns a vector with one element
per line of the file. It would be easier to use scan() with the sep
argument set to the end-of-line marker, "\n" I believe,
and the
Hi all,
Is there a fast way to determine the number of lines in a file? I'm
looking for something like count.lines analogous to count.fields.
Hadley
--
http://had.co.nz/
__
R-help@r-project.org mailing list
Hi,
parser::nlines does it in C.
Romain
On 02/08/2010 03:16 PM, Hadley Wickham wrote:
> Hi all,
> Is there a fast way to determine the number of lines in a file? I'm
> looking for something like count.lines analogous to count.fields.
> Hadley
--
Romain Francois
Professional R Enthusiast
+33(0)
Hadley Wickham <hadley at rice.edu> writes:
> Hi all,
> Is there a fast way to determine the number of lines in a file? I'm
> looking for something like count.lines analogous to count.fields.
> Hadley
How about something like
length(readLines(fname))
Ken
If you are willing to use an external program, parse the result of:
system("wc -l small.dat")
10 small.dat
On Windows there is a wc.exe program in the Rtools distribution.
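For what it's worth, the parsing step could look something like the sketch below. The name wc_lines is made up for illustration, and it assumes a wc program is on the PATH and prints the count as the first whitespace-separated field.

```r
# Hypothetical helper: run `wc -l` on a file and return the count as an integer.
wc_lines <- function(path) {
  # stdout = TRUE captures the program's output as a character vector
  out <- system2("wc", c("-l", shQuote(path)), stdout = TRUE)
  # Some wc implementations pad the count with leading spaces, so trim first
  as.integer(strsplit(trimws(out), "[[:space:]]+")[[1]][1])
}
```

Note that wc reads the file as stored on disk, so on a compressed file it would count lines of the compressed bytes, not of the data.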
On Mon, Feb 8, 2010 at 9:16 AM, Hadley Wickham <had...@rice.edu> wrote:
> Hi all,
> Is there a fast way to determine the number
> parser::nlines does it in C.
Looks promising, but I need something that uses connections because
I'm working with big bzipped files.
Hadley
--
http://had.co.nz/
Hi Ken,
> How about something like
> length(readLines(fname))
I'm trying to avoid the overhead of reading the file in twice. (I'm
trying to preallocate a data structure for a chunked read)
Hadley
--
http://had.co.nz/
On 02/08/2010 04:16 PM, Hadley Wickham wrote:
>> parser::nlines does it in C.
> Looks promising, but I need something that uses connections because
> I'm working with big bzipped files.
> Hadley
Ah... the lack of a C-level API for connections again ;-)
--
Romain Francois
Professional R Enthusiast