Thanks so much for your help and comments. The approach proposed by Jim Holtman was the simplest and fastest. The approach by Marc Schwartz also worked (after a very small modification).
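For the list archives, here is a minimal sketch of the kind of single-pass approach that works well for this problem (not necessarily the exact code Jim or Marc posted): draw the 3000 line numbers first, then read the file once in chunks and keep only the matching lines. The file names and the chunk size below are illustrative.

n_lines  <- 900000                      # known number of lines in the file
n_sample <- 3000
sel <- sort(sample(n_lines, n_sample))  # line numbers to keep

con <- file("myfile", open = "r")
out <- file("myfile_short", open = "w")
line_no <- 0
repeat {
  chunk <- readLines(con, n = 10000)    # read 10,000 lines at a time
  if (length(chunk) == 0) break
  # positions within this chunk that correspond to sampled line numbers
  idx <- sel[sel > line_no & sel <= line_no + length(chunk)] - line_no
  if (length(idx) > 0) writeLines(chunk[idx], out)
  line_no <- line_no + length(chunk)
}
close(con)
close(out)

Because the file is read sequentially in blocks rather than reopened and skipped through for every sampled line, this should take minutes rather than days.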
It is clear that a good knowledge of R saves a lot of time! I've been able to do in a few minutes a process that was only a quarter done after 25 hours!

Many thanks,
Juli

On 02/02/07, juli g. pausas <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have a large file (1.8 GB) with 900,000 lines that I would like to read.
> Each line is a character string. Specifically, I would like to randomly
> select 3000 lines. For smaller files, what I do is:
>
> trs <- scan("myfile", what = character(), sep = "\n")
> trs <- trs[sample(length(trs), 3000)]
>
> This works OK; however, my computer seems unable to handle the 1.8 GB file.
> I thought of an alternative way that does not require reading the whole file:
>
> sel <- sample(1:900000, 3000)
> for (i in 1:3000) {
>   # skip sel[i] - 1 lines so that line number sel[i] is the one read
>   un <- scan("myfile", what = character(), sep = "\n", skip = sel[i] - 1, nlines = 1)
>   write(un, "myfile_short", append = TRUE)
> }
>
> This works on my computer; however, it is extremely slow, since it reads
> one line at a time. It has been running for 25 hours and I think it has
> done less than half of the file. (Yes, I probably do not have a very good
> computer, and I'm working under Windows ...)
> So my question is: do you know any faster way to do this?
> Thanks in advance
>
> Juli
>
> --
> http://www.ceam.es/pausas

--
http://www.ceam.es/pausas