Thanks so much for your help and comments. The approach proposed by Jim Holtman was the simplest and fastest. The approach by Marc Schwartz also worked (after a very small modification).
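For the list archives, here is a minimal sketch of the kind of single-pass approach that works well for this problem (not necessarily the exact code Jim or Marc posted): draw the 3000 line numbers first, then read the file once in chunks and keep only the matching lines. The file names and the chunk size below are illustrative.

n_lines  <- 900000                      # known number of lines in the file
n_sample <- 3000
sel <- sort(sample(n_lines, n_sample))  # line numbers to keep

con <- file("myfile", open = "r")
out <- file("myfile_short", open = "w")
line_no <- 0
repeat {
  chunk <- readLines(con, n = 10000)    # read 10,000 lines at a time
  if (length(chunk) == 0) break
  # positions within this chunk that correspond to sampled line numbers
  idx <- sel[sel > line_no & sel <= line_no + length(chunk)] - line_no
  if (length(idx) > 0) writeLines(chunk[idx], out)
  line_no <- line_no + length(chunk)
}
close(con)
close(out)

Because the file is read sequentially in blocks rather than reopened and skipped through for every sampled line, this should take minutes rather than days.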
It is clear that a good knowledge of R saves a lot of time! I've been able to do in a few minutes a process that was only a quarter done after 25 hours!

Many thanks,
Juli

On 02/02/07, juli g. pausas <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have a large file (1.8 GB) with 900,000 lines that I would like to read.
> Each line is a character string. Specifically, I would like to randomly
> select 3000 lines. For smaller files, what I do is:
>
> trs <- scan("myfile", what = character(), sep = "\n")
> trs <- trs[sample(length(trs), 3000)]
>
> This works OK; however, my computer seems unable to handle the 1.8 GB file.
> I thought of an alternative way that does not require reading the whole file:
>
> sel <- sample(1:900000, 3000)
> for (i in 1:3000) {
>   # skip sel[i] - 1 lines so that line number sel[i] is the one read
>   un <- scan("myfile", what = character(), sep = "\n", skip = sel[i] - 1, nlines = 1)
>   write(un, "myfile_short", append = TRUE)
> }
>
> This works on my computer; however, it is extremely slow, since it reads
> one line at a time. It has been running for 25 hours and I think it has
> done less than half of the file. (Yes, I probably do not have a very good
> computer, and I'm working under Windows ...)
> So my question is: do you know any faster way to do this?
> Thanks in advance
>
> Juli
>
> --
> http://www.ceam.es/pausas

--
http://www.ceam.es/pausas