[R] Reading a large csv file row by row

2007-04-06 Thread Yuchen Luo
Hi, my friends.

When a data file is large, loading the whole file into memory all at once
is not feasible. A feasible way is to read one row, process it, store the
result, and read the next row.

In Fortran, by default, the 'read' command reads one line of a file, which
is convenient, and when the same 'read' command is executed the next time,
the next row of the same file will be read.

I tried to replicate such row-by-row reading in R. I use scan() to do so
with the skip=xxx option. It takes only seconds when the number of rows is
within 1000. However, it takes hours when there are many more rows. I think
it is because every time R reads, it needs to start from the first row of
the file and count xxx rows to find the row it needs to read. Therefore, it
takes more and more time for R to locate the row it needs to read.
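
For concreteness, a minimal sketch of the pattern described above (the file
name "data.csv" and the row count n_rows are placeholders):

## Each scan() call re-reads the file from the top just to skip i-1 rows,
## so the total work grows roughly quadratically with the number of rows.
for (i in 1:n_rows) {
  row <- scan("data.csv", what = character(), sep = ",",
              skip = i - 1, nlines = 1, quiet = TRUE)
  ## process 'row' here and store the result
}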

Is there a solution to this problem?

Your help will be highly appreciated!
Best Wishes
 Yuchen Luo




Re: [R] Reading a large csv file row by row

2007-04-06 Thread Prof Brian Ripley
The solution is to read the 'R Data Import/Export Manual' and make use of 
connections or databases.

What you want to do is very easy in RODBC, for example, but it can also be 
done with scan() provided you keep a connection open.
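
A minimal sketch of the connection-based approach: open the file once, and
each scan() call then continues from where the previous one stopped. The
file name "data.csv" and reading every field as character are assumptions
for illustration:

con <- file("data.csv", open = "r")   # open the connection once
readLines(con, n = 1)                 # discard the header line, if any
repeat {
  fields <- scan(con, what = character(), sep = ",",
                 nlines = 1, quiet = TRUE)
  if (length(fields) == 0) break      # end of file reached
  ## process 'fields' here and store the result
}
close(con)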

On Fri, 6 Apr 2007, Yuchen Luo wrote:

> Hi, my friends.
>
> When a data file is large, loading the whole file into memory all at once
> is not feasible. A feasible way is to read one row, process it, store the
> result, and read the next row.

It makes a lot more sense to process, say, 1000 rows at a time.
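
For example, a sketch of reading 1000 rows at a time over an open
connection; the file name, the comma-separated header, and reading the
fields with default column types are assumptions:

con <- file("data.csv", open = "r")
header <- readLines(con, n = 1)                  # keep the header line
col_names <- strsplit(header, ",")[[1]]
repeat {
  lines <- readLines(con, n = 1000)              # next block of up to 1000 rows
  if (length(lines) == 0) break                  # end of file reached
  tc <- textConnection(lines)
  chunk <- read.csv(tc, header = FALSE, col.names = col_names)
  close(tc)
  ## process the data frame 'chunk' here and store the results
}
close(con)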

> In Fortran, by default, the 'read' command reads one line of a file, which
> is convenient, and when the same 'read' command is executed the next time,
> the next row of the same file will be read.

> I tried to replicate such row-by-row reading in R. I use scan() to do so
> with the skip=xxx option. It takes only seconds when the number of rows is
> within 1000. However, it takes hours when there are many more rows. I think
> it is because every time R reads, it needs to start from the first row of
> the file and count xxx rows to find the row it needs to read. Therefore, it
> takes more and more time for R to locate the row it needs to read.

Yes, R does tend to do what you tell it to ...

> Is there a solution to this problem?
>
> Your help will be highly appreciated!
> Best Wishes
> Yuchen Luo


> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

PLEASE do as we ask.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
