I have been reading large text files with D's std.csv module and have found it slow compared to R's read.table function, which is not known to be particularly fast. Here I am reading Fannie Mae mortgage acquisition data, available (after registering) at http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html:

D Code:

import std.array;
import std.csv;
import std.datetime.stopwatch; // StopWatch lives here in current D releases
import std.file;
import std.stdio;
import std.typecons;

// All 22 columns of the pipe-delimited acquisition file, read as strings
alias row_type = Tuple!(string, string, string, string, string, string,
    string, string, string, string, string, string, string, string,
    string, string, string, string, string, string, string, string);

void main(){
  auto sw = StopWatch(AutoStart.yes);
  // Slurp the whole file into memory, then parse every record eagerly.
  auto buffer = std.file.readText("Acquisition_2009Q2.txt");
  auto records = csvReader!row_type(buffer, '|').array;
  sw.stop();
  writeln("Time (s): ", sw.peek.total!"msecs" / 1000.0);
}

Time (s): 13.478
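
One way to see where the 13 s goes is to time the file read and the CSV parse separately. A minimal sketch, assuming the same file and setup (Repeat is just shorthand for the same 22-column tuple as above):

import std.array;
import std.csv;
import std.datetime.stopwatch;
import std.file;
import std.meta : Repeat;
import std.stdio;
import std.typecons;

alias row_type = Tuple!(Repeat!(22, string)); // 22 string columns

void main(){
  auto sw = StopWatch(AutoStart.yes);
  auto buffer = std.file.readText("Acquisition_2009Q2.txt");
  writeln("read (s): ", sw.peek.total!"msecs" / 1000.0);

  sw.reset(); // restart the clock so parsing is measured on its own
  auto records = csvReader!row_type(buffer, '|').array;
  writeln("parse (s): ", sw.peek.total!"msecs" / 1000.0);
}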

R Code:

system.time(x <- read.table("Acquisition_2009Q2.txt", sep = "|", colClasses = rep("character", 22)))
   user  system elapsed
  7.810   0.067   7.874


R takes about half as long to read the file, even though both programs read the data into an equivalent representation (all 22 columns as strings/characters). Am I doing something incorrect here?
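
For reference, a hand-rolled splitter gives a rough lower bound on raw parsing cost. This is a minimal sketch only: it does no quote handling or per-row field-count validation, unlike csvReader:

import std.algorithm : map, splitter;
import std.array : array, split;
import std.datetime.stopwatch;
import std.file : readText;
import std.stdio;

void main(){
  auto sw = StopWatch(AutoStart.yes);
  // Naive pipe-splitting of each line; a trailing newline yields one
  // empty row at the end, which is fine for a timing baseline.
  auto rows = readText("Acquisition_2009Q2.txt")
      .splitter('\n')
      .map!(line => line.split('|'))
      .array;
  sw.stop();
  writeln(rows.length, " rows in ", sw.peek.total!"msecs" / 1000.0, " s");
}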
