On 21/01/16 10:39 PM, data pulverizer wrote:
I have been reading large text files with D's CSV file reader and have
found it slow compared to R's read.table function, which is not known to
be particularly fast. Here I am reading Fannie Mae mortgage acquisition
data, which can be found here (after registering):
http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html
D Code:
import std.algorithm;
import std.array;
import std.file;
import std.csv;
import std.stdio;
import std.typecons;
import std.datetime;
alias row_type = Tuple!(string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string);

void main(){
    StopWatch sw;
    sw.start();
    auto buffer = std.file.readText("Acquisition_2009Q2.txt");
    auto records = csvReader!row_type(buffer, '|').array;
    sw.stop();
    double time = sw.peek().msecs;
    writeln("Time (s): ", time/1000);
}
Time (s): 13.478
R Code:
system.time(x <- read.table("Acquisition_2009Q2.txt", sep = "|",
                            colClasses = rep("character", 22)))
   user  system elapsed
  7.810   0.067   7.874
R takes about half as long to read the file. Both read the data in the
"equivalent" type format. Am I doing something incorrect here?
Okay, without registering I'm not going to get that data.
So, the usual things to think about: did you turn on release mode?
What about inlining?
Lastly, how about disabling the GC?
import core.memory : GC;
GC.disable();
dmd -release -inline code.d
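Putting those suggestions together, a minimal sketch of the modified benchmark might look like the following. This is untested against the actual data file (same filename and 22-column row_type assumed from the original post); GC.disable() simply stops collections from running while the parse allocates heavily, and the compile flags do the rest.

import std.array;
import std.csv;
import std.datetime;
import std.file;
import std.stdio;
import std.typecons;
import core.memory : GC;

alias row_type = Tuple!(string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string);

void main(){
    GC.disable();             // no collections during the parse
    scope(exit) GC.enable();  // re-enable on the way out

    StopWatch sw;
    sw.start();
    auto buffer = std.file.readText("Acquisition_2009Q2.txt");
    auto records = csvReader!row_type(buffer, '|').array;
    sw.stop();
    writeln("Time (s): ", sw.peek().msecs / 1000.0);
}

// build with:
// dmd -release -inline code.d

Whether this closes the gap with R is something only a re-run on the real file can tell.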