On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote:
Hello,
I want to load a large text file containing two numeric fields
into an associative array.
The file looks like:
1 40
4 2
42 11
...
And has 11M lines.
My code looks like this:
===
import std.stdio, std.conv, std.array;

void main()
{
    size_t[size_t] unions;          // maps first field -> second field
    auto f = File("input.txt");
    foreach ( line ; f.byLine() ) {
        auto fields = line.split(); // split on whitespace
        size_t i = to!size_t(fields[0]);
        size_t j = to!size_t(fields[1]);
        unions[i] = j; // <-- here be question
    }
}
===
This is just test code to illustrate my question (though
general comments are welcome - I'm new to D).
Commenting out the highlighted line (not populating the hash),
the program completes in 25 seconds.
Compiling with the highlighted line, the program takes ~3.5
minutes.
Is there a way to speed up the loading? Perhaps by reserving memory
in the hash before populating it? Or some other trick?
Many thanks,
-gordon
Using OrderedAA improves the speed 3x:
https://github.com/Kozzi11/Trash/tree/master/util
import util.orderedaa;

int main(string[] args)
{
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    OrderedAA!(size_t, size_t, 1_000_007) unions;
    //size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast) {
        line.munch(" \t"); // skip ws
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip ws
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;
    return 0;
}
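
If you don't have the third-party bylinefast and OrderedAA modules, the same two ideas (parse the numbers in place instead of calling split, and pause the GC while loading) can be sketched with Phobos alone. This is only a sketch, not something I have benchmarked: `loadPairs` is a made-up name, it uses the built-in AA, and it assumes every line holds exactly two whitespace-separated non-negative integers (anything else will throw a ConvException).

```d
import std.stdio : File;
import std.conv : parse;
import std.string : stripLeft;
import core.memory : GC;

// Hypothetical helper: loads "i j" pairs from a text file into a built-in AA.
size_t[size_t] loadPairs(string path)
{
    size_t[size_t] unions;

    GC.disable();              // no collections while the table grows
    scope (exit) GC.enable();  // re-enable even if parsing throws

    foreach (line; File(path).byLine())
    {
        auto rest = line.stripLeft();     // skip leading whitespace
        immutable i = parse!size_t(rest); // parse advances `rest` past the digits
        rest = rest.stripLeft();          // skip the separator
        immutable j = parse!size_t(rest);
        unions[i] = j;
    }
    return unions;
}
```

Call it as `auto unions = loadPairs("input.txt");`. Keep in mind that with the GC disabled, memory only grows during the load, so this trades peak memory for fewer collection pauses.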