On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
On 12/24/13 2:28 PM, Gordon wrote:
Hello,
I want to load a large text file containing two numeric fields into an associative array.
The file looks like:
1 40
4 2
42 11
...
And has 11M lines.
My code looks like this:
===
import std.array : split;
import std.conv : to;
import std.stdio : File;

void main()
{
    size_t[size_t] unions;
    auto f = File("input.txt");
    foreach ( line ; f.byLine() ) {
        auto fields = line.split();       // split the line on whitespace
        size_t i = to!size_t(fields[0]);
        size_t j = to!size_t(fields[1]);
        unions[i] = j; // <-- here be question
    }
}
===
This is just test code to illustrate my question (though general comments are welcome; I'm new to D).
With the highlighted line commented out (not populating the hash), the program completes in 25 seconds.
With the highlighted line compiled in, the program takes ~3.5 minutes.
Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or some other trick?
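One trick often suggested for this kind of bulk load (offered here only as a sketch, not something measured in this thread) is to suspend automatic garbage collections while the AA is being filled. The built-in AA allocates on the GC heap as it grows, and repeated collections over an ever-larger heap can dominate the run time. The sketch assumes druntime's `core.memory.GC.disable` / `GC.enable`; `loadPairs` is a hypothetical helper name, and the input format (two whitespace-separated unsigned integers per line) is taken from the question.

```d
import core.memory : GC;
import std.array : split;
import std.conv : to;
import std.file : exists;
import std.stdio : File, stderr, writeln;

/// Hypothetical helper: builds the AA from any range of "key value" lines.
/// Assumes each line holds exactly two whitespace-separated unsigned integers.
size_t[size_t] loadPairs(R)(R lines)
{
    size_t[size_t] unions;

    // Suspend automatic collections while the AA grows; the runtime may
    // still collect if it would otherwise run out of memory.
    GC.disable();
    scope (exit) GC.enable(); // re-enable even if parsing throws

    foreach (line; lines)
    {
        auto fields = line.split();       // split on whitespace
        unions[to!size_t(fields[0])] = to!size_t(fields[1]);
    }
    return unions;
}

void main()
{
    // "input.txt" is the file name used in the question.
    if (!exists("input.txt"))
    {
        stderr.writeln("input.txt not found");
        return;
    }
    auto unions = loadPairs(File("input.txt").byLine());
    writeln(unions.length, " entries loaded");
}
```

Whether this helps, and by how much, depends on the compiler and runtime version, so it is worth timing against the original loop on the real 11M-line file.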
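import std.algorithm : sort, uniq;
import std.file : slurp;

void main()
{
    size_t[size_t] unions;
    // slurp reads every line of the file as a (size_t, size_t) tuple
    foreach (e; "input.txt"
            .slurp!(size_t, size_t)("%s %s").sort.uniq ) {
        unions[e[0]] = e[1];
    }
}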
Andrei
Watch out for the parentheses on sort. As bearophile likes to point out frequently, without parentheses you are calling the built-in sort, not the std.algorithm one.
Gordon, you may find this performs better if you add () to sort.
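The advice about parentheses can be illustrated with a self-contained variant of Andrei's snippet. To keep it runnable without a data file, the pairs come from an in-memory array instead of `slurp`; `Pair` and `buildUnions` are hypothetical names. Per the advice above, writing `sort()` with parentheses makes the call go through UFCS to `std.algorithm.sort`, which returns a `SortedRange` that `uniq` then consumes.

```d
import std.algorithm : sort, uniq;
import std.stdio : writeln;
import std.typecons : Tuple;

/// Hypothetical alias for one (key, value) line of the input file.
alias Pair = Tuple!(size_t, size_t);

/// Sorts with std.algorithm.sort (note the parentheses), drops adjacent
/// duplicate pairs, and loads what is left into an associative array.
size_t[size_t] buildUnions(Pair[] pairs)
{
    size_t[size_t] unions;
    foreach (e; pairs.sort().uniq())
        unions[e[0]] = e[1];
    return unions;
}

void main()
{
    // Stand-in for "input.txt".slurp!(size_t, size_t)("%s %s").
    auto pairs = [Pair(42, 11), Pair(1, 40), Pair(4, 2), Pair(1, 40)];
    auto unions = buildUnions(pairs);
    writeln(unions.length); // prints 3
}
```

One behavioural difference from Gordon's loop: after sorting, pairs sharing a key are visited in ascending order of value, so a key that appears with several different values ends up mapped to the largest one, whereas the original loop keeps whichever value came last in the file.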