On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
On 12/24/13 2:28 PM, Gordon wrote:
Hello,

I want to load a large text file containing two numeric fields into an
associative array.
The file looks like:
   1   40
   4   2
   42  11
   ...

And has 11M lines.

My code looks like this:
===
import std.array : split;
import std.conv : to;
import std.stdio : File;

void main()
{
        size_t[size_t] unions;
        auto f = File("input.txt");
        foreach ( line ; f.byLine() ) {
                auto fields = line.split();
                size_t i = to!size_t(fields[0]);
                size_t j = to!size_t(fields[1]);
                unions[i] = j; // <-- here be question
        }
}
===

This is just test code to illustrate my question (though general
comments are welcome - I'm new to D).

Commenting out the highlighted line (not populating the hash), the
program completes in 25 seconds.
Compiling with the highlighted line, the program takes ~3.5 minutes.

Is there a way to speed up the loading? Perhaps by reserving memory in the
hash before populating it? Or another trick?

import std.algorithm : sort, uniq;
import std.file : slurp;

void main()
{
    size_t[size_t] unions;
    foreach (e; "input.txt"
            .slurp!(size_t, size_t)("%s %s").sort.uniq ) {
        unions[e[0]] = e[1];
    }
}


Andrei

Watch out for the parentheses on sort. As bearophile likes to point out frequently, without parentheses you are calling the builtin sort, not the std.algorithm one.


Gordon, you may find this has better performance if you add () to sort.
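
For reference, here is a minimal sketch of Andrei's snippet with the parentheses added, so that the call resolves to std.algorithm.sort. The helper name loadUnions and the command-line handling are my own additions for illustration, not something from the posts above, and I am assuming each line is just two integers separated by whitespace. I have not timed it against the 11M-line file.

```d
import std.algorithm : sort, uniq;
import std.file : slurp;
import std.stdio : writeln;

// Hypothetical helper (not from the thread): reads "i j" pairs from `path`
// and stores them in an associative array keyed by i.
size_t[size_t] loadUnions(string path)
{
    size_t[size_t] unions;

    // slurp reads the whole file into an array of Tuple!(size_t, size_t).
    auto pairs = path.slurp!(size_t, size_t)("%s %s");

    // sort() with parentheses is std.algorithm.sort called via UFCS;
    // uniq then drops adjacent duplicate pairs.
    foreach (e; pairs.sort().uniq)
        unions[e[0]] = e[1];

    return unions;
}

void main(string[] args)
{
    // Assumption: the input file name is passed as the first argument.
    if (args.length < 2)
    {
        writeln("usage: prog <input file>");
        return;
    }
    auto unions = loadUnions(args[1]);
    writeln(unions.length, " keys loaded");
}
```

Note that if the same key appears with several different values, the pairs are visited in sorted order, so the largest value is the one that ends up in the hash.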
