On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote:
Hello,

I want to load a large text file containing two numeric fields into an associative array.
The file looks like:
   1   40
   4   2
   42  11
   ...

And has 11M lines.

My code looks like this:
===
import std.stdio, std.conv, std.array;

void main()
{
        size_t[size_t] unions;
        auto f = File("input.txt");
        foreach ( line ; f.byLine() ) {
                auto fields = line.split();
                size_t i = to!size_t(fields[0]);
                size_t j = to!size_t(fields[1]);
                unions[i] = j; // <-- here be question
        }
}
===

This is just test code to illustrate my question (though general comments are welcome - I'm new to D).

Commenting out the highlighted line (not populating the hash), the program completes in 25 seconds. Compiling with the highlighted line, the program takes ~3.5 minutes.

Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or is there another trick?

Many thanks,
 -gordon

Using OrderedAA improves the speed 3x:
https://github.com/Kozzi11/Trash/tree/master/util

import util.orderedaa;

int main(string[] args)
{
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    OrderedAA!(size_t, size_t, 1_000_007) unions;
    //size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast) {
        line.munch(" \t"); // skip ws
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip ws
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;

    return 0;
}
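
A rough variant of the same idea that uses only the standard library (GC paused while loading, fields parsed in place instead of split) could be sketched as below. The function name `loadPairs` is hypothetical, and it keeps the built-in associative array, so no timings are claimed for it.

```d
import core.memory : GC;
import std.conv : parse;
import std.stdio : File, writeln;
import std.string : stripLeft;

// Hypothetical helper: reads "i j" pairs, one per line, into a built-in AA.
size_t[size_t] loadPairs(string path)
{
    size_t[size_t] unions;

    // Pause collections while the table grows; re-enable on any exit path.
    GC.disable();
    scope (exit) GC.enable();

    foreach (line; File(path).byLine())
    {
        auto rest = line.stripLeft();       // skip leading whitespace
        if (rest.length == 0)
            continue;                       // ignore blank lines
        immutable i = parse!size_t(rest);   // consumes the first number
        rest = rest.stripLeft();            // skip the separator
        immutable j = parse!size_t(rest);   // consumes the second number
        unions[i] = j;
    }
    return unions;
}

void main()
{
    auto unions = loadPairs("input.txt");
    writeln(unions.length, " entries loaded");
}
```

Using `scope (exit)` instead of a trailing `GC.enable` means the collector is switched back on even if a line fails to parse and throws.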
