As part of learning D, I want to read a file that contains one word per line (plus optional junk after the word) and create a set of all the unique words of a particular length (uppercased).

D doesn't appear to have a built-in set type, so I'm faking one using an associative array whose values are always 0.
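To make the idea concrete, here is a minimal sketch of an associative array used as a set. The helper name `makeSet` and the choice of `bool` as the value type are only assumptions for illustration; the program below uses `int` values instead.

----
import std.stdio : writeln;

// Hypothetical helper: build a "set" from a list of strings.
// Only the keys matter; the value is a placeholder.
bool[string] makeSet(string[] items) {
    bool[string] set;
    foreach (item; items)
        set[item] = true; // inserting an existing key changes nothing visible
    return set;
}

void main() {
    auto seen = makeSet(["ABLE", "BAKE", "ABLE"]);
    assert(seen.length == 2);   // the duplicate is stored only once
    assert("ABLE" in seen);     // `in` gives a pointer; non-null means present
    assert("CAKE" !in seen);
    writeln(seen.length, " unique words");
}
----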

I can't help feeling that the foreach loop's block is rather more verbose than it needs to be. Is there a more concise way to write it?

----
#!/usr/bin/env rdmd
import std.stdio;

immutable WORDFILE = "/usr/share/hunspell/en_GB.dic";
immutable WORDSIZE = 4; // Should be even

alias WordSet = int[string]; // key = word; value = 0

void main() {
    import core.time;

    auto start = MonoTime.currTime;
    auto words = getWords(WORDFILE, WORDSIZE);
    // TODO
    writeln(words.length, " words");
    writeln(MonoTime.currTime - start);
}

WordSet getWords(string filename, int wordsize) {
    import std.conv;
    import std.regex;
    import std.uni;

    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    auto file = File(filename);
    foreach (line; file.byLine) {
        auto match = matchFirst(line, rx);
        if (!match.empty()) {
            auto word = match.hit().to!string; // I hope this assumes UTF-8?
            if (word.length == wordsize) {
                words[word.toUpper] = 0;
            }
        }
    }
    return words;
}
----
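For comparison, here is a sketch of one way the loop body might be tightened: the two nested `if`s become a single condition, and `.idup` makes the immutable copy for the key. The name `getWordsShort` is hypothetical, and it takes any range of lines rather than a filename, purely so that the example runs without a dictionary file. I'm not claiming this is the idiomatic answer.

----
import std.regex : ctRegex, matchFirst;
import std.stdio : writeln;
import std.uni : toUpper;

alias WordSet = int[string]; // key = word; value = 0

// Hypothetical variant of getWords that accepts any range of lines.
WordSet getWordsShort(R)(R lines, size_t wordsize) {
    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    foreach (line; lines) {
        auto match = matchFirst(line, rx);
        // The pattern only matches ASCII letters, so the byte length
        // of the hit equals its number of characters.
        if (!match.empty && match.hit.length == wordsize)
            words[match.hit.idup.toUpper] = 0;
    }
    return words;
}

void main() {
    auto lines = ["able/M", "Bake", "cat", "bake/S", "1234"];
    auto words = getWordsShort(lines, 4);
    assert(words.length == 2);
    assert("ABLE" in words && "BAKE" in words);
    writeln(words.length, " words");
}
----

With a file, the call would presumably be `getWordsShort(File(WORDFILE).byLine, WORDSIZE)`, since the `.idup` copies each hit before `byLine` reuses its buffer.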

PS I'm using LDC on Linux and think that rdmd is excellent. I have lots of small Python programs, and I'm wondering how many of them would be faster using D and rdmd (which I think caches binaries). Also, I've now got Mike Parker's "Learning D" on order.
