Reading a file of words line by line

mark via Digitalmars-d-learn Tue, 14 Jan 2020 08:41:12 -0800

As part of learning D, I want to read a file that contains one word per line (plus optional junk after the word) and create a set of all the unique words of a particular length (uppercased).

D doesn't appear to have a set type, so I'm faking one using an associative array whose values are always 0.
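A minimal sketch of that associative-array-as-set idiom, for illustration only. It assumes `bool` values instead of the `int` 0s used below; the value is never read, so the type only documents intent. Phobos' `std.container.rbtree.redBlackTree` is another option if a sorted set is wanted.

```d
import std.stdio;

// A string "set" faked with a built-in associative array.
// Assumption: bool values; only the presence of a key matters.
alias WordSet = bool[string];

void main()
{
    WordSet words;
    words["WORD"] = true;
    words["WORD"] = true;   // re-adding an existing key leaves the set unchanged
    words["FOUR"] = true;

    writeln(words.length);               // number of unique keys
    writeln(("WORD" in words) !is null); // membership: `in` yields a pointer, or null
    writeln("WXYZ" !in words);           // `!in` yields a plain bool
    words.remove("FOUR");                // delete a member
    writeln(words.keys);                 // remaining members, order unspecified
}
```

Insertion, membership (`in` / `!in`), removal and iteration over `keys` cover most of what a set is needed for here.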

I can't help feeling that the foreach loop's block is rather more verbose than it could be?


----
#!/usr/bin/env rdmd
import std.stdio;

immutable WORDFILE = "/usr/share/hunspell/en_GB.dic";
immutable WORDSIZE = 4; // Should be even

alias WordSet = int[string]; // key = word; value = 0

void main() {
    import core.time;

    auto start = MonoTime.currTime;
    auto words = getWords(WORDFILE, WORDSIZE);
    // TODO
    writeln(words.length, " words");
    writeln(MonoTime.currTime - start);
}

WordSet getWords(string filename, int wordsize) {
    import std.conv;
    import std.regex;
    import std.uni;

    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    auto file = File(filename);
    foreach (line; file.byLine) {
        auto match = matchFirst(line, rx);
        if (!match.empty()) {
            auto word = match.hit().to!string; // I hope this assumes UTF-8?
            if (word.length == wordsize) {
                words[word.toUpper] = 0;
            }
        }
    }
    return words;
}
----
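One way the loop body might be tightened, as a sketch under the same assumptions as the code above (the `bool` values, the `size_t` parameter and the command-line handling in `main` are my own choices, not part of the original). `matchFirst` already exposes the matched text as `hit`, so the temporary `word` can go and the two nested `if`s can merge. The `.to!string` is still needed: `byLine` reuses its buffer between lines, so each stored key has to be a copy. D's `char[]` is UTF-8 by definition, so that conversion copies rather than re-encodes; note that `length` counts UTF-8 code units, which matches the letter count for plain ASCII words.

```d
import std.stdio;

alias WordSet = bool[string]; // key = uppercased word; value unused

// Sketch of a more compact getWords: same logic as the original, with the
// temporary `word` dropped and the two nested `if`s merged into one.
WordSet getWords(string filename, size_t wordsize)
{
    import std.conv : to;
    import std.regex : ctRegex, matchFirst;
    import std.uni : toUpper;

    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i"); // leading run of letters, any case
    foreach (line; File(filename).byLine)
    {
        auto m = matchFirst(line, rx);
        // m.hit slices byLine's reused buffer, so .to!string makes the
        // independent copy that is safe to keep as an AA key.
        if (!m.empty && m.hit.length == wordsize)
            words[m.hit.toUpper.to!string] = true;
    }
    return words;
}

void main(string[] args)
{
    // Hypothetical usage: rdmd words.d [dictionary-path]
    immutable path = args.length > 1 ? args[1] : "/usr/share/hunspell/en_GB.dic";
    auto words = getWords(path, 4);
    writeln(words.length, " words");
}
```

A range pipeline (`map`/`filter`/`each` from `std.algorithm`) is also possible, but the plain `foreach` keeps each line's match in one variable and is arguably just as readable.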

PS I'm using ldc on Linux and think that rdmd is excellent. For the many small Python programs I have, I'm wondering how many would be faster using D and rdmd (which I think caches binaries). Also, I've now got Mike Parker's "Learning D" on order.