As part of learning D, I want to read a file that contains one
word per line (plus optional junk after the word) and create a
set of all the unique words of a particular length (uppercased).
D doesn't appear to have a built-in set type, so I'm faking one
using an associative array whose values are always 0.
I can't help feeling that the foreach loop's block is rather more
verbose than it needs to be. Is there a more concise way to write it?
----
#!/usr/bin/env rdmd
import std.stdio;

immutable WORDFILE = "/usr/share/hunspell/en_GB.dic";
immutable WORDSIZE = 4; // Should be even

alias WordSet = int[string]; // key = word; value = 0

void main() {
    import core.time;
    auto start = MonoTime.currTime;
    auto words = getWords(WORDFILE, WORDSIZE);
    // TODO
    writeln(words.length, " words");
    writeln(MonoTime.currTime - start);
}

WordSet getWords(string filename, int wordsize) {
    import std.conv;
    import std.regex;
    import std.uni;
    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    auto file = File(filename);
    foreach (line; file.byLine) {
        auto match = matchFirst(line, rx);
        if (!match.empty()) {
            auto word = match.hit().to!string; // I hope this assumes UTF-8?
            if (word.length == wordsize) {
                words[word.toUpper] = 0;
            }
        }
    }
    return words;
}
----
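For comparison, here is the shortest variant of the loop I have come up
with so far. It is only a sketch: it checks the length on the matched
slice before copying anything, and folds the two ifs into one. The
helper name collectWords, and taking a range of lines instead of a
filename, are my own inventions so that it can be tried without the
dictionary file. Passing file.byLine as the first argument should give
the same result as getWords above.
----
import std.regex : ctRegex, matchFirst;
import std.stdio : writeln;
import std.uni : toUpper;

alias WordSet = int[string]; // key = word; value = 0 (unused)

// collectWords is a hypothetical helper: it accepts any range of
// lines (a string[] here, or file.byLine) rather than a filename.
WordSet collectWords(R)(R lines, size_t wordsize) {
    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    foreach (line; lines) {
        auto match = matchFirst(line, rx);
        // Test the length on the slice first; only words that are
        // kept get uppercased and copied into an immutable string.
        if (!match.empty && match.hit.length == wordsize)
            words[match.hit.toUpper.idup] = 0;
    }
    return words;
}

void main() {
    auto lines = ["tree/M", "Tree", "oak", "houses", "lamp 42"];
    auto words = collectWords(lines, 4);
    assert(words.length == 2);  // TREE and LAMP
    assert("TREE" in words);    // membership test with `in`
    assert("OAK" !in words);
    writeln(words.length, " words");
}
----
The `in` and `!in` tests are what make the associative array usable as
a set; the stored 0 is never read.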
PS I'm using ldc on Linux and think that rdmd is excellent. I have
lots of small Python programs and I'm wondering how many of them
would be faster using D and rdmd (which I think caches binaries).
Also, I've now got Mike Parker's "Learning D" on order.