This is related to this (closed) issue, but this time I prefer to
discuss the topic on the newsgroup:
http://d.puremagic.com/issues/show_bug.cgi?id=4474
To show why this enhancement request is useful I use a little scripting
task. A D2 program has to read a file that contains one word in each
line, and print all the words with the highest length.
A first naive functional style implementation (if you keep in mind that
File opens files in binary mode on default, this default is different
from the Python one, I am not sure what's the best default. I think
most files I open are text ones, so I think a bit better default is to
open in text mode on default), it doesn't work, it produces an access
violation:
// #1
import std.stdio, std.array, std.algorithm;
void main() {
auto lazyWords = File("words.txt", "r").byLine();
auto maxLen = reduce!max(map!"a.length"(lazyWords));
writeln(filter!((w){ return w.length == maxLen; })(lazyWords));
}
Finding the max length consumes the lazy iterable, so if the text file
is small a possible functional style solution is to convert the lazy
iterable into an array (other functional style solutions are possible,
like reading the file twice, etc):
// #2
import std.stdio, std.array, std.algorithm;
void main() {
auto lazyWords = File("words.txt", "r").byLine();
auto words = array(lazyWords);
auto maxLen = reduce!max(map!"a.length"(words));
writeln(filter!((w){ return w.length == maxLen; })(words));
}
But #2 doesn't work, because while "words" is an array of the words,
byLine() uses the same buffer, so words is a sequence of useless stuff.
To debug that code you need to dup each array item:
// #3
import std.stdio, std.array, std.algorithm;
void main() {
auto lazyWords = File("words.txt", "r").byLine();
auto words = array(map!"a.dup"(lazyWords));
auto maxLen = reduce!max(map!"a.length"(words));
writeln(filter!((w){ return w.length == maxLen; })(words));
}
#3 works, but for scripting-line programs it's not the first version I
write, I have had to debug the code. I think other programmers will
have the same troubles. So I think byLine() default behavour is a bit
bug-prone. Re-using the same buffer gives a good performance boost, but
it's often a premature optimization in scripts. I prefer a language and
library to use unsafe optimizations only on request.
My suggested solution is to split File.byLine() in two member
functions, one returns a lazy iterable that reuses the line buffer and
one that doesn't reuse them. And my preferred solution is to give the
shorter name (so it becomes the "default") to the method that dups the
line, because this is the safer (less bug prone) behaviour. This is
also in accord with D Zen. They may be named byLine() and byFastLine()
or something similar.
If you prefer the "default" one to be the less safe version, then the
safe version may be named byDupLine():
// #4
import std.stdio, std.array, std.algorithm;
void main() {
auto lazyWords = File("words.txt", "r").byDupLine();
auto lines = array(lazyWords);
auto maxLen = reduce!max(map!"a.length"(words));
writeln(filter!((w){ return w.length == maxLen; })(words));
}
Using the "maxs" (from
http://d.puremagic.com/issues/show_bug.cgi?id=4705 ), the code becomes:
// #5
import std.stdio, std.algorithm;
void main() {
auto lazyWords = File("words.txt", "r").byDupLine();
writeln(maxs!"a.length"(lazyWords));
}