On Saturday, 18 April 2015 at 22:01:56 UTC, Ulrich Küttler wrote:
Input ranges from std.stdio are used for reading files. So
assuming we create a file
auto f = File("test.txt", "w");
f.writeln(iota(5).map!(a => repeat(to!string(a), 4)).joiner.joiner("\n"));
f.close();
We should be able to groupBy (chunkBy) its lines:
writeln(File("test.txt").byLine.groupBy!((a,b) => a == b));
The result is just one group; that is, all lines are considered
equal:
[["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"]]
Alas, byLine reuses the same buffer for each line and thus
groupBy keeps comparing each line with itself. There is a
version of byLine that makes copies:
writeln(File("test.txt").byLineCopy.groupBy!((a,b) => a == b));
Indeed, the result is as expected:
[["0", "0", "0", "0"], ["1", "1", "1", "1"], ["2", "2", "2", "2"], ["3", "3", "3", "3"], ["4", "4", "4", "4"]]
Yeah, byLine is dangerous. byLineCopy should probably have been
the default. Maybe we should rename byLine to byLineNoCopy (doing
the proper deprecation dance, of course).
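For anyone who wants to keep lines around, here is a minimal sketch of the two safe options: duplicating each line by hand, or letting byLineCopy do it. The helper names linesViaIdup and linesViaCopy are made up for illustration, and the file contents are shortened from the example above.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : File;

// Hypothetical helper: byLine's front is only valid until the next
// popFront, so each line is duplicated before it is stored.
string[] linesViaIdup(string path)
{
    return File(path).byLine.map!(line => line.idup).array;
}

// Hypothetical helper: byLineCopy allocates a fresh string per line.
string[] linesViaCopy(string path)
{
    return File(path).byLineCopy.array;
}

void main()
{
    // Small test file, same name as in the post.
    auto f = File("test.txt", "w");
    f.writeln("0\n0\n1\n1");
    f.close();

    assert(linesViaIdup("test.txt") == ["0", "0", "1", "1"]);
    assert(linesViaCopy("test.txt") == linesViaIdup("test.txt"));
}
```

Both helpers pay one allocation per line, which is exactly the cost byLine tries to avoid by reusing its buffer.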
A final test with the undocumented byRecord method (the mapping
after groupBy is for beauty only and does not change the
result):
writeln(File("test.txt")
.byRecord!string("%s")
.groupBy!((a,b) => a == b)
.map!(map!(a => a[0])));
Here, the result is most peculiar:
[["0", "0", "0", "0"], ["1", "1", "1"], ["2", "2", "2"], ["3", "3", "3"], ["4", "4", "4"]]
Is byRecord broken? (It is undocumented after all.) In a way,
because it does not contain any indirection. The current fields
tuple is a simple member of the ByRecord struct.
In contrast, the ByLineCopy struct is just a wrapper to a ref
counted ByLineCopyImpl struct with a simple note:
/* Ref-counting stops the source range's ByLineCopyImpl
 * from getting out of sync after the range is copied, e.g.
 * when accessing range.front, then using std.range.take,
 * then accessing range.front again. */
I am uncomfortable at this point. Simple and efficient input
ranges fail in unexpected ways. Internal indirections make all
the difference. It feels like input ranges are hiding something
that should not be hidden.
What am I missing?
I guess the problem is the mix of value and reference semantics.
ByRecord's `current` is a value, but its `file` has reference
semantics. So, a copy of a ByRecord affects one part of the
original but not the other.
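A toy model of that mix, as a rough sketch: Source and Records are hypothetical stand-ins (a class playing the role of the file handle, an int playing the role of the fields tuple), not the actual ByRecord implementation.

```d
// Hypothetical stand-in for the file: a class, so copies share it.
class Source
{
    int[] data;
    this(int[] d) { data = d; }
}

// Hypothetical stand-in for ByRecord: one value member, one reference member.
struct Records
{
    Source src;    // reference semantics: shared between copies
    int current;   // value semantics: each copy has its own
    bool empty;

    this(Source s) { src = s; popFront(); }

    int front() const { return current; }

    void popFront()
    {
        if (src.data.length == 0) { empty = true; return; }
        current = src.data[0];          // cache the next item by value
        src.data = src.data[1 .. $];    // advance the shared source
    }
}

void main()
{
    auto r = Records(new Source([1, 2, 3]));
    auto copy = r;            // copies `current`, shares `src`
    copy.popFront();          // consumes 2 from the shared source
    assert(copy.front == 2);
    assert(r.front == 1);     // the original still holds its old value
    r.popFront();
    assert(r.front == 3);     // the original never sees 2
}
```

An element that one copy consumes is simply lost to the other, which looks a lot like the missing items in the byRecord groups, although I have not traced groupBy's copies to confirm that this is the exact mechanism.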
Maybe copying should be `@disable`d for such ranges/structs. Then
you couldn't pass it by value to groupBy. Instead you would have
to use something like (the fixed version of) refRange, which
works properly.
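A minimal sketch of the `@disable` idea, with a hypothetical NoCopyRange over an int slice standing in for a file-backed range. It only shows that copies are rejected and that the range must then be passed by ref; it does not exercise refRange or groupBy.

```d
import std.traits : isCopyable;

// Hypothetical input range; the slice stands in for a file.
struct NoCopyRange
{
    int[] data;
    @disable this(this);   // forbid implicit copies

    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Consumers have to take the range by ref (or by pointer), not by value.
int sumAll(ref NoCopyRange r)
{
    int total = 0;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

void main()
{
    static assert(!isCopyable!NoCopyRange);
    auto r = NoCopyRange([1, 2, 3]);
    assert(sumAll(r) == 6);
    assert(r.empty);   // the caller sees the consumed state
}
```

With copying disabled, a by-value call such as passing the range straight to an algorithm is rejected at compile time, so the stale-copy situation cannot arise silently.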