On Saturday, 18 April 2015 at 22:01:56 UTC, Ulrich Küttler wrote:
Input ranges from std.stdio are used for reading files. So
assuming we create a file

    import std.stdio, std.range, std.algorithm, std.conv;

    auto f = File("test.txt", "w");
    f.writeln(iota(5).map!(a => repeat(to!string(a), 4)).joiner.joiner("\n"));
    f.close();

We should be able to groupBy (chunkBy) its lines:

    writeln(File("test.txt").byLine.groupBy!((a,b) => a == b));

The result is just one group, that is, all lines are considered equal:

[["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"]]

Alas, byLine reuses the same buffer for each line and thus
groupBy keeps comparing each line with itself. There is a version
of byLine that makes copies:

    writeln(File("test.txt").byLineCopy.groupBy!((a,b) => a == b));

Indeed, the result is as expected:

[["0", "0", "0", "0"], ["1", "1", "1", "1"], ["2", "2", "2", "2"], ["3", "3", "3", "3"], ["4", "4", "4", "4"]]

Yeah, byLine is dangerous. byLineCopy should probably have been the default. Maybe we should rename byLine to byLineNoCopy (doing the proper deprecation dance, of course).
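
To make the buffer reuse concrete without touching the file system, here is a minimal sketch. `ReusingLines` is a hypothetical stand-in for byLine (it is not Phobos code, and the 64-character buffer is an arbitrary assumption): every `front` is a slice of the same buffer, so slices kept around after `popFront` all end up showing the last line read.

```d
import std.algorithm : equal, map;
import std.array : array;

// Hypothetical stand-in for byLine: each front is a slice of ONE shared buffer.
struct ReusingLines
{
    private string[] source;  // lines not yet consumed
    private char[] buffer;    // reused for every line
    private size_t len;       // length of the current line

    this(string[] lines)
    {
        source = lines;
        buffer = new char[](64);  // assumption: no line is longer than 64 chars
        fill();
    }

    @property bool empty() const { return source.length == 0; }
    @property char[] front() { return buffer[0 .. len]; }

    void popFront()
    {
        source = source[1 .. $];
        fill();
    }

    // Overwrite the shared buffer in place with the next line, if any.
    private void fill()
    {
        if (source.length == 0) return;
        len = source[0].length;
        buffer[0 .. len] = source[0][];
    }
}

void main()
{
    auto lines = ["a", "b", "c"];

    // Storing the fronts directly keeps three slices of the same buffer.
    auto aliased = ReusingLines(lines).array;
    assert(equal(aliased, ["c", "c", "c"]));

    // Copying each front (roughly what byLineCopy does) keeps them distinct.
    auto copied = ReusingLines(lines).map!(l => l.idup).array;
    assert(equal(copied, ["a", "b", "c"]));
}
```

Any algorithm that remembers an earlier `front` and compares it with a later one sees the same aliasing, which is consistent with the single group reported above.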

A final test with the undocumented byRecord method (the mapping
after groupBy is for beauty only and does not change the result):

    writeln(File("test.txt")
            .byRecord!string("%s")
            .groupBy!((a,b) => a == b)
            .map!(map!(a => a[0])));

Here, the result is most peculiar:

[["0", "0", "0", "0"], ["1", "1", "1"], ["2", "2", "2"], ["3", "3", "3"], ["4", "4", "4"]]

Is byRecord broken? (It is undocumented after all.) In a way,
because it does not contain any indirection. The current fields
tuple is a simple member of the ByRecord struct.

In contrast, the ByLineCopy struct is just a wrapper to a ref
counted ByLineCopyImpl struct with a simple note:

        /* Ref-counting stops the source range's ByLineCopyImpl
         * from getting out of sync after the range is copied, e.g.
         * when accessing range.front, then using std.range.take,
         * then accessing range.front again. */

I am uncomfortable at this point. Simple and efficient input
ranges fail in unexpected ways. Internal indirections make all
the difference. It feels like input ranges are hiding something
that should not be hidden.

What am I missing?

I guess the problem is the mix of value and reference semantics. ByRecord's `current` is a value, but its `file` has reference semantics. So advancing a copy of a ByRecord affects one part of the original (the shared file position) but not the other (the cached `current`).
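
A small sketch of that mix, using made-up types (`Source` and `MixedRange` are hypothetical and only mimic the shape described here, a shared handle plus a cached value; they are not the actual ByRecord implementation):

```d
// Reference semantics: stands in for the shared File handle.
final class Source
{
    int[] data;
    this(int[] data) { this.data = data; }
    bool exhausted() const { return data.length == 0; }
    int next() { auto v = data[0]; data = data[1 .. $]; return v; }
}

// Value semantics for `current`, reference semantics for `source`.
struct MixedRange
{
    Source source;  // shared by every copy of the range
    int current;    // each copy has its own cached element
    bool done;

    this(Source s) { source = s; popFront(); }

    @property bool empty() const { return done; }
    @property int front() const { return current; }

    void popFront()
    {
        if (source.exhausted) done = true;
        else current = source.next();
    }
}

void main()
{
    auto r = MixedRange(new Source([0, 1, 2, 3]));
    auto copy = r;          // copies `current`, shares `source`

    copy.popFront();        // advances the shared source
    assert(copy.front == 1);
    assert(r.front == 0);   // the original still caches the old element

    r.popFront();           // reads the NEXT element from the shared source
    assert(r.front == 2);   // element 1 was never seen through `r`
}
```

If an algorithm advances an internal copy and then continues with the original, elements can be skipped in this way, which would fit the missing entries in the byRecord output.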

Maybe copying should be `@disable`d for such ranges/structs. Then you couldn't pass it by value to groupBy. Instead you would have to use something like (the fixed version of) refRange, which works properly.
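
As a rough illustration of the `@disable` idea, with a hypothetical `NoCopyCounter` type and a plain `ref` parameter standing in for refRange (whether a given Phobos algorithm accepts a non-copyable range is not something this sketch claims):

```d
// Hypothetical range-like struct whose copies are forbidden.
struct NoCopyCounter
{
    int current;
    @disable this(this);  // any implicit copy becomes a compile-time error

    void popFront() { ++current; }
}

// Callers have to take the struct by reference; by-value parameters
// would not compile when passed an lvalue.
void advanceTwice(ref NoCopyCounter r)
{
    r.popFront();
    r.popFront();
}

// Copying is rejected by the compiler.
static assert(!__traits(compiles, { NoCopyCounter a; NoCopyCounter b = a; }));

void main()
{
    NoCopyCounter r;
    advanceTwice(r);
    assert(r.current == 2);  // the one and only instance was advanced
}
```

With copies ruled out, there is only ever one cached value, so it cannot drift out of sync with the shared state.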
