On Tuesday, 10 November 2015 at 16:33:02 UTC, Ur@nuz wrote:
I agree with these considerations. When I define a non-copyable range (with a disabled postblit), a lot of standard Phobos functions fail to compile instead of using the *save* method. So the logical question is in which cases we should use a plain old struct copy and when we should use *save* on forward ranges.

Another good question is whether input ranges should be copyable (or for which types of ranges copying makes sense). A good example is a network socket as an input range: we can't save the state of the socket stream and get the consumed data again, so in my opinion copying such a range is meaningless. If we want to pass it somewhere, it's better to pass it by reference.

Passing by reference really doesn't work with ranges. Consider that most range-based functions are lazy and wrap the range that they're given in a new range. e.g.

auto r = filter!pred(range);

or

auto r = map!func(range);

The range has to be copied for that to work. And even if you could make it so that the result of functions like map or filter referred to the original range by reference, their return value would not be returned by ref, so if a function required that its argument be passed by ref, then you couldn't chain it. So, requiring that ranges be passed by ref would pretty much kill function chaining.
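To make the chaining point concrete, here's a small sketch. The lambdas are just placeholders picked for illustration. Each call takes the previous range by value and returns a new wrapper as an rvalue, which is exactly what a ref parameter could not accept:

```d
import std.algorithm : equal, filter, map;
import std.range : iota;

void main()
{
    // Each stage receives the previous range by value and wraps it lazily.
    auto r = iota(10)
        .filter!(a => a % 2 == 0)
        .map!(a => a * a);

    // Nothing is computed until the chain is iterated.
    assert(r.equal([0, 4, 16, 36, 64]));
}
```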

Also, passing a range somewhere to access it in two different places simultaneously is a bad idea. The current state looks like we have the current approach with the range postblit constructor and *save* because we have it for structs and it works somehow (so far) for trivial cases. But we don't have clear intentions about how it should really work.

It's mostly clear, but it isn't necessarily straightforward to get it right. If you want to duplicate a range, then you _must_ use save. Copying a range by assigning it to another range is not actually copying it per the range API. You pretty much have to consider it a move and consider the original unusable after the copy.

The problem is that for arrays and many of the common ranges, copying the range and calling save are semantically the same, so it's very easy to write code which assumes that behavior and then doesn't work with other types of ranges. That's why it's critical to test range-based functions with a variety of range types - particularly reference types in addition to value types or dynamic arrays.
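As a rough illustration of that difference, consider the sketch below. `Countdown` is a made-up class range written only for this example (it is not anything in Phobos). With the dynamic array, copying happens to act like save; with the class, copying just gives you a second reference to the same range:

```d
import std.range.primitives;

// Hypothetical reference-type forward range, for illustration only.
final class Countdown
{
    private int n;
    this(int n) { this.n = n; }

    @property bool empty() const { return n == 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
    @property Countdown save() { return new Countdown(n); }
}

static assert(isForwardRange!Countdown);

void main()
{
    // Dynamic array: the copy is a separate slice over the same elements.
    int[] arr = [3, 2, 1];
    auto arrCopy = arr;
    arrCopy.popFront();
    assert(arr.front == 3);      // the original did not advance

    // Class: the "copy" refers to the same object.
    auto orig = new Countdown(3);
    auto sameRange = orig;
    sameRange.popFront();
    assert(orig.front == 2);     // the original advanced as well

    // save gives an independent duplicate regardless of the type.
    auto dup = orig.save;
    dup.popFront();
    assert(orig.front == 2);
    assert(dup.front == 1);
}
```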

Copying and passing ranges should also be specified as part of the range protocol, because it's a very common use case and shouldn't be ambiguous.

The semantics of copying a range depend heavily on how a range is implemented and cannot be defined in the general case:

auto copy = orig;

Dynamic arrays and classes will function fundamentally differently, and with structs, there are a variety of different semantics that that copy could have. What it ultimately comes down to is that while the range API can require that the copy be in the exact same state that the original was in, it can't say anything about the state of the original after the copy. Well-behaved range-based code has to assume that once orig has been copied, it is unusable. If the code wants to actually get a duplicate of the range, then it will have to use save, and the semantics of that _are_ well-defined and do not depend on the type of the range.
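Struct ranges are where this gets subtle, so here is a sketch of one possible case. `Buffered` is a hypothetical struct invented for this example: its cached front is copied by value, but the position in the underlying source is shared through a pointer. The copy starts out in the same state as the original, but once the copy is advanced, the original is left with a stale front and skips an element:

```d
import std.range.primitives : isInputRange;

// Hypothetical struct range with mixed semantics, for illustration only.
struct Buffered
{
    private int* next;    // shared between copies
    private int limit;
    private int cached;   // duplicated by each copy
    private bool done;

    this(int* next, int limit)
    {
        this.next = next;
        this.limit = limit;
        popFront();       // prime the cached front
    }

    @property bool empty() const { return done; }
    @property int front() const { return cached; }

    void popFront()
    {
        if (*next >= limit) { done = true; return; }
        cached = (*next)++;
    }
}

static assert(isInputRange!Buffered);

void main()
{
    int pos = 0;
    auto orig = Buffered(&pos, 5);
    auto copy = orig;            // same state as orig at this point
    assert(copy.front == 0);

    copy.popFront();
    assert(copy.front == 1);

    // orig still reports its old cached value, and advancing it skips
    // the element that copy already consumed.
    assert(orig.front == 0);
    orig.popFront();
    assert(orig.front == 2);
}
```

Code that keeps using orig after the copy would get different results here than it would with a dynamic array, which is why the original has to be treated as unusable.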

Also, since a range can be a class object, how should such ranges behave?

There's really nothing to consider here. It's known how they should behave. There's really only one way that they _can_ behave. One of the main reasons that save exists is because of classes. While copying a dynamic array or many struct types is equivalent to save, it _can't_ be equivalent with a class. When you consider that fact, the required behavior of ranges pretty much falls into place on its own. We may very well need to be far clearer about what those semantics are and how that affects best practices, but there really isn't much (if any) wiggle room in what the range API does and doesn't guarantee and how it should be used. The problem is whether it's _actually_ used that way.

If a range-based function is tested with a variety of range types - dynamic arrays, value types, reference types, etc. - then it becomes clear very quickly when calls to save are required and how the function must be written to work for all of those range types. But far too often, range-based functions are tested with dynamic arrays and a few struct range types that wrap dynamic arrays, and bugs with regards to reference type ranges are not found. So, there's almost certainly a lot of range-based code out there that works fantastically with dynamic arrays but would fail miserably with a number of other range types.
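Here's a sketch of how that kind of testing exposes a missing save. `sumTwice` is a hypothetical two-pass function written for this example, with a flag to switch between calling save and just copying the range. For the reference-type case I'm assuming that wrapping an array with `inputRangeObject` from std.range.interfaces is an acceptable stand-in for a class-based range:

```d
import std.range.interfaces : inputRangeObject;
import std.range.primitives;

// Hypothetical function under test: sums the elements in two passes.
// With useSave == false it copies the range instead of saving it,
// which is the bug being illustrated.
int sumTwice(bool useSave, R)(R r)
if (isForwardRange!R)
{
    static if (useSave)
        auto firstPass = r.save;
    else
        auto firstPass = r;

    int total = 0;
    for (; !firstPass.empty; firstPass.popFront())
        total += firstPass.front;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

void main()
{
    // Dynamic array: both versions agree, so the bug stays hidden.
    assert(sumTwice!true([1, 2, 3]) == 12);
    assert(sumTwice!false([1, 2, 3]) == 12);

    // Class-based range: the first pass of the buggy version consumes
    // the only range there is, so the second pass sees nothing.
    assert(sumTwice!true(inputRangeObject([1, 2, 3])) == 12);
    assert(sumTwice!false(inputRangeObject([1, 2, 3])) == 6);
}
```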

For the most part, I think that it's pretty clear how ranges have to act and how they need to be used based on their API when you actually look at how the range API interacts with different types of ranges, but we often do not go much beyond dynamic arrays and miss out on some of the subtleties.

We really do need some good write-ups on ranges and their best practices. I've worked on that before but never managed to spend the time to finish it. Clearly, I need to fix that.

- Jonathan M Davis
