On Friday, November 25, 2016 08:32:07 Joseph Rushton Wakeling via Digitalmars-d wrote:
> On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis wrote:
> > It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up.
>
> Yes, indeed. I actually came up with a general-purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters):
> https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d@puremagic.com
>
> > In general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it.
>
> In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample.
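Roughly, "true" laziness for every front value ends up shaped something like this. This is only a minimal sketch of the pattern, not the mixin from the link; the LazyFront name and the delegate-based design are made up purely for illustration:

// A lazily-cached front: nothing is computed until front is actually asked
// for, and the result is cached so that repeated calls to front prior to
// popFront agree. The cost is the flag check on every call.
struct LazyFront(T)
{
    private T delegate() compute; // produces the next element on demand
    private T _front;
    private bool _haveFront = false;

    enum bool empty = false; // infinite, purely to keep the sketch short

    @property T front()
    {
        if (!_haveFront) // the per-call overhead under discussion
        {
            _front = compute();
            _haveFront = true;
        }
        return _front;
    }

    void popFront()
    {
        if (!_haveFront)
            cast(void) compute(); // a skipped element still has to be consumed
        _haveFront = false;       // the next front is computed only when asked for
    }
}

unittest
{
    int n = 0;
    auto r = LazyFront!int(() => ++n);
    assert(r.front == 1 && r.front == 1); // cached across repeated calls
    r.popFront();
    assert(r.front == 2);                 // not computed until front is called again
}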
It's much messier from an implementation standpoint. If the constructor checks for empty and caches front if the range it was given was non-empty, then every call to front can just return the cached value, and every call to popFront can just pop the element off the wrapped range, do whatever it does to turn the wrapped range's front into this range's front, and cache it. If the constructor doesn't do that work, then every call to front and every call to popFront has to check whether it's the first time that either has been called, and _then_ it ends up doing all of the same work anyway. So, there's more code, and it's less efficient. And if every range in the chain does that, then you're getting a lot of extra checks just to determine whether it's the first time that front or popFront is being called.
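For comparison, the eager approach described above is shaped roughly like this. Again, just a sketch under simplifying assumptions; the EagerCache name and the transform parameter are made up for illustration:

import std.range.primitives;

// The eager approach: the constructor checks for empty and caches the first
// front, so front just returns the cache, and popFront just pops, transforms,
// and re-caches, with no "has this been computed yet?" flag anywhere.
struct EagerCache(R, alias transform) if (isInputRange!R)
{
    private R _source;
    private typeof(transform(R.init.front)) _front;
    private bool _empty;

    this(R source)
    {
        _source = source;
        _empty = _source.empty;
        if (!_empty)
            _front = transform(_source.front); // cache the first front up front
    }

    @property bool empty() const { return _empty; }

    @property auto front()
    {
        assert(!_empty);
        return _front; // no check, just return the cache
    }

    void popFront()
    {
        assert(!_empty);
        _source.popFront();
        _empty = _source.empty;
        if (!_empty)
            _front = transform(_source.front); // recompute and cache eagerly
    }
}

unittest
{
    import std.algorithm.comparison : equal;

    auto r = EagerCache!(int[], x => x * 2)([1, 2, 3]);
    assert(equal(r, [2, 4, 6]));
}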
Now, if there's never any caching of the front value, and front always calculates it, then that might actually be more efficient if front is only ever called once per element, because you don't pay the cost of copying the element to cache it, but if front gets called multiple times (which happens fairly often, I think), then the cost is definitely greater. You're never going to be able to rely on a fully lazy front anyway unless we specified it as part of the range API that all ranges work that way, and as it stands, very few ranges do.

> The downside is that `.front` becomes non-const.

Well, realistically, as it stands, ranges are utterly useless with const anyway. We'd have to have a standard mechanism for getting a tail-const copy of a range from a const or immutable range, and we don't.

> Sometimes I wonder whether it wouldn't have been better to require that `popFront()` be called before even the first call to `.front`, and place the onus on `popFront()` to handle any special-casing of the first `.front` value.

That would be terrible with chaining ranges. It also would be _very_ error-prone. And it's heading into two-phase initialization territory, which is almost always a bad idea.

> > And I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally.
>
> The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where caching the first value on construction is very problematic.
>
> For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).

Honestly, I don't think that it really works to have a range where the values of its elements depend on when you call front or popFront. The API lets you do it, but you're going to run into plenty of problems unless you don't care about the frequency at which the values are generated.

Part of the problem here is that ranges and the algorithms that use them are really designed with arrays in mind. A non-random-access range reduces those capabilities, and a basic input range doesn't actually require that the elements be reproducible aside from multiple calls to front giving the same result, but it's still the case that a range is pretty much assumed to be a fixed set of values, and pretty much nothing is concerned with how quickly front or popFront gets called, aside from efficiency concerns. So, if someone has a range that calculates a value when front or popFront is called, and that value depends on when the function is called, I think that they're just going to have to deal with the consequences of that. It's already pretty problematic when you have a range that's a basic input range rather than a forward range.

Another thing to consider here is that those sorts of concerns really only apply to basic input ranges. As soon as you have a forward range, the values of the elements need to be completely predictable - or at least completely reproducible - meaning that aside from caching a potentially infinite number of elements to guarantee that save works, nothing that's grabbing changing values from something like a sensor is going to be anything but a basic input range. So, maybe we should do something special with regard to basic input ranges and how they're treated in order to better deal with cases like that, but if we went that route, then we pretty much couldn't treat them like forward ranges without save anymore. In many cases, we'd have to have separate implementations that avoided things like caching, and that would become problematic fast.

I think that there are always going to be cases where certain things have funky corner cases with ranges - especially the further they get from being dynamic arrays. We probably need to do a better job of formalizing some of this stuff to avoid some of those corner cases, but I think that we're always going to be able to find something that we can define as a range that works in many cases but starts doing weird things once it's used with a wide enough variety of algorithms.

- Jonathan M Davis
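P.S. A minimal sketch of the sensor case, with SensorReadings being a made-up type that reads the system clock as a stand-in for real hardware: it satisfies the basic input range API, but with no meaningful save, it can never be more than a basic input range.

import std.datetime : Clock, SysTime;

// Hypothetical sensor-style range: each element is "the reading taken when
// popFront ran", so the values are inherently IO-dependent.
struct SensorReadings
{
    private SysTime _current;

    enum bool empty = false; // the sensor never runs dry

    @property SysTime front() { return _current; } // stable between pops

    void popFront() { _current = Clock.currTime(); } // stand-in for reading hardware

    // Deliberately no save(): there's no way to replay "the reading taken at
    // the moment popFront was called", so this can only be a basic input range.
}

unittest
{
    import std.range.primitives : isInputRange, isForwardRange;

    static assert( isInputRange!SensorReadings);
    static assert(!isForwardRange!SensorReadings);

    auto r = SensorReadings(Clock.currTime()); // seed the first reading
    auto first = r.front;
    assert(r.front == first); // repeated calls to front agree
    r.popFront();             // but the next element depends on when this runs
}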