Re: protocol for using InputRanges

Walter Bright Tue, 25 Mar 2014 13:21:21 -0700

It's pretty clear that:

1. the protocol is COMPLETELY undocumented and undefined.

2. in this vacuum, people (including me) have made all sorts of assumptions, andassumed those assumptions were universal and shared.

Since ranges+algorithms are central to D, this is a very serious problem. Thereare two competing schools of thought:


1. maximum flexibility

2. maximum performance

(1) has been explained well in this thread already. (2) not so much, so I'llstand up for that one.

I've been recently involved in writing high performance apps, as anyonefollowing ScopeBuffer knows. With such apps, it's clear that:


1. size matters - the fewer registers an object fits in, the faster it goes
2. every instruction matters - more operations == slower code

3. lazy seems to be faster, mainly because it eliminates the storage managementcost of buffers and extra copying into and out of those buffers

We want to appeal to the high performance coders. To maximize performance,ranges should be optimized to the inner loop case, which is:


    while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

With the additional proviso that ranges are passed to algorithms by value, sothey should be cheap to copy. Cheap to copy usually means them being small.


A) I know that my range is not empty, so I can skip calling empty.

Since front is guaranteed to succeed if !empty, this puts a requirement on manyranges that they have a non-trivial constructor that 'primes the pump'. Ofcourse, priming may fail, and so construction may throw, which is not gooddesign. Next, priming requires a buffer in the range. Buffers add size, makingthem slower to copy, and will often require another field saying if the bufferhas data in it or not, further bumping the size. And lastly, it means that lazyranges will be required to read the first element, even if the range isn't thensubsequently used, which defeats what one would expect from a lazy range. Byhaving mere construction of a range initiate reading from source, this makes theuseful idiom of separating construction from use impractical. (For example,building a pipeline object, then sending it to another thread for execution.)

All this saves for the user is one call to empty for the entire algorithm, at acost incurred with every iteration. I.e. it selects O(n) to save O(1).

B) I want to be able to call front multiple times in a row, and expect to getthe same value.

This can force the range to incorporate a one element buffer and a flagindicating the status of that buffer. The range instance gets bigger and moreexpensive to copy, and the cost of manipulating the flag and the buffer is addedto every loop iteration. Note that the user of a range can trivially use:

     auto e = r.front;
     ... using e multiple times ...
instead.

The big problem with (A) and (B) is these costs become present in every range,despite being rarely needed and having simple alternatives. This is the wrongplace to put cost and complexity. The user should be making these decisionsbased on his needs.


Hence, I propose that the protocol for using input ranges be defined as:

    while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes it possible to build pipelines that are firehoses with no kinks orconstrictions in them. It optimizes for the inner loop case, not boundary cases.

Re: protocol for using InputRanges

Reply via email to