On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
On Tuesday, April 10, 2018 19:47:10 Nordlöw via
Digitalmars-d-learn wrote:
On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
> On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
>> Should ranges always provide a length property?
>
> No.
>
>> If so, in which cases is a length property an advantage or
>> a requirement?
>
> Just provide it whenever it is cheap to do so. If you need
> to do complex calculations or especially loop over contents
> to figure out the length, do NOT provide it.
>
> But if it is as simple as returning some value, provide it
> and algorithms can take advantage of it for optimizations
> etc. as needed.
I'm thinking of my own container Hashmap having its range
ByKeyValue requiring one extra word of memory to store the
iteration count which, in turn, can be used to calculate the
length of the remaining range. Is this motivated?
That would depend entirely on what you're trying to do, but in
general, if a range has length, then some algorithms will be
more efficient, and some algorithms do require length. So, if
you can provide length, then the range will be more useful,
just like a bidirectional range can be more useful than a
forward range or a random-access range can be more useful than
either. However, if you're not doing anything that ever
benefits from it having length, then it doesn't buy you
anything. So, it ultimately depends on what you're doing. In a
general purpose library, I'd say that it should have length if
it can do so in O(1), but if it's just for you, then it may or
may not be worth it.
The other thing to consider is what happens when the container
is mutated. I don't think that ranges necessarily behave all
that well when an underlying container is mutated, but it is
something that has to be considered when dealing with a range
over a container. Even if mutating the underlying container
doesn't necessarily invalidate a range, maintaining the length
in the manner that you're suggesting probably makes it so that
it would be invalidated in more cases, since if any elements
are added or removed in the portion that was already popped off
the range, then the iteration count couldn't be used to
calculate the length in the same way anymore. Now, with a hash
map, the range is probably fully invalidated when anything gets
added or removed anyway, since that probably screws with the
order of the elements in the range, but how the range is going
to behave when the underlying container is mutated and how
having the length property does or doesn't affect that is
something that you'll need to consider.
- Jonathan M Davis
I find that discussion very interesting as I had never considered
that because of design by introspection having a costly length
method would lead to unexpected calls by generic algorithms
making it a disadventage if present.
On the other hand I don't think the end user should have to
scratch his head to find the length of a range, especially if
it's not trivial to get (say, O(log n) kind of case). Therefore
exposing a method in any case seems the best from an API
perspective.
But to avoid the performance issues mentionned earlier it means
it should bear a different name (get/setLength comes to mind). I
believe this is the same kind of issue that lead to having "in"
for associative arrays but not regular ones. However this also
leads to less coherent APIs in contradiction with the principle
of least surprise.
In retrospect since only "unexpected" calls to such methods cause
the issue I wonder if it wouldn't be best to have an UDA saying
"Hey, please, this method is costly, if you're a generic template
performing introspection you should probably not call me". And
writing that Andrei's work on complexity annotations comes to
mind. Anyway, I don't think the user should use different names
just to alleviate an issue on the library side but the
alternative would be costly to put in place...
Any thoughts?