On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <[email protected]> wrote:

These thoughts were inspired by the recent thread, "Retrieving the
traversed range". I am generally pretty sold on the range concept,
but I think for some operations some sort of cursor is required.
However, there are some unsafe cursor-related operations that are
best avoided.

However, as far as I can see, the unsafe operations revolve around
synthesizing a new range out of two cursors. However, you can get
a lot of the desired functionality associated with finding elements and
splitting ranges *without* allowing the synthesis of of ranges from
unrelated cursors. The key to doing this is to have each cursor be permanently bound to the underlying range which it was derived from.

If you have this kind of cursor, you only really need it to support two operations to get a huge amount of benefit:

     Range before ();

     Range after ();

The before () operation just returns a range of the appropriate type (Forward, Bidirectional, RandomAccess) of all the elements that come before the cursor is pointing to, while the after () operation returns a range of the appropriate type with all the elements after the cursor. Given this concept, a cursor really points at the boundary between elements of a range, if you will.

If find and related operations return this kind of cursor, all of the operations involving splitting ranges become trivial. It's completely consistent for either before () or after () to be empty, and since you can't glue two of these cursors together, you are always guaranteed to get well-defined ranges out.

While I think the before () and after () operations are the only ones
which are strictly necessary for the desiderata I've seen so far, people
could write their own generic algorithms if cursors supported more of
the iterator operations, for moving forward (and backward, where appropriate) and inspecting the element immediately after the cursor (or before, where appropriate), and if ranges supported operations to return cursors from reasonable places (like from the front or back, or from a specific indexed elemnt for RandomAccess ranges).

The key idea is that these cursors aren't a primitive part of a range; instead, they take a range and add a position *inside* the range. They're a perfect fit for all those "three-legged" operations out there because they actually have three legs.

That's exactly how cursors perform in dcollections. An excerpt from dcollections' concepts document:

Cursors:

Cursors are special 1 or 0 element ranges. They implement the forward range interface. The benefit to using cursors is that they always refer to exactly
one element.  A normal range uses two marker elements, a begin and an end
element. Therefore, cursors are less susceptible to invalidation on removal of
elements.

Cursors support these additional features:

- Use a cursor as a reference point when dealing with a container.
- Use as an end point when slicing a container. Note that some containers only
  support slicing with an arbitrary cursor and the beginning or end of the
  container.  Slicing is guaranteed to be a fast operation (lgn or better).

Determining cursor/range ownership:

All collections can identify whether a cursor/range belongs to them. Each collection class has a 'belongs' method to determine ownership. The ownership check is guaranteed to be a fast operation (O(lgN) or better).


-------------------

The 'three-legged' cursor as you propose was an early idea of mine too. Because dcollections containers are classes however, there is an easy point of ownership to establish -- the class reference. So a dcollections cursor doesn't have to refer to its owner, but it can in order to fulfill the requirement.

My view is that a cursor should be separate from a range, and a range should either be able to tell if a cursor is part of it (@safe) or be told that a cursor is a part of it (@system or @trusted). Inside a three-legged algorithm which accepts a range as input, and generates a cursor from the range for the third leg, it can force the range to assume the cursor is a part of it, knowing that it's true. You can make the second operation @safe by wrapping a cursor + range in a struct that generates the cursor from the range (and therefore knows the cursor is a part of the range).

For two examples of ranges that can tell if a cursor is a part of it, an array can tell if a cursor is a part of it without any extra mechanisms because it is an O(1) check for an interior pointer. A balanced tree can also perform an O(lgn) check that a cursor is part of it (though it's questionable whether a tree is a range).

-Steve

Reply via email to