On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer <[email protected]> wrote:

Robert Jacques Wrote:

On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
<[email protected]> wrote:
> On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <[email protected]>
> wrote:
>
>> On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
>> <[email protected]> wrote:
>>>
>>> How can softRemove not affect iterating ranges? What if the range is
>>> positioned on the element removed?
>>
>> It always affects ranges in so far as the range and container are
>> inconsistent, but that is a problem of softInsert as well. Removing an
>> element from an array doesn't invalidate the underlying range, since
>> the memory is still there. And so long as you're not trying to use free
>> lists, linked-lists, trees, etc. can be written so that the ranges
>> never enter an invalid state. If they happen to be pointing at the
>> removed node, they just end up stopping early.
>
> If my linked list range has two node pointers as the implementation, and
> you remove the end node, it's going to end later, not early.

A linked list maps to a forward range: so the range simply points to a
single internal node. If you're going to store a pointer to the end, then
you should be using a doubly linked list, not a singly linked list.

How do you know when to stop? A range has a beginning and an ending, otherwise it's an iterator. Whether you store it via a pointer to the last (not-to-be-iterated) node, or via a length, or via a value to compare with for stopping, you have to use something. Or are you asserting the only useful range for a linked list iterates the entire list?

Think of it as the equivalent of a slice of an array.

Please define for me an O(1) slice or index operation for a linked-list. The only way of doing this is to search the entire list in order, comparing for the search terms or counting the number of nodes passed. The range itself can do this just as easily as the host container, if you really want this functionality (I'd argue this isn't a valid list operation). For simple list[5..10] operations its definitely more efficient and then the range could even throw an exception when it reaches the end of the list before finding the end slice value (due to list topology manipulation). The efficiency of more complex ranges, like list["apple".."oranges"], is still much more efficient for the single range case. Even when you start to take many slices, cache effects can marginalize a lot of the cost.


> Of course, you can get around this by just storing a current node and a > length, but that may not be what the user is expecting. For example, if > you did a range of a sorted tree from "a" to "f" and it stops at "n", I
> think that would be extremely unexpected.

By this logic, any and all forms of mutation, not just insertions and
deletions must not be allowed.

Allowed, yes.  Not invalidating iterators, no.

If I change a node in a sorted tree from 'a' to 'n', the node moves, changing the tree topology. Any range currently at the node would continue to look for the 'f' node, and iterate the rest of the tree erroneously. By the way, having ranges detect if they reach their end nodes or not is fairly easy to do.


> Stopping early is invalidation also IMO. If your program logic depends
> on iterating over all the elements you originally intended to iterate,
> then we have big problems if they stop early.

Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.

I still fail to see the difference between "soft" operations and non-soft. What does soft guarantee? Give me a concrete definition, an example would help too.

There are a couple of possible definitions for soft operations: 1) the memory safety of the ranges of a collection are guaranteed. 2) That for the topology viewed by a range isn't logically changed. i.e. the range will continue to perform the same logical function if the topology its operating on is updated 3) That for the topology viewed by a range isn't actually changed and all elements selected at range creation will be viewed. 4) Like 3, but with all values being viewed.

For example, modifying an array in any way doesn't change 1, 2 or 3 for any of its slices. For a linked list defining a forward range, mutation, insertion and removal can be done under 1 & 2.
The same can be said about doubly linked lists and bidirectional ranges.
For other containers, such as a sorted tree, mutation can break a 2/3 though insertion and deletion don't break 2. Although, the ranges will see many values, they may not see all the values currently in the collection nor all the values in the collection when the iterator was generated. So code that relies on such properties would be logically invalid.

I'd probably define hard ops as being 1) and soft ops at level 2. 4) is really only possible with immutable containers.

>>> The only two containers that would support softInsert would be linked >>> list and sorted map/set. Anything else might completely screw up the
>>> iteration.  I don't see a lot of "generic" use for it.
>>
>> There's all the containers based upon linked-lists, etc like hashes,
>> stacks, queues and dequeues.
>
> Hashes may be rehashed when inserting, completely invalidating a range
> (possibly the end point will be before the starting point).

Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view)

God no. If my hash collision solution is linked-list based (which it is in dcollections), why should I reallocate all those nodes? I just rearrange them in a new bucket array.

Sorry, I was assuming that if you were going to implement a hash collection you wouldn't be using a linked list approach, since that's what D's associative arrays already do. The are some really good reasons to not use a list based hash in D due to GC false pointer issues, but basically none to re-implementing (poorly?) D's built-in data structure.


> Yes, the others you mentioned will be valid.  But I still don't see it
> being more useful than just using documentation to indicate insertion
> will not invalidate ranges.  Perhaps I am wrong.

The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.

What is the advantage? Why would an algorithm require soft functions? What is an example of such an algorithm?

Something that uses toUpperCase or toLowerCase, for example.


>>> Another option is to use a "mutation" field that is checked every
>>> chance by the range.  If it changes, then the range is invalidated.
>>
>> The mutation field would have to be a version number to support
>> multiple ranges, and given experience with lock-free algorithms which
>> use a 'tag' in a similar manner, this concept is bug prone and should
>> not be relied upon. It would be better to 'lock' the node or container
>> to topology changes, though this does slow things down and has no
>> conflict resolution: removing a locked node would have to throw an
>> exception.
>
> I was not thinking of multithreaded applications.  I don't think it's
> worth making containers by default be multithreaded safe.

I wasn't thinking of multi-threaded containers. I was trying to point out
that version ids have failed in lock-free containers, where things are
happening on the order of a single atomic op or a context switch. Given
the time a range could go unused in standard code, versioning won't work.

Are you trying to say that if you don't use your range for exactly 2^32 mutations, it could mistakenly think the range is still valid? That's a valid, but very very weak point.

Umm, no. That is a valid point that happens in production code to disastrous effects. Worse, it doesn't happen often and there's no good unit test for it. I for one, never want to debug a program that only glitches after days of intensive use in a live environment with real customer data. Integer overflow bugs like this are actually one of the few bugs that have ever killed anyone.


> The mutation index has been used in Tango forever, and I think was in
> Doug Lea's original container implementations.  I'm pretty sure it is
> sound in single-threaded uses.

No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.

Overflow != bug. Wrapping completely to the same value == bug, but is so unlikely, it's worth the possibility.

Statistics 101: do a test enough times and even the highly improbable will happen.

Reply via email to