Re: Container insertion and removal

Robert Jacques Sun, 07 Mar 2010 09:45:42 -0800

On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer <[email protected]> wrote:

Robert Jacques Wrote:
On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
<[email protected]> wrote:
> On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <[email protected]>
> wrote:
>
>> On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
>> <[email protected]> wrote:
>>>
>>> How can softRemove not affect iterating ranges? What if the range is
>>> positioned on the element removed?
>>
>> It always affects ranges in so far as the range and container are
>> inconsistent, but that is a problem of softInsert as well. Removing an
>> element from an array doesn't invalidate the underlying range, since
>> the memory is still there. And so long as you're not trying to use free
>> lists, linked-lists, trees, etc. can be written so that the ranges
>> never enter an invalid state. If they happen to be pointing at the
>> removed node, they just end up stopping early.
>
> If my linked list range has two node pointers as the implementation, and
> you remove the end node, it's going to end later, not early.

A linked list maps to a forward range: so the range simply points to a
single internal node. If you're going to store a pointer to the end, then
you should be using a doubly linked list, not a singly linked list.
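To make the two positions concrete, here is a minimal Python sketch (Python only for illustration). It assumes a garbage-collected singly linked list whose removal unlinks a node but leaves that node's own `next` pointer intact; `Node`, `SList`, `remove_after` and `iterate` are hypothetical names. A range that holds only its current node stays memory-safe when that node is removed, while a half-open [begin, end) range overruns when its end node is removed.

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt


class SList:
    """Singly linked list; a removed node keeps its own next pointer (GC assumed)."""

    def __init__(self, values):
        self.head = None
        for v in reversed(values):
            self.head = Node(v, self.head)

    def remove_after(self, prev):
        # Unlink prev.next but leave victim.next intact, so a range parked on
        # the victim can still walk forward without touching freed memory.
        victim = prev.next
        prev.next = victim.next
        return victim


def iterate(begin, end=None):
    """Walk the half-open range [begin, end); end=None means 'to the tail'."""
    out = []
    node = begin
    while node is not None and node is not end:
        out.append(node.value)
        node = node.next
    return out


lst = SList([1, 2, 3, 4, 5])
n2 = lst.head.next
n3 = n2.next
n4 = n3.next

# Single-node forward range parked on node 3, which is then removed.
parked = n3
lst.remove_after(n2)                # list is now 1 2 4 5
after_removal = iterate(parked)     # [3, 4, 5]: stale, but memory-safe

# Two-pointer range [head, node 4): intended to yield 1, 2.
before = iterate(lst.head, n4)      # [1, 2]
lst.remove_after(n2)                # remove node 4, the range's end marker
overrun = iterate(lst.head, n4)     # [1, 2, 5]: the range ends later, not early
```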

How do you know when to stop? A range has a beginning and an ending; otherwise it's an iterator. Whether you store it via a pointer to the last (not-to-be-iterated) node, or via a length, or via a value to compare with for stopping, you have to use something. Or are you asserting the only useful range for a linked list iterates the entire list?
Think of it as the equivalent of a slice of an array.

Please define for me an O(1) slice or index operation for a linked list. The only way of doing this is to search the entire list in order, comparing against the search terms or counting the number of nodes passed. The range itself can do this just as easily as the host container, if you really want this functionality (I'd argue this isn't a valid list operation). For simple list[5..10] operations it's definitely more efficient, and the range could even throw an exception when it reaches the end of the list before finding the end of the slice (due to list topology manipulation). More complex ranges, like list["apple".."oranges"], are still much more efficient in the single-range case. Even when you start to take many slices, cache effects can marginalize a lot of the cost.
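A minimal Python sketch of ranges that compute their own bounds by walking the list, as described above. The names `index_slice`, `value_slice` and `RangeEndNotFound` are hypothetical, and both slices are assumed to be half-open like D slices. Each range raises if the list runs out before its end bound is reached, which is how a topology change after creation would show up.

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt


def from_values(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


class RangeEndNotFound(Exception):
    """The list ended before the slice's end bound was reached."""


def index_slice(head, start, stop):
    """Lazily yield list[start..stop) by counting nodes; no O(1) shortcut exists."""
    node = head
    for _ in range(start):                  # O(start) walk to the first element
        if node is None:
            raise RangeEndNotFound(start)
        node = node.next
    for _ in range(stop - start):
        if node is None:                    # list is shorter than expected
            raise RangeEndNotFound(stop)
        yield node.value
        node = node.next


def value_slice(head, lo, hi):
    """Lazily yield list[lo..hi): from the first node equal to lo up to,
    but not including, the next node equal to hi."""
    node = head
    while node is not None and node.value != lo:
        node = node.next
    if node is None:
        raise RangeEndNotFound(lo)
    while node.value != hi:
        yield node.value
        node = node.next
        if node is None:                    # ran off the end without seeing hi
            raise RangeEndNotFound(hi)


nums = from_values(list(range(12)))
first_five = list(index_slice(nums, 5, 10))              # [5, 6, 7, 8, 9]

fruits = from_values(["apple", "banana", "cherry", "oranges", "pear"])
picked = list(value_slice(fruits, "apple", "oranges"))  # ['apple', 'banana', 'cherry']
```

Because both are generators, the walk happens lazily as the range is consumed, so a single range pays only for the nodes it actually visits.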

> Of course, you can get around this by just storing a current node and a
> length, but that may not be what the user is expecting. For example, if
> you did a range of a sorted tree from "a" to "f" and it stops at "n", I
> think that would be extremely unexpected.

By this logic, any and all forms of mutation, not just insertions and
deletions, must not be allowed.

Allowed, yes. Not invalidating iterators, no.

If I change a node in a sorted tree from 'a' to 'n', the node moves, changing the tree topology. Any range currently at the node would continue to look for the 'f' node, and iterate the rest of the tree erroneously. By the way, having ranges detect whether or not they reach their end nodes is fairly easy to do.
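A minimal Python sketch of that scenario, using a sorted singly linked list as a stand-in for the sorted tree (an assumption made to keep the example short; the ordering behaviour is the same). `SortedList`, `rekey`, `walk` and `EndNotReached` are hypothetical names. Re-keying 'a' to 'n' moves the node past 'f'; a range that remembers its end node can detect that it hit the tail without ever reaching that node, instead of silently iterating the rest of the collection.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None


class SortedList:
    """Sorted singly linked list standing in for a sorted tree (illustration only)."""

    def __init__(self, keys):
        self.head = None
        for k in keys:
            self._link(Node(k))

    def _link(self, node):
        # Insert node in key order.
        if self.head is None or node.key < self.head.key:
            node.next = self.head
            self.head = node
            return
        cur = self.head
        while cur.next is not None and cur.next.key <= node.key:
            cur = cur.next
        node.next = cur.next
        cur.next = node

    def _unlink(self, node):
        if self.head is node:
            self.head = node.next
            return
        cur = self.head
        while cur.next is not node:
            cur = cur.next
        cur.next = node.next

    def find(self, key):
        cur = self.head
        while cur is not None and cur.key != key:
            cur = cur.next
        return cur

    def rekey(self, node, new_key):
        # Changing a key moves the node: a topology change, not just a value change.
        self._unlink(node)
        node.key = new_key
        node.next = None
        self._link(node)


class EndNotReached(Exception):
    pass


def walk(begin, end):
    """Yield keys from begin up to (not including) the end node; raise if the
    tail is reached first, i.e. the end node is no longer ahead of us."""
    node = begin
    while node is not end:
        if node is None:
            raise EndNotReached("tail reached before the end node")
        yield node.key
        node = node.next


s = SortedList(["a", "c", "f", "k", "p"])
a, f = s.find("a"), s.find("f")
before = list(walk(a, f))             # ['a', 'c']

s.rekey(a, "n")                       # order is now c f k n p
order = list(walk(s.head, None))      # ['c', 'f', 'k', 'n', 'p']

moved, detected = [], False
try:
    for key in walk(a, f):
        moved.append(key)             # 'n', 'p': past where 'f' used to be
except EndNotReached:
    detected = True
```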

> Stopping early is invalidation also IMO. If your program logic depends
> on iterating over all the elements you originally intended to iterate,
> then we have big problems if they stop early.

Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.

I still fail to see the difference between "soft" operations and non-soft. What does soft guarantee? Give me a concrete definition; an example would help too.

There are several possible definitions for soft operations:
1) The memory safety of a collection's ranges is guaranteed.
2) The topology viewed by a range isn't logically changed, i.e. the range will continue to perform the same logical function if the topology it's operating on is updated.
3) The topology viewed by a range isn't actually changed, and all elements selected at range creation will be viewed.
4) Like 3, but with all values being viewed.

For example, modifying an array in any way doesn't break 1, 2 or 3 for any of its slices. For a linked list defining a forward range, mutation, insertion and removal can be done under 1 and 2.

The same can be said about doubly linked lists and bidirectional ranges.

For other containers, such as a sorted tree, mutation can break 2 and 3, though insertion and deletion don't break 2. Although the ranges will see many values, they may see neither all the values currently in the collection nor all the values that were in the collection when the iterator was generated. So code that relies on such properties would be logically invalid.

I'd probably define hard ops as being at level 1 and soft ops at level 2. Level 4 is really only possible with immutable containers.

>>> The only two containers that would support softInsert would be linked
>>> list and sorted map/set. Anything else might completely screw up the
>>> iteration.  I don't see a lot of "generic" use for it.
>>
>> There are all the containers based upon linked lists, etc., like hashes,
>> stacks, queues and deques.
>
> Hashes may be rehashed when inserting, completely invalidating a range
> (possibly the end point will be before the starting point).
Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view.)

God no. If my hash collision solution is linked-list based (which it is in dcollections), why should I reallocate all those nodes? I just rearrange them in a new bucket array.

Sorry, I was assuming that if you were going to implement a hash collection you wouldn't be using a linked-list approach, since that's what D's associative arrays already do. There are some really good reasons not to use a list-based hash in D due to GC false-pointer issues, but basically none for re-implementing (poorly?) D's built-in data structure.
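A minimal Python sketch of the rehash case: a separate-chaining hash set whose rehash relinks the existing nodes into a larger bucket array without allocating any new node. `ChainedHashSet`, `Range` and their methods are hypothetical names, and the bucket layout relies on CPython's `hash(k) == k` for small non-negative ints. A range parked mid-iteration survives (no dangling memory), but it visits keys 4 and 2 twice because the nodes were redistributed behind and ahead of it.

```python
class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt


class ChainedHashSet:
    """Separate chaining: each bucket holds a singly linked list of nodes."""

    def __init__(self, nbuckets=2):
        self.buckets = [None] * nbuckets

    def insert(self, key):
        i = hash(key) % len(self.buckets)
        self.buckets[i] = Node(key, self.buckets[i])    # prepend to the chain

    def rehash(self, nbuckets):
        # Move every existing node into a new bucket array; nodes are relinked,
        # never reallocated.
        old = self.buckets
        self.buckets = [None] * nbuckets
        for head in old:
            node = head
            while node is not None:
                nxt = node.next                          # save before relinking
                i = hash(node.key) % nbuckets
                node.next = self.buckets[i]
                self.buckets[i] = node
                node = nxt

    def nodes(self):
        for head in self.buckets:
            node = head
            while node is not None:
                yield node
                node = node.next


class Range:
    """Forward range over the set, represented as (bucket index, current node)."""

    def __init__(self, s):
        self.s = s
        self.b = -1
        self.node = None
        self._next_bucket()

    def _next_bucket(self):
        # Skip ahead to the next non-empty bucket.
        while self.node is None and self.b + 1 < len(self.s.buckets):
            self.b += 1
            self.node = self.s.buckets[self.b]

    def empty(self):
        return self.node is None

    def front(self):
        return self.node.key

    def pop_front(self):
        self.node = self.node.next
        self._next_bucket()


s = ChainedHashSet(2)
for k in range(6):
    s.insert(k)                 # buckets: [4 -> 2 -> 0, 5 -> 3 -> 1]

r = Range(s)
seen = []
for _ in range(2):
    seen.append(r.front())
    r.pop_front()               # range now parked on node 0 in bucket 0

ids_before = {id(n) for n in s.nodes()}
s.rehash(4)                     # buckets: [0 -> 4, 1 -> 5, 2, 3]
ids_after = {id(n) for n in s.nodes()}

while not r.empty():
    seen.append(r.front())
    r.pop_front()
# seen == [4, 2, 0, 4, 1, 5, 2, 3]: keys 4 and 2 are visited twice
```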


> Yes, the others you mentioned will be valid.  But I still don't see it
> being more useful than just using documentation to indicate insertion
> will not invalidate ranges.  Perhaps I am wrong.

The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.

What is the advantage? Why would an algorithm require soft functions? What is an example of such an algorithm?


Something that uses toUpperCase or toLowerCase, for example.
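A minimal Python sketch of an algorithm stating such a requirement. In D this would be a template constraint, e.g. `void toUpperAll(C)(C c) if (hasSoftReplace!C)`, where `hasSoftReplace` is a hypothetical trait; Python has no compile-time constraints, so the sketch checks a hypothetical `supports_soft_replace` flag at run time. An array can upper-case its elements in place without moving them, while a sorted set cannot promise that, since upper-casing can change the sort order.

```python
class ArrayContainer:
    """Array-backed: replacing an element in place never moves it."""

    supports_soft_replace = True        # hypothetical capability flag

    def __init__(self, items):
        self.items = list(items)

    def positions(self):
        return range(len(self.items))

    def get(self, pos):
        return self.items[pos]

    def soft_replace(self, pos, value):
        self.items[pos] = value


class SortedSetContainer:
    """Sorted set: replacing a key may move it ('apple' sorts after 'Pear',
    but 'APPLE' sorts before 'PEAR'), so ranges cannot be promised to survive."""

    supports_soft_replace = False

    def __init__(self, items):
        self.items = sorted(set(items))


def to_upper_all(container):
    # Run-time stand-in for a D template constraint on the container type.
    if not getattr(container, "supports_soft_replace", False):
        raise TypeError("to_upper_all needs a container with soft replace")
    for pos in container.positions():
        container.soft_replace(pos, container.get(pos).upper())


arr = ArrayContainer(["apple", "Pear"])
to_upper_all(arr)                       # arr.items == ['APPLE', 'PEAR']
```

In D the rejection would happen at compile time instead of raising `TypeError`, which is the point of putting the requirement in the constraint.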

>>> Another option is to use a "mutation" field that is checked every
>>> chance by the range.  If it changes, then the range is invalidated.
>>
>> The mutation field would have to be a version number to support
>> multiple ranges, and given experience with lock-free algorithms which
>> use a 'tag' in a similar manner, this concept is bug prone and should
>> not be relied upon. It would be better to 'lock' the node or container
>> to topology changes, though this does slow things down and has no
>> conflict resolution: removing a locked node would have to throw an
>> exception.
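A minimal Python sketch of the locking alternative, assuming per-node pin counts (a hypothetical design; `PinList`, `PinnedRange` and `PinnedError` are illustrative names). A range pins whichever node it is on and unpins it when it moves, which is the per-step slowdown mentioned above; removing a pinned node simply raises, since there is no conflict resolution. One further cost: a range that is abandoned mid-iteration leaves its node pinned.

```python
class PinnedError(Exception):
    pass


class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt
        self.pins = 0              # number of ranges currently parked on this node


class PinList:
    def __init__(self, values):
        self.head = None
        for v in reversed(values):
            self.head = Node(v, self.head)

    def remove(self, node):
        # No conflict resolution: a pinned node cannot be removed.
        if node.pins:
            raise PinnedError(node.value)
        if self.head is node:
            self.head = node.next
            return
        cur = self.head
        while cur.next is not node:
            cur = cur.next
        cur.next = node.next


class PinnedRange:
    """Forward range that pins whichever node it is currently on."""

    def __init__(self, node):
        self.node = node
        self._pin()

    def _pin(self):
        if self.node is not None:
            self.node.pins += 1

    def empty(self):
        return self.node is None

    def front(self):
        return self.node.value

    def pop_front(self):
        self.node.pins -= 1        # release the node we are leaving
        self.node = self.node.next
        self._pin()                # and pin the one we arrive on


lst = PinList([1, 2, 3])
r = PinnedRange(lst.head)
r.pop_front()                      # range now parked on node 2
two = r.node
try:
    lst.remove(two)
    blocked = False
except PinnedError:
    blocked = True                 # removal refused while the range sits there
lst.remove(lst.head)               # node 1 is no longer pinned, so this succeeds

rest = []
while not r.empty():
    rest.append(r.front())
    r.pop_front()                  # rest == [2, 3]
```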
>
> I was not thinking of multithreaded applications.  I don't think it's
> worth making containers by default be multithreaded safe.
I wasn't thinking of multi-threaded containers. I was trying to point out
that version ids have failed in lock-free containers, where things are
happening on the order of a single atomic op or a context switch. Given
the time a range could go unused in standard code, versioning won't work.

Are you trying to say that if you don't use your range for exactly 2^32 mutations, it could mistakenly think the range is still valid? That's a valid, but very very weak point.

Umm, no. That is a valid point that happens in production code, to disastrous effect. Worse, it doesn't happen often and there's no good unit test for it. I, for one, never want to debug a program that only glitches after days of intensive use in a live environment with real customer data. Integer overflow bugs like this are actually among the few bugs that have ever killed anyone.

> The mutation index has been used in Tango forever, and I think was in
> Doug Lea's original container implementations.  I'm pretty sure it is
> sound in single-threaded uses.
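A minimal Python sketch of the mutation-index scheme in a single-threaded setting: the container bumps a counter on every topology change, and each range snapshots the counter at creation and fails fast when it differs. `VersionedList`, `VersionedRange` and `StaleRangeError` are hypothetical names; Java's `java.util` collections use a similar `modCount` check to throw `ConcurrentModificationException`. Python integers are unbounded, so this particular counter cannot wrap, whereas a fixed-width counter can.

```python
class StaleRangeError(Exception):
    pass


class VersionedList:
    """Array-backed list with a mutation counter bumped on every topology change."""

    def __init__(self, items):
        self.items = list(items)
        self.mutations = 0

    def insert(self, i, value):
        self.items.insert(i, value)
        self.mutations += 1

    def remove_at(self, i):
        del self.items[i]
        self.mutations += 1


class VersionedRange:
    def __init__(self, owner):
        self.owner = owner
        self.pos = 0
        self.seen = owner.mutations     # snapshot taken at range creation

    def _check(self):
        # Fail fast: any topology change since creation invalidates the range.
        if self.owner.mutations != self.seen:
            raise StaleRangeError

    def empty(self):
        self._check()
        return self.pos >= len(self.owner.items)

    def front(self):
        self._check()
        return self.owner.items[self.pos]

    def pop_front(self):
        self._check()
        self.pos += 1


c = VersionedList(["a", "b", "c"])
r = VersionedRange(c)
first = r.front()                   # 'a'
r.pop_front()
c.remove_at(0)                      # topology change bumps the counter
try:
    r.front()
    stale_detected = False
except StaleRangeError:
    stale_detected = True           # the range refuses to continue
```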

No, it's not. Version tags + integer overflow = bug. Doug Lea knew about
the problem but thought it would never happen in real code. And Bill
Gates thought no one would need more than 640k of RAM. They have both been
proven wrong.

Overflow != bug. Wrapping completely around to the same value == bug, but it is so unlikely that it's worth the risk.

Statistics 101: do a test enough times and even the highly improbable will happen.
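A minimal Python sketch of the wrap-around failure both sides are describing. The counter is masked to a fixed width; an 8-bit width is assumed only so the wrap is quick to reproduce, and a 32-bit counter behaves the same way after exactly 2**32 mutations. `WrappingCounterList` and `range_is_valid` are hypothetical names. After 256 appends the counter is back at the range's snapshot, so the version check passes even though the container has changed.

```python
class WrappingCounterList:
    """Mutation index stored in a fixed number of bits, wrapping modulo 2**bits."""

    def __init__(self, items, bits=8):
        self.items = list(items)
        self.mask = (1 << bits) - 1
        self.mutations = 0

    def append(self, value):
        self.items.append(value)
        self.mutations = (self.mutations + 1) & self.mask


def range_is_valid(container, snapshot):
    # The only check a version-tag range can make.
    return container.mutations == snapshot


c = WrappingCounterList(["x", "y"], bits=8)
snapshot = c.mutations                  # a range created now remembers 0

c.append("z")
after_one = range_is_valid(c, snapshot)     # False: change detected

for i in range(255):                    # 256 mutations in total
    c.append(i)
after_wrap = range_is_valid(c, snapshot)    # True, although 256 elements were added
```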
