A shock: p.deletePositionsInList must be rethought and rewritten

Edward K. Ream Fri, 05 Jul 2013 16:07:18 -0700

I have spent several hours recently writing unit tests for 
p.deletePositionsInList.  Today I realized that the failing unit tests have 
been telling me something unexpected and important.


There has been much discussion recently about deleting lists of positions.  
I now see that all previous strategies are fatally flawed.  This is quite 
shocking.

Here is the Aha:  the positions passed to p.deletePositionsInList only 
*specify* the desired changes; the only way to *make* those changes is to 
operate on vnodes!

The new view of the problem is relatively straightforward.  Consider this 
very simple outline, containing no clones:

+ ROOT
  - A
  - B

The fundamental problem is *very* simple.  If we delete node A, the index 
of node B in ROOT.children will change. This problem has (almost) nothing 
to do with clones or positions!

To make this concrete, let's look at the *vnodes* that represent this 
tree.  It is the vnodes, and *not* the positions, that represent all of 
Leo's data.  Let ROOT, A and B be the vnodes corresponding to the nodes 
ROOT, A and B.  ROOT.children will look like this at first::

    ROOT.children = [A,B]

That is, the children array contains references (links) to both A and B.  
After deleting A, we will have::

    ROOT.children = [B]

As you can see, the reference to B is at index 1 of ROOT.children before 
deleting A, and at index 0 of ROOT.children after deleting A.  Thus, *any* 
position referring to B will become invalid after deleting A.

Several people, including myself, have proposed an unsound solution--just 
delete positions in reverse order, so that B will be deleted before A.  
This idea has appeal, but it truly *is* unsound, as today's unit tests 
finally convinced me.

Here is an outline that at last explodes the notion that there is *any* 
correct order for deleting positions. All A' nodes are clones of each 
other::

+ ROOT
  + A'
    - B # at position p1
  + A'
    - B # at position p2

**Important**: B is *not* a clone.  Also note that there is only *one* node 
called A and *one* node called B.  The children arrays will look like::

    ROOT.children = [A,A]
    A.children = [B]
    B.children = []

It surely must be reasonable to pass either *or both* positions p1 and p2 
to p.deletePositionsInList.  But after deleting the B corresponding to p1, 
the children arrays will look like:

    ROOT.children = [A,A]
    A.children = []
    B.children = [] # B is no longer referenced anywhere!

So if p.deletePositionsInList attempts to delete position p2 (from A), B 
will no longer appear in A.children!

There are many other cases that we could discuss, but the conclusion in all 
cases is that we must use the positions passed to p.deletePositionsInList 
only as *hints* about what to do.

Happily, there is a simple strategy that sidesteps all the difficulties:

Step 1.  Verify, *before* making any changes to the outline, that all the 
positions passed to p.deletePositionsInList *initially* make sense.

Step 2. Treat each position as a "request" to delete *some* vnode from the 
children array in the *position's* parent vnode.

This is just a bit subtle.  Let me explain it in detail.

First, recall that vnodes do not have unique parent vnodes.  Because of 
clones, a vnode may may have *many* parents.  Happily, every position 
*does* specify a unique parent (vnode) at that position.

Second, as shown above, there is no way to order positions such that all 
later positions remain valid. As the example above shows, deleting (the 
vnode corresponding to) a position P may cause *all* later positions 
referring to P.v to refer to *already deleted* vnodes.

In other words, we simply *must* ignore the child indices in positions.  
Given a position P, P.parent is well defined.  So Step 2 above will simply 
delete the *first* element in P.parent.children containing P.v.  

As we have seen, there may not even *be* any such element of 
P.parent.children: a previous delete may have already deleted the last item 
of P.parent.children equal to P.v.  That should *not* be considered an 
error--Step 1 has ensured that all positions *originally* did make sense.

Summary

Positions passed to p.deletePositionsInList specify *vnodes* to be deleted 
from specific parents, but they do *not* specify at what index in the 
parent.children array (if any!) those vnodes are to be found. The algorithm 
will delete the *first* item in the children array that references the 
vnode to be deleted.  

This will almost always be good enough.  In the unlikely event that more 
control is desired, p.deletePositionsInList can not possibly be used.

The new emphasis on vnodes at last puts the problem an a completely solid 
foundation.  Moreover, the new algorithm should be considerably faster than 
the old: there is no use in sorting positions into any kind of order.

I'm afraid I have not succeeded in conveying the essential simplicity of 
the idea. Feel free to ask questions! Your comments, please.

Next, I'll rewrite the code and see what the unit tests make of it.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/leo-editor.
For more options, visit https://groups.google.com/groups/opt_out.

A shock: p.deletePositionsInList must be rethought and rewritten

Reply via email to