Re: Improving the @shadow code

Edward K. Ream Mon, 09 Feb 2015 04:54:28 -0800

On Saturday, February 7, 2015 at 7:40:05 PM UTC-6, Edward K. Ream wrote:

For the last several days I have been refactoring and simplifying Leo's 
> fundamental @shadow algorithm in x.propagate_changed_lines and its helpers 
> in the ShadowController class.
>
> Imo, this work is essential, whatever the short-term risk of breaking 
> @shadow.
>


Work continues, with a real chance of breaking something.  This can't be 
helped.

This post tells how to simplify the code by using new data structures.

This is an Engineering Notebook post: feel free to ignore unless you are 
one of Leo's core developers.

===== Terminology

Recent revs have converted names to match to difflib terms:

- The inputs to difflib are x.a and x.b.
- x.a is the new name for x.old_public_lines
- x.b is the new name for x.new_public_lines.

These shorter names save a lot of mental translation: they are *clearer* 
than the longer names.

There are two other changes, but these objects are about to disappear!

- x.old_sent_lines is the new name for old_private lines.
- The readers are x.a_rdr, x.b_rdr and x.sent_rdr.

===== New data structures

There are two interrelated problems with the present code:

1. Both sentinel and non-sentinel lines are intermixed in the 
old_private_lines array.
2. The x.mapping array complicates and obscures the code.

The present data structures needlessly complicate the @shadow update 
algorithm. We can eliminate the x.mapping array and the resulting 
complications (and bugs!) as follows.

A prepass will convert the old_private_lines array into two arrays: x.lines 
and x.sentinels, both of the same length.

x.lines will contain all the *non-sentinel* lines of old_private_lines.

x.sentinels will be a list of lists.  For each i, x.sentinels[i] will be 
the list of all sentinels lines that precede the non-sentinel line at 
x.lines[i].

x.trailing_sentinels will be the list of all trailing sentinels in 
old_private_lines.  These sentinels will be written after the main loop 
completes.

This scheme eliminates all index munging.  We can use the indices in the 
difflib opcodes just as they are.  Sentinel lines no longer affect indices!

Handling @verbatim sentinels also becomes cleaner.  The new prepass will 
strip @verbatim sentinels when creating the x.lines array.  The opcode 
handlers will treat all lines in x.lines identically. When writing the 
final output, the code always knows whether it is writing sentinel lines or 
non-sentinel lines.

- The code will use x.put_sentinels to write sentinel lines.
- The code will use x.put_plain_line to write non-sentinel lines.

x.put_plain_line adds @verbatim lines to x.results as necessary. 

===== Simplified algorithms

This new infrastructure will simplify the opcode methods as follows:

1. The reader streams disappear!
2. The SequenceMatcher indices can be used directly, without any munging.
3. It is (almost) always obvious when to write sentinels. When writing line 
b[i] corresponding to line a[j] we first write all sentinels lines in 
x.sentinels[j].

Let's consider each kind of opcode.  The simplest is 'equal'. Each opcode 
has the form::

   tag,ai,aj,bi,bj = opcode

Here is the code (untested) for the opcode whose tag is 'equal':

    # assert aj - ai == bj - bi and x.lines[ai:aj] == x.lines[bi:bj]
    for i in range(ai,aj+1):
        x.put_sentinels(x.sentinels[i])
        x.put_plain_line(x.a[i])
            # works because x.lines[ai:aj] == x.lines[bi:bj] 

The code for the 'delete' opcode:

    for i in range(ai,aj+1):
        x.put_sentinels(x.sentinels[i])

The code for the 'insert' opcode:

    for i in range(bi,bj+1):
        x.put_plain_line(x.b[i])

The 'replace' opcode poses a question that has been swept under the rug. 
When replacing one set of lines by another, where should sentinels be 
written? There are at least three possibilities:

1. Before all replaced lines.
2. After all replaced lines.
3. Interspersed with the replaced lines (as much as possible) using x.lines

This a judgment call.  There is no completely "right" answer.

Suppose we favor option 3.  Even assuming that 3 new lines replace 3 old 
lines, there is no guarantee that the user *intended* the sentinels to 
occur as they did previously.

Imo, option 1 is a reasonable approach.  As far as placing sentinels goes, 
we would treat the 'replace' opcode like a 'delete' followed by an 
'insert'.  The code would be:

    for i in range(ai,aj+1):
        x.put_sentinels(x.sentinels[i])
    for i in range(bi,bj+1):
        x.put_plain_line(x.b[i])

Only experimentation will tell which option is best in practice.

===== Conclusion

Eliminating x.mapping and the reader classes collapses the complexity of 
the opcode handlers. For the first time, the intentions of the code will be 
clear.  

There is no guarantee that this scheme will work, but I have high hopes. 
During testing, the new code will be enabled by a new_shadow switch at the 
start of leoShadow.py.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/leo-editor.
For more options, visit https://groups.google.com/d/optout.

Re: Improving the @shadow code

Reply via email to