This Engineering Notebook post contains design and coding notes about PR 
#4147 <https://github.com/leo-editor/leo-editor/pull/4147>. As always, 
please feel free to ignore this ENB.


This ENB is unusually long and complex. It is essentially notes to myself. 
It is also pre-writing for:


- A new info item: Theory of Operation for Leo's colorizer.

- Several (long!) docstrings in various methods in the JEditColorizer 
(*jedit*) class. Unless indicated otherwise, all methods mentioned here are 
members of the jedit class.


These notes are essential--it's way too easy to become confused in a maze 
of contexts and hidden relationships. Such confusion caused a gigantic 
performance bug a decade ago. A similar confusion lies beneath the present 
minor performance bug.


I have spent several days tracing the code. As an essential first step, I 
removed all the hacks that partially ruined the legacy code. Now I know how 
the qsh (Qt's QSyntaxHighlighter) works (and *should* work) when Leo *isn't* 
mucking things up. This ENB rests on that new foundation of understanding.


*Requirements*


The legacy code in the jedit class might seem overly complex. But it's 
actually a superb piece of engineering! Anyway, only the profoundly 
ignorant would suggest wholesale changes. Indeed, the jedit class is highly 
reliable, despite its performance bug.


The following requirements will ensure that the PR fixes the performance 
bug while retaining the colorizer's outstanding reliability.


1. The PR *must not* interfere with the qsh. *The qsh is highly optimized.* 
The previous attempts to "help" it were grossly misguided! The PR must 
remove all such hacks and interference.


2. The PR *must not* change *mainLoop*, the method that calls the pattern 
matchers.


3. The PR *must not* change any pattern matcher. As an exception, the 
pattern matchers that handle @language, @color, etc., may change how they 
handle the qsh state. Such *semantic changes* are likely to be 
straightforward.


4. The PR *must not* change any code outside leoColorizer.py. As the single 
exception, LeoTree.select may call *c.recolor(p)*, where p is the *new* 
value of c.p.


*The PR's remaining bug*


The PR *already* colors most nodes correctly. However, it does not handle 
nodes containing multiple @language directives. The PR probably also has 
some issues with nodes containing @nocolor, @color and @killcolor.


I now know why the PR doesn't work perfectly. The qsh can (and will) call 
_recolor(s) where s is *out-of-sequence* with the preceding call. For 
example, the following snippet is typical in @jupytext nodes:


# %% (like @language python)
def spam():

# %% [markdown] (like @language md)
# Section 1

It was the best of times; it was the worst of times.


The user can click (or cut or paste) *anywhere* in this text. The qsh 
coordinates with QTextEdit to detect changes. The qsh then calls 
_recolor(s) starting with the first changed line. *There is no necessary 
relationship between s and the line that _recolor last saw!*


The legacy code attempted to switch between languages using its *language* 
ivar. But this approach can't handle out-of-sequence calls to _recolor. 
It's as simple as that. Happily, tweaking the legacy code should squash 
this bug.
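
Here is a minimal sketch of the failure, in plain Python and independent of 
Leo's real code. Every name in it (LINES, language_directive, IvarColorizer, 
PerLineColorizer) is hypothetical, and the second class shows only one 
possible remedy, not necessarily what the PR does. The first class remembers 
the language in an ivar. The second derives it from the state recorded at the 
end of the preceding line, which is roughly what the qsh state provides.

```python
LINES = [
    "@language python",
    "def spam():",
    "@language md",
    "# Section 1",
]

def language_directive(s):
    """Return the language named by an @language line, or None."""
    parts = s.split()
    if len(parts) >= 2 and parts[0] == "@language":
        return parts[1]
    return None

class IvarColorizer:
    """Hypothetical: remembers the current language in an ivar."""

    def __init__(self):
        self.language = "python"

    def recolor(self, s):
        # s is a *single* line. The ivar reflects the line seen *last*,
        # which need not be the line that precedes s in the text.
        new_language = language_directive(s)
        if new_language:
            self.language = new_language
        return self.language

class PerLineColorizer:
    """Hypothetical: derives the language from the previous line's end state."""

    def __init__(self, lines):
        self.lines = lines
        self.end_language = {}  # Line number -> language at the end of that line.

    def recolor(self, i):
        # The initial state is whatever was in effect at the end of line i - 1.
        language = self.end_language.get(i - 1, "python")
        language = language_directive(self.lines[i]) or language
        self.end_language[i] = language
        return language

# In-sequence calls work with either approach.
ivar = IvarColorizer()
in_order = [ivar.recolor(s) for s in LINES]

# Out-of-sequence call: the user edits line 1 after line 3 was colored.
out_of_sequence = ivar.recolor(LINES[1])  # 'md': wrong, line 1 is python.

per_line = PerLineColorizer(LINES)
for i in range(len(LINES)):
    per_line.recolor(i)
revisited = per_line.recolor(1)  # 'python': correct.
```

The ivar version answers according to the last line it happened to see. The 
per-line version answers according to where the line sits in the text.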


*The legacy colorizer works correctly!*


The legacy code handled multiple @language directives by forcing the qsh to 
redraw c.p.b whenever _redraw saw an @language directive. Leonistas didn't 
notice this performance hit because this approach suffers only an O(N) 
slowdown, where N is the number of lines in c.p.b. This cost is usually 
negligible because most nodes contain fewer than 1000 lines.

*Overview of the colorizer*


The calling sequence of the various components is as follows:


*LeoTree.select* changes c.p. It's a complex task. It must not change in 
any way except that it may call c.recolor right at the very end.


LeoTree.select calls c.recolor(p), where p is Leo's *new* position. In 
turn, c.recolor calls *c.colorize*, the *only* entry point for the jedit 
colorizer.


At present, c.colorize does some preliminary initialization. I'm not sure 
that's required. In any event, c.colorize ends as follows:


# Tell QSyntaxHighlighter to do a full recolor.
try:
    self.in_full_redraw = True # Debugging only.
    self.highlighter.rehighlight()
finally:
    self.in_full_redraw = False


This code causes the qsh to call *_redraw(s)* whenever the user changes 
p.b. Remember that "s" is a *single* line of text. *Forget this at your 
peril!*


The big picture is this: _redraw and the qsh are expected to collaborate. In 
particular, _redraw must:


- Call qsh methods to colorize s, a *single* line of text.

- Inform the qsh of the *qsh state* of the *end* of the colorized line.


If _redraw sets the qsh state properly, the qsh will issue *highly 
optimized* calls to _redraw(s). In other words, the qsh will call 
_redraw(s) only if the coloring for s might change. This optimization is 
the reason why the qsh exists!
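
The optimization can be illustrated with a toy model. This is a simplified 
sketch, *not* Qt's implementation: MiniHighlighter, toy_redraw, line_changed 
and the PYTHON/MD constants are all hypothetical names. The real qsh exposes 
the same idea through previousBlockState() and setCurrentBlockState(). The 
model assumes that, after an edit, the highlighter recolors the changed line 
and keeps going only while a line's end state differs from the one recorded 
earlier.

```python
from typing import Callable, List

PYTHON, MD = 0, 1  # Hypothetical state numbers.

def toy_redraw(s: str, prev_state: int) -> int:
    """Stand-in for _redraw(s): return the state at the *end* of line s."""
    state = PYTHON if prev_state == -1 else prev_state
    if s.startswith("@language md"):
        state = MD
    elif s.startswith("@language python"):
        state = PYTHON
    return state

class MiniHighlighter:
    """A toy model of the qsh. It only tracks per-line end states."""

    def __init__(self, lines: List[str], redraw: Callable[[str, int], int]) -> None:
        self.lines = list(lines)
        self.redraw = redraw
        self.end_states: List[int] = [-1] * len(self.lines)
        self.calls: List[int] = []  # Line numbers passed to redraw.

    def _highlight_from(self, start: int, force_all: bool) -> None:
        for i in range(start, len(self.lines)):
            prev = self.end_states[i - 1] if i > 0 else -1
            old = self.end_states[i]
            self.calls.append(i)
            self.end_states[i] = self.redraw(self.lines[i], prev)
            # An unchanged end state means later lines cannot change color.
            if not force_all and self.end_states[i] == old:
                break

    def rehighlight(self) -> None:
        """Full recolor: visit every line."""
        self._highlight_from(0, force_all=True)

    def line_changed(self, i: int, text: str) -> None:
        """Incremental recolor after the user edits line i."""
        self.lines[i] = text
        self._highlight_from(i, force_all=False)

qsh = MiniHighlighter(
    ["@language python", "def spam():", "    pass", "@language md", "# Section 1"],
    toy_redraw,
)
qsh.rehighlight()

qsh.calls.clear()
qsh.line_changed(2, "    return 1")  # End state unchanged.
calls_after_plain_edit = list(qsh.calls)

qsh.calls.clear()
qsh.line_changed(0, "@language md")  # End state changes for lines 0..2.
calls_after_directive_edit = list(qsh.calls)
```

In this model an ordinary edit costs one call. Editing a directive costs one 
call per line until the recorded end state matches again, here at the second 
@language line.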


*Summary of _redraw*


To recap: _redraw(s) must colorize "s" (a *single* line of text) and inform 
the qsh of the qsh state that should be in effect afterward.


_redraw computes the *initial* qsh state and calls *mainLoop* to colorize 
s. The mainLoop method does not change state directly! It only calls 
pattern matchers. The pattern matchers compute the final qsh state as a side 
effect.
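
The division of labor just described can be sketched as follows. This is a 
toy under stated assumptions, not the jedit code: ToyColorizer, redraw, 
main_loop, match_string and match_comment are hypothetical names, and the two 
matchers are far simpler than Leo's. The point is only the shape: redraw sets 
the initial state, main_loop dispatches, and a matcher changes the state as a 
side effect.

```python
class ToyColorizer:
    NORMAL, IN_STRING = 0, 1  # Hypothetical states.

    def __init__(self):
        self.state = self.NORMAL
        self.tags = []  # (start, end, tag) tuples for the current line.

    # Pattern matchers return the number of characters matched (0 = no match).

    def match_string(self, s, i):
        """Match a triple-quoted string, which may continue on later lines."""
        if self.state == self.IN_STRING:
            j = s.find('"""', i)
        elif s.startswith('"""', i):
            j = s.find('"""', i + 3)
        else:
            return 0
        end = len(s) if j == -1 else j + 3
        # Side effect: the matcher, not main_loop, sets the final state.
        self.state = self.IN_STRING if j == -1 else self.NORMAL
        self.tags.append((i, end, "string"))
        return end - i

    def match_comment(self, s, i):
        """Match a comment that runs to the end of the line."""
        if s[i] == "#":
            self.tags.append((i, len(s), "comment"))
            return len(s) - i
        return 0

    def main_loop(self, s):
        """Call the matchers in turn. This method never touches self.state."""
        i = 0
        while i < len(s):
            for matcher in (self.match_string, self.match_comment):
                n = matcher(s, i)
                if n > 0:
                    i += n
                    break
            else:
                i += 1  # No matcher applied: skip one character.

    def redraw(self, s, prev_state):
        """Colorize one line and return the state at its end."""
        self.state = self.NORMAL if prev_state == -1 else prev_state
        self.tags = []
        self.main_loop(s)
        return self.state

colorizer = ToyColorizer()
state = -1  # No previous line.
states = []
for line in ['x = """start', 'still inside', 'end""" # done']:
    state = colorizer.redraw(line, state)
    states.append(state)
```

After the loop, states holds the end state of each line, and colorizer.tags 
holds the tags for the last line only.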


*Summary*


Early commits to the PR removed all hacks and added traces based on new 
tracing tools.

Understanding, not guesswork, is the *only* way to improve Leo's colorizer. 
This ENB solidifies my insights. It's time to fix the bug!


This ENB is pre-writing for several (long!) docstrings:


# Read this docstring or suffer!

<< jedit.whatever: docstring >>


The crux of the PR lies in _recolor. This method must compute the initial 
qsh state so that the qsh calls _redraw as few times as possible.


The final code should look natural and uncomplicated. Oh how appearances 
can be deceiving! It's been a long road.


Edward


P.S. It has taken many years to complete Leo's colorizer. There is no shame 
in that. My recent studies depended on essential tools that didn't exist 
almost two decades ago when I first wrote Leo's colorizer:


- PRs: a permanent record of work.

- git bisect: risk-free programming.

- Python's f-strings, g.app.debug and g.printObj. These tools make 
sophisticated traces possible.

- cff: *the* all-important study tool.


Investigating the performance bug required *all* of these tools. Even so, 
the PR is the record of arduous work.


EKR
