ENB: Improving Leo's colorizer

Edward K. Ream Sat, 09 Nov 2024 07:52:10 -0800

 

This Engineering Notebook post contains design and coding notes about PR 
#4147 <https://github.com/leo-editor/leo-editor/pull/4147>. As always, 
please feel free to ignore this ENB.

This ENB is unusually long and complex. It is essentially notes to myself.
It is also pre-writing for:

- A new info item: Theory of Operation for Leo's colorizer.

- Several (long!) docstrings in various methods in the JEditColorizer (
*jedit*) class. Unless indicated otherwise, all methods mentioned here are
members of the jedit class.

These notes are essential--it's way too easy to become confused in a maze
of contexts and hidden relationships. Such confusion caused a gigantic
performance bug a decade ago. A similar confusion lies beneath the present
minor performance bug.

I have spent several days tracing the code. As an essential first step, I
removed all the hacks that partially ruined the legacy code. Now I know how
the qsh works (and *should* work) when Leo *isn't** mucking things up.*
This ENB rests on that new foundation of understanding.

*Requirements*

The legacy code in the jedit class might seem overly complex. But it's
actually a superb piece of engineering! Anyway, only the profoundly
ignorant would suggest wholesale changes. Indeed, the jedit class is highly
reliable, despite its performance bug.

The following requirements will ensure that the PR fixes the performance
bug while retaining the colorizer's outstanding reliability.

1. The PR *must not* interfere with the qsh. *The qsh is highly optimized.*
The previous attempts to "help" it were grossly misguided! The PR must
remove all such hacks and interference.

2. The PR *must not* change *mainLoop*, the method that calls the pattern
matchers.

3. The PR *must not* change any pattern matcher. As an exception, the
pattern matchers that handle @language, @color, etc., may change how they
handle the qsh state. Such *semantic changes* are likely to be
straightforward.

4. The PR *must not* change any code outside leoColorizer.py. As the single
exception, LeoTree.select may call *c.recolor(p)*, where p is the *new*
value of c.p.

*The PR's remaining bug*

The PR *already* colors most nodes correctly. However, it does not handle
nodes containing multiple @language directives. The PR probably also has
some issues with nodes containing @nocolor, @color and @killcolor.

I now know why the PR doesn't work perfectly. The qsh can (and will) call
_recolor(s) where s is *out-of-sequence* with the preceding call. For
example, the following snippet is typical in @jupytext nodes:

# ?? (like @language python)

def spam():

# %% [markdown] (like @language md)

# Section 1

It was the best of times; it was the worst of times.

The user can click (or cut or paste) *anywhere* in this text. The qsh
coordinates with QTextEdit to detect changes. The qsh then calls
_recolor(s) starting with the first changed line. *There is no necessary
relationship between s and the line that _recolor last saw!*

The legacy code attempted to switch between languages using its* language*
ivar. But this approach can't handle out-of-sequence calls to _recolor.
It's as simple as that. Happily, tweaking the legacy code should squash
this bug.

*The legacy colorizer works correctly!*

The legacy code handled multiple @language directives by forcing the qsh to
redraw c.p.b whenever _redraw saw an @language directive. Leonistas didn't
notice this performance hit because this approach suffers only an O(N)
slowdown. This cost is usually negligible because most nodes contain less
than 1000 lines.

*Overview of the colorizer*

The calling sequence of the various components is as follows:

*LeoTree.select* changes c.p. It's a complex task. It must not change in
any way except that it may call c.recolor right at the very end.

LeoTree.select calls c.recolor(p), where p is Leo's *new* position. In
turn, c.recolor calls *c.colorize*, the *only* entry point for the jedit
colorizer.

At present, c.colorize does some preliminary initialization. I'm not sure
that's required. In any event, c.colorize ends as follows:

# Tell QSyntaxHighlighter to do a full recolor.

try:

self.in_full_redraw = True # Debugging only.

self.highlighter.rehighlight()

finally:

self.in_full_redraw = False

This code causes the qsh to call *_redraw(s)* whenever the user changes
p.b. Remember that "s" is a *single* line of text. *Forget this at your
peril!*

The big picture is this: _redraw and the qsh are expected to collaborate. In
particular, _redraw must:

- Call qsh methods to colorize s, a *single* line of text.

- Inform the qsh of the *qsh state* of the *end* of the colorized line.

If _redraw sets the qsh state properly, the qsh will issue *highly
optimized* calls to _redraw(s). In other words, the qsh will call
_redraw(s) only if the coloring for s might change. This optimization is
the reason why the qsh exists!

*Summary of _redraw*

To recap: _redraw(s) must colorize "s" (a *single* line of text) and inform
the qsh of the qsh state that should be in effect afterward.

_redraw computes the *initial *qsh state and calls *mainLine* to colorize
s. The mainLine method does not change state directly! It only calls
pattern matches. The pattern matchers compute the final qsh state as a side
effect.

*Summary*

Early commits to the PR removed all hacks and added traces based on new
tracing tools.

Understanding, not guesswork, is the *only *way to improve Leo's colorizer.
This ENB solidifies my insights. It's time to fix the bug!

This ENB is pre-writing for several (long!) docstrings:

# Read this docstring or suffer!

<< jedit.whatever: docstring >>

The crux of the PR lies in _recolor. This method must compute the initial
qsh state so that the qsh calls _redraw as few times as possible.

The final code should look natural and uncomplicated. Oh how appearances
can be deceiving! It's been a long road.

Edward

P.S. It has taken many years to complete Leo's colorizer. There is no shame
in that. My recent studies depended on essential tools that didn't exist
almost two decades ago when I first wrote Leo's colorizer:

- PRs: a permanent record of work.

- git bisect: risk-free programming.

- python's f-strings, g.app.debug and g.printObj.

These tools make sophisticated traces possible.

- cff: *the* all-important study tool.

Investigating the performance bug required *all* of these tools. Even so,
the PR is the record of arduous work.

EKR

--
You received this message because you are subscribed to the Google Groups
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/leo-editor/ee7f6e4c-f0c3-4f17-b388-43c42f489de1n%40googlegroups.com.

ENB: Improving Leo's colorizer

Reply via email to