ENB: leoTokens.py: a spectacular collapse in complexity

Edward K. Ream Thu, 01 Feb 2024 07:19:34 -0800

 

This Engineering Notebook post describes a spectacular collapse in 
complexity in leoTokens.py, Leo's new beautifier.

Last Sunday, January 28, I rewrote leoTokens.py. I had struggled with this
project for four weeks, but within 24 hours, the project was essentially
complete! See the postscript for the log.

A dead-simple *token-based scanner* replaces a horribly complex token-based
recursive-descent parser. Computing token ranges bedeviled the parser. The
scanner avoids all those complications. The new scanner knows almost
nothing about Python syntax!
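To make "computing token ranges" concrete, here is a minimal sketch (not the code in leoTokens.py) of the kind of flat scan that needs no grammar. It walks the tokens from Python's standard tokenize module once and uses a stack to pair each opening bracket with its closing bracket. The name bracket_ranges is hypothetical.

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")": "(", "]": "[", "}": "{"}


def bracket_ranges(source: str) -> tuple[list[tokenize.TokenInfo], list[tuple[int, int]]]:
    """Hypothetical helper: pair every opening bracket with its closer.

    Returns the token list plus (open_index, close_index) pairs, in the
    order the closing brackets appear. No Python grammar is consulted.
    """
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    stack: list[int] = []
    ranges: list[tuple[int, int]] = []
    for i, tok in enumerate(tokens):
        if tok.type != tokenize.OP:
            continue  # Names, numbers, strings, comments: nothing to track.
        if tok.string in OPENERS:
            stack.append(i)
        elif tok.string in CLOSERS:
            start = stack.pop()
            if tokens[start].string != CLOSERS[tok.string]:
                raise ValueError(f"mismatched {tokens[start].string!r} and {tok.string!r}")
            ranges.append((start, i))
    return tokens, ranges


tokens, ranges = bracket_ranges("f(a[1], {2: 3})\n")
for start, end in ranges:
    print(tokens[start].string, start, end)
```

The scan neither knows nor cares whether a "(" starts a call, a tuple, or a generator expression; any such distinction is left to whatever runs on the finished ranges.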

The *pre_scan* method and its helpers replace the entire parser. The
pre_scan method calls three *finishers*: finish_arg, finish_dict, and
finish_slice. The finishers salvage the semantics from the old parser.
These methods practically wrote themselves.
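Here is a hedged sketch of how a pre-scan might hand each completed bracket range to a finisher. The names pre_scan, finish_arg, finish_dict, and finish_slice come from this post, but the bodies below are assumptions, not Leo's implementation: each finisher simply labels the "=" or ":" tokens sitting directly inside its brackets, so that a beautifier could later emit f(a=1), x[1:2], and {'k': 3} with PEP 8 spacing. The role labels are hypothetical, and the sketch ignores harder cases such as lambdas and annotated defaults.

```python
import io
import tokenize
from tokenize import TokenInfo

OPENERS = {"(", "[", "{"}
CLOSERS = {")": "(", "]": "[", "}": "{"}

Roles = dict[int, str]  # token index -> hypothetical role label


def _top_level(tokens: list[TokenInfo], start: int, end: int):
    """Yield indices of tokens directly inside the brackets at start/end."""
    depth = 0
    for i in range(start + 1, end):
        s = tokens[i].string if tokens[i].type == tokenize.OP else ""
        if s in OPENERS:
            depth += 1
        elif s in CLOSERS:
            depth -= 1
        elif depth == 0:
            yield i


def _label(tokens: list[TokenInfo], start: int, end: int, roles: Roles, op: str, role: str) -> None:
    """Give `role` to every top-level `op` operator in the range."""
    for i in _top_level(tokens, start, end):
        if tokens[i].type == tokenize.OP and tokens[i].string == op:
            roles[i] = role


def finish_arg(tokens, start, end, roles):
    """'(' ... ')': a top-level '=' is a keyword-argument equal sign."""
    _label(tokens, start, end, roles, "=", "kwarg-equal")


def finish_slice(tokens, start, end, roles):
    """'[' ... ']': a top-level ':' is a slice colon."""
    _label(tokens, start, end, roles, ":", "slice-colon")


def finish_dict(tokens, start, end, roles):
    """'{' ... '}': a top-level ':' separates a dict key from its value."""
    _label(tokens, start, end, roles, ":", "dict-colon")


FINISHERS = {"(": finish_arg, "[": finish_slice, "{": finish_dict}


def pre_scan(source: str) -> tuple[list[TokenInfo], Roles]:
    """One flat pass: match brackets, then run a finisher on each range."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    roles: Roles = {}
    stack: list[int] = []
    for i, tok in enumerate(tokens):
        if tok.type != tokenize.OP:
            continue
        if tok.string in OPENERS:
            stack.append(i)
        elif tok.string in CLOSERS:
            start = stack.pop()
            FINISHERS[tokens[start].string](tokens, start, i, roles)
    return tokens, roles


tokens, roles = pre_scan("f(a=1, b=x[1:2], c={'k': 3})\n")
for i in sorted(roles):
    print(tokens[i].start, tokens[i].string, roles[i])
```

In this sketch a finisher runs only after its closing bracket is seen, so it always receives a complete token range, and nested ranges are finished before the ranges that enclose them. That ordering is a property of the sketch; it says nothing about how leoTokens.py schedules its work.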

*Origin of the Aha*

Big Ahas change the mental landscape so thoroughly that reconstructing
their genesis becomes impossible. My best guess: weeks of immersion in the
old code subconsciously showed me that only the precursors to the finishers
were worth saving. That gave me the courage to start again.

For the last week, my subconscious has been screaming at me. Its criticisms
were varied, personal, and insulting. Those criticisms were off the mark,
but the message was valid: *do something different!*

Using ChatGPT might well have *prevented* the Aha. I had to struggle with
the doomed code first! Otherwise, I would not have gained the deep
knowledge required to see the way forward.

*Comparison with other breakthroughs*

Most code collapses arise from a long sequence of methodical, incremental
simplifications. This Aha was different. I suddenly "just knew" that
parsing was the wrong approach.

I can think of only two comparable flashes in Leo's history:

*@clean*: Aha! Leo can use the outline instead of shadow files. This
insight happened when I was working on another project!

*Leo's importers:* Aha! Guide lines eliminate all difficulties in handling
comments and strings.
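The importer insight can be illustrated the same way. The sketch below assumes that a "guide line" is a copy of a source line with comments and string literals blanked out, so that bracket or keyword scans on guide lines never trip over text inside strings or comments. It uses Python's tokenize module and handles plain string literals only (f-strings tokenize differently on Python 3.12+). The name guide_lines is hypothetical, and Leo's importers are not necessarily written this way.

```python
import io
import tokenize


def guide_lines(source: str) -> list[str]:
    """Hypothetical helper: blank out comments and strings, keep line shapes."""
    rows = [list(line) for line in source.splitlines(keepends=True)]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in (tokenize.COMMENT, tokenize.STRING):
            continue
        (srow, scol), (erow, ecol) = tok.start, tok.end
        for row in range(srow, erow + 1):  # Strings may span several lines.
            chars = rows[row - 1]
            first = scol if row == srow else 0
            last = ecol if row == erow else len(chars)
            for col in range(first, last):
                if chars[col] not in "\r\n":  # Keep line endings intact.
                    chars[col] = " "
    return ["".join(chars) for chars in rows]


source = 'x = "a(b"  # unmatched )\ny = (1)\n'
for line in guide_lines(source):
    print(repr(line))
```

Every guide line has the same length as its source line, so a column found in a guide line is also valid in the original text.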

*Feeble unit tests*

PR #3773 <https://github.com/leo-editor/leo-editor/pull/3773> removes
recent unit tests. These tests checked the parser itself rather than the
intended results. The PR gets to 100% coverage without these feeble tests.

*A coding one-off*

The pre_scan method is a one-time trick. Tools such as mypy and pylint
*must* use a parse tree. Even the super-fast pyflakes tool uses a
parse tree. pyflakes is so fast because Python's ast.parse is essentially C
code.
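For contrast, a parse-tree tool gets distinctions like "this '=' belongs to a keyword argument" for free. In Python's standard ast module, keyword arguments appear as ast.keyword nodes attached to ast.Call nodes. The helper below is only an illustration of the parse-tree style; it is not code from pyflakes, pylint, or mypy, and keyword_arg_names is a hypothetical name.

```python
import ast


def keyword_arg_names(source: str) -> list[str]:
    """List keyword-argument names in every call, using the parse tree."""
    names: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # kw.arg is None for **kwargs unpacking, which has no name.
            names.extend(kw.arg for kw in node.keywords if kw.arg is not None)
    return names


print(keyword_arg_names("f(a=1, b=g(c=2), **extra)\nx = 3\n"))
```

The plain assignment x = 3 never shows up because the tree already records it as an ast.Assign node; a token scanner has to infer the same fact from bracket context instead.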

*Summary*

I feel like a mathematician who has discovered an unexpectedly elementary
proof of a complex theorem.

The code pattern seems limited to beautification. Other language tools must
use parse trees. Still, the Aha is a metaphor for possibilities hiding in
plain sight. That's something!

Edward

P.S. Here is the log of the first 24 hours of work on the scanner:

I saw the way forward Sunday afternoon. The first commit of the rewrite was
rev e8a4224
<https://github.com/leo-editor/leo-editor/commit/e8a4224239cd36d8b0ee63b8b1df01bb1691425e>
(first draft of a simple scanner) at 13:13:01 on Sunday afternoon.

Just 24 hours later, at 13:46:37 on Monday, rev 3662d55
<https://github.com/leo-editor/leo-editor/commit/3662d553e72af804a28694118e0e9596ab80c310>
completed the project. Only a few packaging details remain. It was quite a
day.

EKR

--
You received this message because you are subscribed to the Google Groups
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/leo-editor/f79cb2eb-5f2f-4d31-854f-391ffebd9920n%40googlegroups.com.
