This Engineering Notebook post describes a spectacular collapse in 
complexity in leoTokens.py, Leo's new beautifier.


Last Sunday, January 28, I rewrote leoTokens.py. I had struggled with this 
project for four weeks, but within 24 hours, the project was essentially 
complete! See the PostScript for the log.


A dead simple *token-based scanner* replaces a horribly complex token-based 
recursive-descent parser. Computing token ranges bedeviled the parser. The 
scanner avoids all those complications. The new scanner knows almost 
nothing about Python syntax!
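
To show how little syntax a token-based scanner needs, here is a minimal sketch built on Python's standard tokenize module. It is not the code in leoTokens.py: the function name classify_equals and the labels it returns are hypothetical. The only "syntax" it tracks is bracket nesting depth, which is enough to tell an assignment '=' from an '=' inside a call or other bracketed context.

```python
import io
import tokenize

OPENERS = {'(', '[', '{'}
CLOSERS = {')', ']', '}'}

def classify_equals(source: str) -> list[tuple[int, str]]:
    """Return (line_number, kind) for each '=' token in source.

    kind is 'assignment' at bracket depth 0, else 'in-brackets'.
    Hypothetical helper for illustration only.
    """
    results = []
    depth = 0  # Current bracket nesting depth.
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.OP:
            continue  # Names, numbers, strings and comments need no analysis.
        if tok.string in OPENERS:
            depth += 1
        elif tok.string in CLOSERS:
            depth -= 1
        elif tok.string == '=':
            kind = 'in-brackets' if depth > 0 else 'assignment'
            results.append((tok.start[0], kind))
    return results

if __name__ == '__main__':
    print(classify_equals("a = f(b=1)\n"))
```

A beautifier could use such a classification to put spaces around the first '=' but not the second, without ever building a parse tree.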


The *pre_scan* method and its helpers replace the entire parser. The 
pre_scan method calls three *finishers*, finish_arg, finish_dict and 
finish_slice. The finishers salvage the semantics from the old parser. 
These methods practically wrote themselves.


*Origin of the Aha*


Big Ahas change the mental landscape so thoroughly that reconstructing 
their genesis becomes impossible. My best guess: weeks of immersion in the 
old code subconsciously showed me that only the precursors to the finishers 
were worth saving. That gave me the courage to start again.


For the last week, my subconscious has been screaming at me. Its criticisms 
were varied, personal, and insulting. Those criticisms were off the mark, 
but the message was valid: *do something different!*


Using ChatGPT might well have *prevented* the Aha. I had to struggle with 
the doomed code first! Otherwise, I would not have gained the deep 
knowledge required to see the way forward.


*Comparison with other breakthroughs*


Most code collapses arise from a long sequence of methodical, incremental 
simplifications. This Aha was different. I suddenly "just knew" that 
parsing was the wrong approach.


I can think of only two comparable flashes in Leo's history:


*@clean*: Aha! Leo can use the outline instead of shadow files. This 
insight happened when I was working on another project!


*Leo's importers:* Aha! Guide lines eliminate all difficulties in handling 
comments and strings.


*Feeble unit tests*


PR #3773 <https://github.com/leo-editor/leo-editor/pull/3773> removes 
recent unit tests. These tests tested the parser instead of the intended 
results. The PR gets to 100% coverage without these feeble tests.


*A coding one-off*


The pre_scan method is a one-time trick. Tools such as mypy or pylint 
*must* use a parse tree. Even the super-fast pyflakes tool uses a 
parse tree. pyflakes is so fast because Python's ast.parse is essentially C 
code.
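
For contrast, here is a minimal sketch of the kind of check that does need a parse tree. It uses only the standard ast module. It is not how pyflakes is implemented: the function unused_imports is hypothetical, and it ignores scopes, __all__, and names used only inside strings.

```python
import ast

def unused_imports(source: str) -> list[str]:
    """Return the sorted names that are imported but never referenced.

    Hypothetical, simplified check for illustration only.
    """
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # 'import a.b' binds the top-level name 'a'.
                imported.add(alias.asname or alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)

if __name__ == '__main__':
    print(unused_imports("import os\nimport sys\nprint(sys.argv)\n"))
```

Deciding whether a name is *used* depends on the structure of the program, not just on the stream of tokens. That is why a token scanner suffices for beautifying but not for checks like this one.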


*Summary*


I feel like a mathematician who has discovered an unexpectedly elementary 
proof of a complex theorem.


The code pattern seems limited to beautification. Other language tools must 
use parse trees. Still, the Aha is a metaphor for possibilities hiding in 
plain sight. That's something!


Edward


P.S. Here is the log of the first 24 hours of work on the scanner:


I saw the way forward Sunday afternoon. The first commit of the rewrite was 
rev e8a4224 
<https://github.com/leo-editor/leo-editor/commit/e8a4224239cd36d8b0ee63b8b1df01bb1691425e> 
(first draft of a simple scanner) at 13:13:01 on Sunday afternoon.


Just 24 hours later, at 13:46:37 on Monday, rev 3662d55 
<https://github.com/leo-editor/leo-editor/commit/3662d553e72af804a28694118e0e9596ab80c310> 
completed the project. Only a few packaging details remain. It was quite a 
day.


EKR
