#1266: Orange is the new black :-)

Edward K. Ream Sat, 07 Sep 2019 15:04:42 -0700

I've just upgraded #1266 
<https://github.com/leo-editor/leo-editor/issues/1266>. This is the *orange 
project*, to be done in the orange branch.

Imo, there are sound reasons for enhancing Leo's existing token-based
beautifier commands:

- They are simpler and faster than black.
- Explicit tokens represent strings and comments, avoiding coding horror.
- There is no need to argue with outsiders about philosophy or their code.
- We can add any options we like.

*Background*

I spent several hours this morning trying, and utterly failing, to alter
black's code to do anything new. Basing a beautifier on ASTs (parse
trees) seems reasonable, but I know from long experience that Python's
ASTs have big holes in them regarding strings and whitespace. The data is
there, but in an almost-impossible-to-use form. There is nothing that
black can do about this, except adopt occult hacks.
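
As a small illustration of the gap between parse trees and tokens, the sketch below parses and tokenizes the same one-line source with Python's standard ast and tokenize modules. It is only a demonstration, not code from Leo or black; the variable names are made up for the example.

```python
import ast
import io
import tokenize

source = "x = 'a'  # keep me\n"

# The parse tree keeps the string's value, but not its quotes or the comment.
tree = ast.parse(source)
dumped = ast.dump(tree)

# The token stream keeps the exact text of both the string and the comment.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
kinds = [(tokenize.tok_name[t.type], t.string) for t in tokens]

print(dumped)
for kind, text in kinds:
    print(kind, repr(text))
```

The comment never appears in the dumped tree, while the token list contains a COMMENT token and a STRING token whose text still includes the original quotes.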

*How Leo's beautifier commands work*

Leo's token-based beautifier code is relatively easy to use and remember. I
am writing this from memory, without having looked at the code. There are
some complications, but these have already been handled. The code appears
solid.

Python's tokenize module *quickly* breaks the source into *input tokens*.
Each input token has a *token reader*, which calls one or more *code
generators*. Each code generator emits output tokens to the *output token
list*. Code generators typically "look behind", examining the previously
generated output tokens. Code generators may insert, delete or change
already-generated tokens. In other words, code generators act like
on-the-fly peephole optimizers.

The algorithm is extremely fast, because readers and code generators are
short and fast. Python's tokenize module is very fast. I am free to add
new kinds of output tokens if doing so makes life easier for token readers
or code generators.
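
The reader/generator pipeline described above might look roughly like the following. This is a minimal sketch written for this note, not Leo's actual code: the function name beautify, the reader names (do_word, do_op, ...) and the (kind, text) output-token format are all assumptions. It only handles flat, unindented statements and treats every operator other than a comma as a binary operator.

```python
import io
import tokenize

def beautify(source):
    """Hypothetical sketch: read input tokens, emit (kind, text) output tokens."""
    out = []  # The output token list.

    # Code generators. Each one may look behind at out[-1].
    def blank():
        # Add one space, unless we are at a line start or already have one.
        if out and out[-1][0] not in ('blank', 'newline'):
            out.append(('blank', ' '))

    def delete_blank():
        # Retract a previously generated blank.
        if out and out[-1][0] == 'blank':
            out.pop()

    # Token readers. Each one calls one or more code generators.
    def do_word(text):
        out.append(('word', text))
        blank()

    def do_op(text):
        if text == ',':
            delete_blank()  # No space before a comma.
        else:
            blank()  # One space before any other operator.
        out.append(('op', text))
        blank()

    def do_comment(text):
        delete_blank()
        if out and out[-1][0] != 'newline':
            out.append(('blank', '  '))  # Two spaces before a trailing comment.
        out.append(('comment', text))

    def do_newline(text):
        delete_blank()  # No trailing whitespace.
        out.append(('newline', '\n'))

    readers = {
        tokenize.NAME: do_word,
        tokenize.NUMBER: do_word,
        tokenize.STRING: do_word,
        tokenize.OP: do_op,
        tokenize.COMMENT: do_comment,
        tokenize.NEWLINE: do_newline,
        tokenize.NL: do_newline,
    }
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        reader = readers.get(tok.type)
        if reader:  # INDENT, DEDENT and ENDMARKER are ignored in this sketch.
            reader(tok.string)
    return ''.join(text for kind, text in out)

print(beautify("x=1+2  ,3 # note\ny =  x\n"))
```

Note how the comma and newline readers retract a blank that an earlier reader already emitted. That is the "peephole" behavior: no reader needs to know what comes next, because a later reader can repair the output.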

*Splitting lines*

The end-of-line (or is it start of line?) code generator will "look behind"
(into the output token list) to see what tokens exist on the last line.
Using those tokens, it will calculate the line's length, and find
black-like places to break (the tokens!) into separate lines. In essence,
this will be black's line-breaking strategy, adapted for tokens, not parse
trees. The code is likely to be significantly simpler than black's.
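
One way the look-behind line splitter could work is sketched below. It is an illustration under stated assumptions, not the planned implementation: output tokens are assumed to be plain strings, the names last_line_tokens and split_line are invented here, and the splitting rule is a simplified imitation of black (first try the bracket contents on one indented line, otherwise one item per line with trailing commas).

```python
OPENS = ('(', '[', '{')
CLOSES = (')', ']', '}')

def last_line_tokens(out):
    """Look behind in the output token list for the tokens of the last line."""
    i = len(out)
    while i > 0 and out[i - 1] != '\n':
        i -= 1
    return out[i:]

def split_line(tokens, max_len=88, indent='    '):
    """Return the lines for one logical line, given as a list of token strings."""
    line = ''.join(tokens)
    if len(line) <= max_len:
        return [line]
    # Find the first opening bracket and its matching close.
    start = next((i for i, t in enumerate(tokens) if t in OPENS), None)
    if start is None:
        return [line]  # No safe place to break.
    depth, end = 0, None
    for i in range(start, len(tokens)):
        if tokens[i] in OPENS:
            depth += 1
        elif tokens[i] in CLOSES:
            depth -= 1
            if depth == 0:
                end = i
                break
    if end is None:
        return [line]
    lead = line[:len(line) - len(line.lstrip())]  # Existing indentation.
    head = ''.join(tokens[:start + 1])
    tail = lead + ''.join(tokens[end:])
    # Split the bracket contents at commas that are not nested in brackets.
    items, current, depth = [], [], 0
    for t in tokens[start + 1:end]:
        if t in OPENS:
            depth += 1
        elif t in CLOSES:
            depth -= 1
        if t == ',' and depth == 0:
            items.append(''.join(current).strip())
            current = []
        else:
            current.append(t)
    last = ''.join(current).strip()
    if last:
        items.append(last)
    # First choice: all items on one indented line.
    body = lead + indent + ', '.join(items)
    if len(body) <= max_len:
        return [head, body, tail]
    # Otherwise: one item per line, each with a trailing comma.
    return [head] + [lead + indent + item + ',' for item in items] + [tail]

tokens = ['result', ' ', '=', ' ', 'compute', '(',
          'alpha', ',', ' ', 'beta', ',', ' ', 'gamma', ')']
print('\n'.join(split_line(tokens, max_len=30)))
print('\n'.join(split_line(tokens, max_len=20)))
```

Because each string or comment is a single token, a comma inside a string literal can never be mistaken for a break point.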

*Summary*

Imo, it's reasonable to add black-like line breaking to Leo's existing
beautify commands. The result should be much faster than black. We'll have
full control over the sources, and license to add settings rather than
argue about preferences ;-)

All comments and questions are welcome.

Edward

P.S. One of the trickiest parts of the token-based code is handling
backslash-newlines. I don't remember the details.

It may be possible to follow black's lead and (optionally!) delete
backslash-newlines, relying on the line-breaking algorithm to fix things
up. This may require adding parentheses in some cases. It remains to be
seen whether token-based "back parsing" is up to the job.
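
Part of what makes backslash-newlines awkward is that Python's tokenize module emits no token for them. The sketch below shows one possible way to detect them, and is not how Leo handles them: it assumes that a jump in row number between two consecutive tokens, with no NEWLINE or NL token in between, marks a backslash-newline. The function name continuation_rows is made up for the example.

```python
import io
import tokenize

def continuation_rows(source):
    """Return the 1-based rows that appear to end with a backslash-newline."""
    rows = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (
            prev is not None
            and prev.type not in (tokenize.NEWLINE, tokenize.NL)
            and tok.start[0] > prev.end[0]
        ):
            # The row changed without any newline token: a continuation.
            rows.append(prev.end[0])
        prev = tok
    return rows

print(continuation_rows("x = 1 + \\\n    2\ny = 3\n"))
print(continuation_rows("x = (1 +\n    2)\n"))
```

A line break inside brackets produces an NL token, so the second call reports nothing. That difference is why deleting a backslash-newline may require adding parentheses.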

EKR
