I've just upgraded #1266 
<https://github.com/leo-editor/leo-editor/issues/1266>. This is the *orange 
project*, to be done in the orange branch.

Imo, there are sound reasons for enhancing Leo's existing token-based 
beautifier commands:

- They are simpler and faster than black.
- Explicit tokens represent strings and comments, avoiding coding horror.
- There is no need to argue with outsiders about philosophy or their code.
- We can add any options we like.

*Background*

I spent several hours this morning trying, and utterly failing, to alter 
black's code to do anything new.  Basing a beautifier on asts (parse 
trees) seems reasonable, but I know from long experience that python's 
asts have big holes in them regarding strings and whitespace.  The data is 
there, but in an almost-impossible-to-use form.  There is nothing that 
black can do about this, except adopt occult hacks.
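
One such hole is easy to show with the standard library alone. This is only a small illustration, not a claim about black's internals: ast.parse keeps a string's value but drops comments and the original quoting, while tokenize keeps both as explicit tokens. The sample source line is made up for the example.

```python
import ast
import io
import tokenize

source = "x = 'a'  # keep me\n"

# The parse tree records the string's value, but not the comment.
tree = ast.parse(source)
dumped = ast.dump(tree)

# The token stream keeps strings and comments as explicit tokens,
# with their exact source text (including the quote characters).
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
comments = [t.string for t in tokens if t.type == tokenize.COMMENT]
strings = [t.string for t in tokens if t.type == tokenize.STRING]
```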

*How Leo's beautifier commands work*

Leo's token-based beautifier code is relatively easy to use and remember. I 
am writing this from memory, without having looked at the code. There are 
some complications, but these have already been handled. The code appears 
solid.

Python's tokenize module *quickly* breaks the source into *input tokens*.  
Each input token has a *token reader*, which calls one or more *code 
generators*.  Each code generator emits output tokens to the *output token 
list*. Code generators typically "look behind", examining the previously 
generated output tokens. Code generators may insert, delete or change 
already-generated tokens. In other words, code generators act like 
on-the-fly peephole optimizers. 
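
The scheme above can be sketched in a few lines. This is a minimal illustration written for this post, not Leo's actual code: the class name MiniBeautifier, the method names, and the (kind, value) shape of output tokens are all assumptions. It handles only un-indented statements, and it ignores complications such as unary operators and keyword arguments.

```python
import io
import tokenize

class MiniBeautifier:
    """Hypothetical sketch: token readers call code generators, which
    append to, and look behind into, the output token list."""

    def __init__(self):
        self.output = []  # The output token list: (kind, value) pairs.

    # Code generators.
    def add(self, kind, value):
        self.output.append((kind, value))

    def blank(self):
        # Look behind: no blank at the start of a line or after a blank.
        if self.output and self.output[-1][0] not in ("blank", "newline"):
            self.add("blank", " ")

    def clean_blank(self):
        # Look behind: delete an already-generated blank.
        if self.output and self.output[-1][0] == "blank":
            self.output.pop()

    # Token readers.
    def do_op(self, val):
        if val in ("=", "==", "+", "*"):
            self.blank()
            self.add("op", val)
            self.blank()
        elif val == ",":
            self.clean_blank()
            self.add("op", val)
            self.blank()
        elif val in (")", "]", "}"):
            self.clean_blank()
            self.add("op", val)
        else:
            self.add("op", val)

    def do_word(self, val):
        # Two adjacent words (e.g. "return x") need a blank between them.
        if self.output and self.output[-1][0] == "word":
            self.blank()
        self.add("word", val)

    def do_comment(self, val):
        self.blank()
        self.add("comment", val)

    def do_newline(self, val):
        self.clean_blank()  # No trailing whitespace.
        self.add("newline", "\n")

    def beautify(self, source):
        readers = {
            tokenize.OP: self.do_op,
            tokenize.NAME: self.do_word,
            tokenize.NUMBER: self.do_word,
            tokenize.STRING: self.do_word,
            tokenize.COMMENT: self.do_comment,
            tokenize.NEWLINE: self.do_newline,
            tokenize.NL: self.do_newline,
        }
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            reader = readers.get(tok.type)
            if reader:
                reader(tok.string)
        return "".join(value for kind, value in self.output)
```

For example, MiniBeautifier().beautify("x=f( a ,b )+1\n") yields "x = f(a, b) + 1\n". The input's own whitespace never matters, because tokenize discards it and the code generators decide where blanks go.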

The algorithm is extremely fast, because readers and code generators are 
short and fast. Python's tokenize module is very fast. I am free to add 
new kinds of output tokens if doing so makes life easier for token readers 
or code generators.

*Splitting lines*

The end-of-line (or is it start of line?) code generator will "look behind" 
(into the output token list) to see what tokens exist on the last line.  
Using those tokens, it will calculate the line's length, and find 
black-like places to break (the tokens!) into separate lines.  In essence, 
this will be black's line-breaking strategy, adapted for tokens, not parse 
trees.  The code is likely to be significantly simpler than black's.
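
Here is a rough sketch of what such a code generator might do. Nothing here exists yet: the function name split_long_line, its parameters, and the choice to represent the last line as a list of token strings are assumptions for illustration. It measures the line, finds the first bracket pair, and puts each top-level comma-separated item on its own line, adding a trailing comma as black does. Unlike black, it does not retry if a piece is still too long.

```python
OPENS = {"(", "[", "{"}
CLOSES = {")", "]", "}"}

def split_long_line(line_tokens, max_len=88, indent="    "):
    """Hypothetical sketch: look behind at the output tokens of the last
    line and break the line at its first bracket pair if it is too long."""
    if sum(len(t) for t in line_tokens) <= max_len:
        return line_tokens
    # Find the first opening bracket and its matching closing bracket.
    start = next((i for i, t in enumerate(line_tokens) if t in OPENS), None)
    if start is None:
        return line_tokens  # Nothing to split on.
    depth, end = 0, None
    for i in range(start, len(line_tokens)):
        if line_tokens[i] in OPENS:
            depth += 1
        elif line_tokens[i] in CLOSES:
            depth -= 1
            if depth == 0:
                end = i
                break
    if end is None:
        return line_tokens  # Unbalanced brackets: leave the line alone.
    result = line_tokens[:start + 1] + ["\n", indent]
    depth, saw_comma, skip_blank = 0, False, False
    for t in line_tokens[start + 1:end]:
        if skip_blank and t == " ":
            skip_blank = False  # Drop the blank that followed a comma.
            continue
        skip_blank = False
        if t in OPENS:
            depth += 1
        elif t in CLOSES:
            depth -= 1
        result.append(t)
        if t == "," and depth == 0:
            # Break after each comma that is not inside nested brackets.
            result.extend(["\n", indent])
            saw_comma = skip_blank = True
    if result[-2:] == ["\n", indent]:
        result = result[:-2]  # The body already ended with a comma.
    elif saw_comma:
        # Only safe when commas were present: "(a + b)" must not
        # become the tuple "(a + b,)".
        result.append(",")
    return result + ["\n"] + line_tokens[end:]
```

Strings and comments are single tokens, so brackets or commas inside them can never confuse the scan. That is the main reason to expect the token-based version to stay simple.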

*Summary*

Imo, it's reasonable to add black-like line breaking to Leo's existing 
beautify commands. The result should be much faster than black. We'll have 
full control over the sources, and license to add settings rather than 
argue about preferences ;-)

All comments and questions are welcome.

Edward

P.S.  One of the most tricky parts of the token-based code is handling 
backslash-newlines.  I don't remember the details.

It may be possible to follow black's lead and (optionally!) delete 
backslash-newlines, relying on the line-breaking algorithm to fix things 
up.  This may require adding parentheses in some cases.  It remains to be 
seen whether token-based "back parsing" is up to the job.
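
One reason backslash-newlines are tricky: tokenize generates no token at all for them. A possible way to detect them, sketched below, is to notice when two consecutive tokens sit on different lines without a NEWLINE or NL token in between. The helper name find_backslash_newlines is made up for this example, and this is not necessarily how Leo's existing code does it.

```python
import io
import tokenize

def find_backslash_newlines(source):
    """Hypothetical helper: return the numbers of the lines that end
    with a backslash-newline between two tokens."""
    rows = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (
            prev is not None
            and prev.type not in (tokenize.NEWLINE, tokenize.NL)
            and tok.start[0] > prev.end[0]
        ):
            # The line number changed, but tokenize emitted no
            # NEWLINE or NL token: the gap is a backslash-newline.
            rows.append(prev.end[0])
        prev = tok
    return rows
```

For "total = 1 + \\\n    2\n" this returns [1]. For the parenthesized form "x = (1 +\n    2)\n" it returns [], because tokenize emits an NL token for a newline inside brackets.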

EKR
