I've just upgraded #1266 <https://github.com/leo-editor/leo-editor/issues/1266>. This is the *orange project*, to be done in the orange branch.
Imo, there are sound reasons for enhancing Leo's existing token-based beautifier commands:

- They are simpler and faster than black.
- Explicit tokens represent strings and comments, avoiding coding horror.
- There is no need to argue with outsiders about philosophy or their code.
- We can add any options we like.

*Background*

I spent several hours this morning trying, and utterly failing, to alter black's code to do anything new. Basing a beautifier on ASTs (parse trees) seems reasonable, but I know from long experience that Python's ASTs have big holes in them regarding strings and whitespace. The data is there, but in an almost-impossible-to-use form. There is nothing black can do about this, except adopt occult hacks.

*How Leo's beautifier commands work*

Leo's token-based beautifier code is relatively easy to use and remember. I am writing this from memory, without having looked at the code. There are some complications, but they have already been handled. The code appears solid.

Python's tokenize module *quickly* breaks the source into *input tokens*. Each input token has a *token reader*, which calls one or more *code generators*. Each code generator emits output tokens to the *output token list*.

Code generators typically "look behind", examining the previously generated output tokens. Code generators may insert, delete, or change already-generated tokens. In other words, code generators act like on-the-fly peephole optimizers.

The algorithm is extremely fast: Python's tokenize module is very fast, and the token readers and code generators are short and fast. I am free to add new kinds of output tokens if doing so makes life easier for token readers or code generators.

*Splitting lines*

The end-of-line (or is it start-of-line?) code generator will "look behind" (into the output token list) to see what tokens exist on the last line. Using those tokens, it will calculate the line's length and find black-like places to break (the tokens!)
into separate lines. In essence, this will be black's line-breaking strategy, adapted for tokens, not parse trees. The code is likely to be significantly simpler than black's.

*Summary*

Imo, it's reasonable to add black-like line breaking to Leo's existing beautify commands. The result should be much faster than black. We'll have full control over the sources, and license to add settings rather than argue about preferences ;-)

All comments and questions are welcome.

Edward

P.S. One of the trickiest parts of the token-based code is handling backslash-newlines. I don't remember the details. It may be possible to follow black's lead and (optionally!) delete backslash-newlines, relying on the line-breaking algorithm to fix things up. This may require adding parentheses in some cases. It remains to be seen whether token-based "back parsing" is up to the job.

EKR

--
You received this message because you are subscribed to the Google Groups "leo-editor" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/leo-editor/48192e7f-a784-489f-b43b-5550590870ce%40googlegroups.com.
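The token-reader / code-generator design described in the post, including look-behind line splitting, can be illustrated with a minimal Python sketch. This is a hypothetical illustration, not Leo's actual beautifier code (the post itself was written from memory). Python's `tokenize` module supplies the input tokens. A table of token readers (`do_word`, `do_op`, `do_newline`) calls small code generators (`add`, `blank`, `clean`) that look behind into the output token list `code_list`; `clean` deletes already-emitted blanks, peephole style. The NEWLINE reader looks behind to find the tokens of the last line and, if that line is too long, puts each top-level item of its first non-empty bracket pair on its own line. The class, method, and token-kind names (`word`, `op`, `lt`, `rt`, `comma`, `blank`, `indent`, `line-end`, `break`) are all made up for illustration. The sketch covers only a small subset of Python: comments and (on Python 3.12+) f-strings raise `NotImplementedError`, and because `tokenize` emits no token for a backslash-newline, such lines are simply joined. The splitting is only loosely black-like: it adds no trailing commas and does not re-split lines that are still too long.

```python
import io
import keyword
import tokenize
from collections import namedtuple

Tok = namedtuple('Tok', ['kind', 'value'])  # an output token; the kinds are made up


class Beautifier:
    def __init__(self, max_line_length=79):
        self.max_line_length = max_line_length
        self.code_list = []  # the output token list
        self.brackets = []   # open brackets on the current logical line
        self.level = 0       # indentation level, from INDENT/DEDENT tokens

    def beautify(self, source):
        readers = {tokenize.NAME: self.do_word, tokenize.NUMBER: self.do_word,
                   tokenize.STRING: self.do_word, tokenize.OP: self.do_op,
                   tokenize.NEWLINE: self.do_newline}  # token readers
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in (tokenize.INDENT, tokenize.DEDENT):
                self.level += 1 if tok.type == tokenize.INDENT else -1
            elif tok.type == tokenize.NL and not self.brackets:  # keep blank lines
                self.code_list.append(Tok('line-end', '\n'))
            elif tok.type in readers:
                readers[tok.type](tok.string)
            elif tok.type not in (tokenize.NL, tokenize.ENDMARKER):  # COMMENT, f-strings...
                raise NotImplementedError(tokenize.tok_name[tok.type])
        return ''.join(t.value for t in self.code_list)

    def add(self, kind, value):  # code generator: every line starts with an 'indent' token
        if not self.code_list or self.code_list[-1].kind == 'line-end':
            self.code_list.append(Tok('indent', '    ' * self.level))
        self.code_list.append(Tok(kind, value))

    def blank(self):  # code generator; looks behind to avoid doubled or leading blanks
        prev = self.code_list[-1].kind if self.code_list else None
        if prev not in (None, 'blank', 'indent', 'lt', 'line-end'):
            self.code_list.append(Tok('blank', ' '))

    def clean(self):  # peephole: delete blanks that were already emitted
        while self.code_list and self.code_list[-1].kind == 'blank':
            self.code_list.pop()

    def do_word(self, s):  # token reader for NAME, NUMBER and STRING
        if self.code_list and self.code_list[-1].kind in ('word', 'rt'):
            self.blank()  # 'def f', 'x if y', ') for'
        self.add('word', s)

    def do_op(self, s):  # token reader for OP
        prev = next((t for t in reversed(self.code_list)
                     if t.kind not in ('blank', 'indent')), None)
        after_operand = (prev is not None and prev.kind in ('word', 'rt')
                         and not keyword.iskeyword(prev.value))
        if s in ('(', '[', '{'):
            if not after_operand:
                self.blank()  # 'x = (' and 'return (', but calls stay tight: 'f(', 'a['
            self.brackets.append(s)
            self.add('lt', s)
        elif s in (')', ']', '}'):
            self.clean()
            self.brackets.pop()
            self.add('rt', s)
        elif s in (',', ';', ':', '.'):
            self.clean()
            self.add('comma' if s in (',', ';') else 'op', s)
            if s != '.' and not (s == ':' and self.brackets[-1:] == ['[']):
                self.blank()  # 'a, b' and '{k: v}', but 'x.y' and 'x[1:2]'
        elif s == '=' and self.brackets[-1:] == ['(']:
            self.add('op', s)  # keyword argument or default value: f(a=1)
        else:  # binary operators get blanks; unary ones stay tight: '-x', '*args'
            binary = after_operand or s not in ('-', '+', '~', '*', '**', '@')
            if binary or (prev is not None and keyword.iskeyword(prev.value)):
                self.blank()
            self.add('op', s)
            if binary:
                self.blank()

    def do_newline(self, s):  # token reader for NEWLINE; may split the line it ends
        self.clean()
        start = len(self.code_list)  # look behind: find where the last line starts
        while start > 0 and self.code_list[start - 1].kind != 'line-end':
            start -= 1
        line = self.code_list[start:]
        first = next((i for i, t in enumerate(line)  # first non-empty bracket pair
                      if t.kind == 'lt' and line[i + 1].kind != 'rt'), None)
        if first is not None and sum(len(t.value) for t in line) > self.max_line_length:
            indent = line[0].value  # black-like: one top-level item per line
            inside = Tok('break', '\n' + indent + '    ')
            result, depth = line[:first + 1] + [inside], 1
            for j in range(first + 1, len(line)):
                depth += {'lt': 1, 'rt': -1}.get(line[j].kind, 0)
                if depth == 0:  # the matching closing bracket starts a new line
                    result += [Tok('break', '\n' + indent)] + line[j:]
                    break
                at_comma = line[j].kind == 'blank' and depth == 1 and result[-1].kind == 'comma'
                result.append(inside if at_comma else line[j])
            self.code_list[start:] = result
        self.code_list.append(Tok('line-end', '\n'))
```

For example, with `max_line_length=40`, the source `def compute(first_argument, second_argument, third_argument = None):` followed by `    return first_argument+second_argument` comes back with one parameter per line and `):` on its own line, and the body becomes `return first_argument + second_argument`. Every spacing decision needs only tokens that were already emitted, which is why look-behind suffices here. Because NL tokens inside brackets are dropped before the line is re-split, running the sketch again on its own output gives the same result.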
