In less than 30 hours I have been able to create a prototype for a simpler fstringify utility. It's in the fstring branch, in the FstringifyTokens class in leoBeautify.py. This class is a subclass of the PythonTokenBeautifier class. As the names imply, these classes work on tokens, not on ASTs (parse trees).
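To make the token/parse-tree distinction concrete, here is a minimal sketch that uses only the standard library's tokenize and ast modules. It is not Leo's code, and the sample source line is made up for illustration. It shows that tokens carry the original spelling of a string (quotes included) and the comment, while the parse tree keeps only the string's value.

```python
import ast
import io
import tokenize

# A made-up sample line: double quotes, odd spacing, and a trailing comment.
source = 's = "a: %s" % (x,)  # keep me\n'

# Tokens keep the exact spelling of strings and comments.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
string_tokens = [t.string for t in tokens if t.type == tokenize.STRING]
comment_tokens = [t.string for t in tokens if t.type == tokenize.COMMENT]

# Untokenizing the full 5-tuples uses their positions to restore spacing.
round_trip = tokenize.untokenize(tokens)

# The parse tree stores only the string's value: no quotes, no comment.
tree = ast.parse(source)
string_value = tree.body[0].value.left.value

print(string_tokens)   # the string token, quotes included
print(comment_tokens)  # the comment token
print(repr(string_value))
```

For this simple one-line input the round trip reproduces the source text. The tokenize documentation only guarantees that the result tokenizes back to the same tokens, so exact spacing is not promised for every input.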
It is my strong (informed) opinion that parse trees are inappropriate for text-based manipulations such as black <https://github.com/psf/black> and fstringify <https://github.com/jacktasia/fstringify>. One has only to study the code for black and fstringify, as I have, to see the outrageous complexities involved in trying to bend parse trees to one's will.

*Status*

The new code is straightforward and fast. The base class is a one-pass peephole optimizer. Such things are surprisingly easy to get right. There are hardly any "if" statements involved.

The new code overrides only a single method of the base class, do_string. This code looks ahead, beyond the original token, to parse the arguments following the "%" operator. It then consumes all the scanned input tokens and generates a single output token representing the new f-string.

Token-based "parsing" of what follows the "%" is complete. It's a straightforward scanner for "operands" that handles nested parens and curly/square brackets via recursive descent. It's an easy page of code. It could be an assignment in a beginning programming course.

The remaining work involves the following:

1. Parsing Python's string formatting minilanguage <https://docs.python.org/3/library/string.html#formatspec>. The present regex needs more work.

2. Converting the legacy format specifiers to PEP 498 <https://www.python.org/dev/peps/pep-0498/> form. A bit tricky, but should be doable without type inference :-)

*A bonus*

During my review of Leo's original beautifier, I discovered that the "raw_token" field was badly misnamed. In fact, it contains the exact line containing the token! This is exactly what is needed to do black-like line breaking/joining! It should be easy to do, if I ever get around to it.

*Summary*

Parsing tokens is surprisingly easy. Token-based approaches naturally retain the essential features of text, including original spellings of strings, line breaks, and original whitespace.
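As a rough illustration of how little code the operand scan described under Status needs, here is a minimal sketch built on the standard tokenize module. It is not the code in FstringifyTokens.do_string. The names skip_group, scan_operands, operand_text and percent_operands are hypothetical. The sketch assumes balanced brackets, a one-line expression, and that a leading "(" after "%" wraps the whole argument list. It does not handle dict operands or implicit string concatenation.

```python
import io
import tokenize

OPEN = {"(": ")", "[": "]", "{": "}"}
STOP = {")", "]", "}", ","}
END_TYPES = (tokenize.NEWLINE, tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER)

def skip_group(tokens, i):
    """tokens[i] opens a bracket; return the index of its matching close."""
    close = OPEN[tokens[i].string]
    i += 1
    while tokens[i].string != close:
        if tokens[i].string in OPEN:
            i = skip_group(tokens, i) + 1  # recursive descent into nesting
        else:
            i += 1
    return i

def scan_operands(tokens, i):
    """tokens[i] follows '%'. Return (operands, index after the last one)."""
    operands = []
    if tokens[i].string == "(":
        end = skip_group(tokens, i)
        start = j = i + 1
        while j < end:
            if tokens[j].string in OPEN:
                j = skip_group(tokens, j)  # jump to the matching close
            elif tokens[j].string == ",":
                operands.append(tokens[start:j])  # split at top-level commas
                start = j + 1
            j += 1
        if start < end:  # nothing left after a trailing comma
            operands.append(tokens[start:end])
        return operands, end + 1
    # Unparenthesized: one operand, up to a top-level stop token or line end.
    j = i
    while tokens[j].type not in END_TYPES and tokens[j].string not in STOP:
        j = skip_group(tokens, j) + 1 if tokens[j].string in OPEN else j + 1
    return [tokens[i:j]], j

def operand_text(operand):
    """Original spelling of an operand, sliced from its source line."""
    first, last = operand[0], operand[-1]
    return first.line[first.start[1]:last.end[1]]

def percent_operands(source):
    """Return the operand texts of the first 'string % ...' in source."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for i, tok in enumerate(tokens):
        if tok.type == tokenize.STRING and tokens[i + 1].string == "%":
            operands, _ = scan_operands(tokens, i + 2)
            return [operand_text(op) for op in operands]
    return []

print(percent_operands('print("%s=%r" % (name, f(a, b[1:2])))\n'))
```

Slicing each operand out of the token's line attribute, rather than rejoining token strings, keeps the operand's original spacing.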
Imo, both black and fstringify could be much improved and simplified by using a token-based approach.

The new fstringify-file command will beautify as well as fstringify. There's no easy way *not* to beautify the file.

This is one of those all-consuming projects. It should reach a stopping point in a day or three. I shall then release 6.1b1.

Edward
