In less than 30 hours I have been able to create a prototype for a simpler 
fstringify utility.  It's in the fstring branch, in the FstringifyTokens 
class in leoBeautify.py. This class is a subclass of the 
PythonTokenBeautifier class.  As the names imply, these classes work on 
tokens, not on ASTs (parse trees).

It is my strong (informed) opinion that parse trees are inappropriate for 
text-based manipulations such as black <https://github.com/psf/black> and 
fstringify <https://github.com/jacktasia/fstringify>. One has only to study 
the code for black and fstringify, as I have, to see the outrageous 
complexities involved in trying to bend parse trees to one's will.

*Status*

The new code is straightforward and fast.  The base class is a one-pass 
peephole optimizer.  Such things are surprisingly easy to get right.  There 
are hardly any "if" statements involved. The new code overrides only a 
single method of the base class, do_string. This code looks ahead, beyond 
the original token, to parse the arguments following the "%" operator.  It 
then consumes all the scanned input tokens and generates a single output 
token representing the new f-string.

Token-based "parsing" of what follows the '%' is complete.  It's a 
straightforward scanner for "operands" that handles nested parens and 
curly/square brackets via recursive descent. It's an easy page of code. It 
could be an assignment in a beginning programming course.
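
For illustration only, here is a minimal sketch of such an operand scanner, 
built on the standard tokenize module.  It is not the code in 
leoBeautify.py: the names skip_group and scan_operands are hypothetical, and 
the sketch assumes a one-line statement in which the operands are either a 
parenthesized tuple or run to the end of the line.

```python
import io
import tokenize

# Opening brackets mapped to their closers.
CLOSERS = {'(': ')', '[': ']', '{': '}'}

def is_op(tok, value=None):
    """True if tok is an operator token (optionally with the given spelling)."""
    return tok.type == tokenize.OP and (value is None or tok.string == value)

def skip_group(toks, i):
    """toks[i] opens a bracket; return the index of its matching closer."""
    closer = CLOSERS[toks[i].string]
    i += 1
    while i < len(toks):
        if is_op(toks[i], closer):
            return i
        if is_op(toks[i]) and toks[i].string in CLOSERS:
            i = skip_group(toks, i)  # Recurse into the nested group.
        i += 1
    raise ValueError('unbalanced brackets')

def scan_operands(line):
    """Return the source text of each operand after the first '%' operator."""
    toks = [t for t in tokenize.generate_tokens(io.StringIO(line).readline)
            if t.type not in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)]
    # Index of the first token after the '%' operator.
    i = 1 + next(n for n, t in enumerate(toks) if is_op(t, '%'))
    if is_op(toks[i], '('):
        start, end = i + 1, skip_group(toks, i)
    else:
        start, end = i, len(toks)  # Assumption: operand runs to end of line.
    operands, first, j = [], start, start
    while j < end:
        if is_op(toks[j]) and toks[j].string in CLOSERS:
            j = skip_group(toks, j)  # Commas inside nested groups don't split.
        elif is_op(toks[j], ','):
            operands.append(line[toks[first].start[1]:toks[j - 1].end[1]])
            first = j + 1
        j += 1
    if first < end:
        operands.append(line[toks[first].start[1]:toks[end - 1].end[1]])
    return operands
```

Slicing the original line by token columns, rather than re-joining token 
strings, keeps each operand's original spelling and spacing.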

The remaining work involves the following:

1. Parsing Python's string formatting minilanguage 
<https://docs.python.org/3/library/string.html#formatspec>. The present 
regex needs more work.

2. Converting the legacy format specifiers to PEP 498 
<https://www.python.org/dev/peps/pep-0498/> form. A bit tricky, but should 
be doable without type inference :-)
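
To make the two remaining steps concrete, here is a rough sketch, under 
stated assumptions, of how a regex might recognize a subset of legacy 
specifiers and rewrite them as PEP 498 replacement fields.  The regex SPEC 
and the helpers convert_spec and to_fstring are hypothetical, not the code 
in the fstring branch.  The sketch ignores mapping keys such as %(name)s, 
'*' widths, and quoting issues, and it cannot know operand types: for 
example, '%d' truncates a float but '{x:d}' raises an error.

```python
import re

# Hypothetical regex for a subset of legacy specs: %[flags][width][.prec]type
SPEC = re.compile(
    r'%(?P<flags>[-+ #0]*)(?P<width>\d*)(?:\.(?P<prec>\d+))?'
    r'(?P<type>[srdifeEgGxXo%])')

def convert_spec(m, expr):
    """Return the replacement field for one matched spec and one operand."""
    flags, width, prec, kind = m.group('flags', 'width', 'prec', 'type')
    textual = kind in 'sr'
    conv = '!r' if kind == 'r' else ''
    if '-' in flags:
        align = '<'
    elif textual and width:
        align = '>'  # %-formatting right-justifies; format() left-justifies text.
    else:
        align = ''
    spec = align
    if not textual:
        spec += '+' if '+' in flags else ' ' if ' ' in flags else ''
        spec += '#' if '#' in flags else ''
        spec += '0' if '0' in flags and '-' not in flags else ''
    spec += width + ('.' + prec if prec else '')
    if not textual:
        spec += 'd' if kind == 'i' else kind  # format() has no 'i' type.
    return '{' + expr + conv + (':' + spec if spec else '') + '}'

def escape(text):
    """Double literal braces so they survive inside an f-string."""
    return text.replace('{', '{{').replace('}', '}}')

def to_fstring(fmt, operands):
    """fmt is the body of the string literal; operands are expression texts."""
    args = iter(operands)
    out, pos = [], 0
    for m in SPEC.finditer(fmt):
        out.append(escape(fmt[pos:m.start()]))
        out.append('%' if m.group('type') == '%' else convert_spec(m, next(args)))
        pos = m.end()
    out.append(escape(fmt[pos:]))
    # Assumption: the body contains no single quotes or backslashes.
    return "f'" + ''.join(out) + "'"
```

For example, to_fstring('%s is %5.2f%%', ['name', 'x']) returns the text 
f'{name} is {x:5.2f}%'.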

*A bonus*

During my review of Leo's original beautifier, I discovered that the 
"raw_token" field was badly misnamed. In fact, it contains the exact line 
containing the token!  This is exactly what is needed to do black-like line 
breaking/joining! It should be so easy to do, if I ever get around to it.

*Summary*

Parsing tokens is surprisingly easy.  Token-based approaches naturally 
retain the essential features of text, including original spellings of 
strings, line breaks, and original whitespace. Imo, both black and 
fstringify could be much improved and simplified by using a token-based 
approach.
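
A small demonstration of the point, using only the standard library (not 
Leo's classes): the parse tree merges adjacent string literals and drops 
quotes and comments, while the token stream keeps each original spelling. 
Each token also carries the physical line it came from in its line field. 
Whether Leo's "raw_token" field corresponds to that field is an assumption.

```python
import ast
import io
import tokenize

source = "x = 'a' \"b\"  # keep me\n"

# The parse tree sees one merged constant; quotes and the comment are gone.
tree = ast.parse(source)
value = tree.body[0].value.value  # The single string 'ab'.

# The token stream keeps every original spelling.
toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
strings = [t.string for t in toks if t.type == tokenize.STRING]
comments = [t.string for t in toks if t.type == tokenize.COMMENT]

# Each token also records the exact physical line that contains it.
first_line = toks[0].line
```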

The new fstringify-file command will beautify as well as fstringify.  
There's no easy way *not* to beautify the file.

This is one of those all-consuming projects.  It should reach a stopping 
point in a day or three.  I shall then release 6.1b1.

Edward
