A small pause for a better fstringify

Edward K. Ream Fri, 25 Oct 2019 03:51:29 -0700

In less than 30 hours I have been able to create a prototype for a simpler 
fstringify utility.  It's in the fstring branch, in the FstringifyTokens 
class in leoBeautify.py. This class is a subclass of the 
PythonTokenBeautifier class.  As the names imply, these classes work on 
tokens, not on ASTs (parse trees).

It is my strong (informed) opinion that parse trees are inappropriate for
text-based manipulations such as black <https://github.com/psf/black> and
fstringify <https://github.com/jacktasia/fstringify>. One has only to study
the code for black and fstringify, as I have, to see the outrageous
complexities involved in trying to bend parse trees to one's will.

*Status*

The new code is straightforward and fast. The base class is a one-pass
peephole optimizer. Such things are surprisingly easy to get right. There
are hardly any "if" statements involved. The new code overrides only a
single method of the base class, do_string. This code looks ahead, beyond
the original token, to parse the arguments following the "%" operator. It
then consumes all the scanned input tokens and generates a single output
token representing the new f-string.
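
To illustrate the look-ahead-and-consume idea, here is a minimal sketch built on the standard library's tokenize module. It is not the code in leoBeautify.py: the function names (line_offsets, fstringify_simple) are hypothetical, and it handles only the simplest case, an unprefixed string containing exactly one "%s", followed by "%" and a single name. Surrounding context (adjacent string literals, a preceding operator) is ignored.

```python
import io
import tokenize


def line_offsets(source):
    """Return a list mapping each 1-based line number to its start offset."""
    offsets = [0, 0]  # Index 0 is unused; line 1 starts at offset 0.
    for line in source.splitlines(keepends=True):
        offsets.append(offsets[-1] + len(line))
    return offsets


def fstringify_simple(source):
    """Rewrite  '...%s...' % name  as  f'...{name}...'  (hypothetical sketch)."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    starts = line_offsets(source)
    pieces = []
    pos = 0  # Offset of the first source character not yet copied to the output.
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        ahead = tokens[i + 1 : i + 4]  # Look ahead, beyond the string token.
        if (
            tok.type == tokenize.STRING
            and tok.string[0] in "'\""  # No string prefix (r, b, f, ...).
            and tok.string.count('%') == 1
            and '%s' in tok.string
            and '{' not in tok.string
            and '}' not in tok.string
            and len(ahead) == 3
            and ahead[0].type == tokenize.OP
            and ahead[0].string == '%'
            and ahead[1].type == tokenize.NAME
            # The operand must end at the name: no attribute, call, index or power.
            and ahead[2].string not in ('.', '(', '[', '**')
        ):
            begin = starts[tok.start[0]] + tok.start[1]
            end = starts[ahead[1].end[0]] + ahead[1].end[1]
            pieces.append(source[pos:begin])  # Copy untouched text verbatim.
            name = ahead[1].string
            pieces.append('f' + tok.string.replace('%s', '{' + name + '}'))
            pos = end
            i += 3  # Consume the string, the '%' and the name.
        else:
            i += 1
    pieces.append(source[pos:])
    return ''.join(pieces)
```

For example, fstringify_simple("print('hello %s' % name)\n") returns "print(f'hello {name}')\n". Everything outside the three consumed tokens is copied from the original text, so whitespace and comments survive unchanged.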

Token-based "parsing" of what follows the '%' is complete. It's a
straightforward scanner for "operands" that handles nested parens and
curly/square brackets via recursive descent. It's an easy page of code. It
could be an assignment in a beginning programming course.
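
A scanner of this kind might look roughly like the sketch below. It is an assumption about the general shape, not the actual Leo code: token_strings, scan_group and scan_operands are hypothetical names, the input is assumed to be a parenthesized tuple, and mismatched bracket kinds are not diagnosed.

```python
import io
import tokenize

# Each opening bracket and its matching closing bracket.
OPEN = {'(': ')', '[': ']', '{': '}'}


def token_strings(text):
    """Return the spellings of the significant tokens in text."""
    keep = (tokenize.NAME, tokenize.NUMBER, tokenize.STRING, tokenize.OP)
    readline = io.StringIO(text).readline
    return [t.string for t in tokenize.generate_tokens(readline) if t.type in keep]


def scan_group(tokens, i):
    """tokens[i] opens a bracket. Return the index just past its matching close."""
    close = OPEN[tokens[i]]
    i += 1
    while i < len(tokens):
        if tokens[i] == close:
            return i + 1
        if tokens[i] in OPEN:
            i = scan_group(tokens, i)  # Recursive descent into the nested group.
        else:
            i += 1
    raise ValueError('unbalanced brackets')


def scan_operands(tokens):
    """Split a parenthesized tuple into operands at its top-level commas.

    Each operand is returned as a list of token spellings.
    """
    if not tokens or tokens[0] != '(':
        raise ValueError('expected a parenthesized tuple')
    end = scan_group(tokens, 0)
    operands, current = [], []
    i = 1
    while i < end - 1:
        tok = tokens[i]
        if tok == ',':
            operands.append(current)  # A top-level comma ends the operand.
            current = []
            i += 1
        elif tok in OPEN:
            j = scan_group(tokens, i)
            current.extend(tokens[i:j])  # Keep the nested group whole.
            i = j
        else:
            current.append(tok)
            i += 1
    if current:
        operands.append(current)
    return operands
```

With this sketch, scan_operands(token_strings("(a, f(b, c), d[0])")) yields three operands: ['a'], ['f', '(', 'b', ',', 'c', ')'] and ['d', '[', '0', ']']. The commas inside f(b, c) do not split the operand because scan_group skips the whole nested group.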

The remaining work involves the following:

1. Parsing Python's string formatting minilanguage
<https://docs.python.org/3/library/string.html#formatspec>. The present
regex needs more work.

2. Converting the legacy format specifiers to PEP 498
<https://www.python.org/dev/peps/pep-0498/> form. A bit tricky, but should
be doable without type inference :-)
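
As a rough idea of what these two steps involve, the sketch below uses a regex for a subset of printf-style specifiers and maps each one to a replacement field. The regex and the name to_fstring_body are assumptions for illustration, not the regex in the fstring branch. Mapping keys such as "%(name)s" and "*" widths are not handled, and some conversions are only approximate: for example, "%d" accepts a float but "{x:d}" does not, which is exactly where type information would matter.

```python
import re

# A subset of printf-style specifiers: %[flags][width][.precision]type
SPEC = re.compile(
    r'%(?P<flags>[-+ 0#]*)(?P<width>\d+)?(?:\.(?P<prec>\d+))?'
    r'(?P<type>[srdifeEgGxXo%])'
)


def to_fstring_body(fmt, operands):
    """Convert the text of a %-format string to the text of an f-string.

    operands is a list of expression strings, one per specifier.
    """
    remaining = list(operands)

    def replace(m):
        kind = m.group('type')
        if kind == '%':
            return '%'  # '%%' is a literal percent sign.
        if not remaining:
            raise ValueError('more specifiers than operands')
        expr = remaining.pop(0)
        flags = m.group('flags')
        width = m.group('width') or ''
        prec = '.' + m.group('prec') if m.group('prec') else ''
        textual = kind in 'sr'
        if '-' in flags:
            align = '<'
        elif textual and width:
            align = '>'  # %5s right-aligns; format() left-aligns strings.
        else:
            align = ''
        if textual:
            spec = align + width + prec
            # Force str()/repr() so the spec applies to a string.
            conv = '!r' if kind == 'r' else ('!s' if spec else '')
        else:
            sign = '+' if '+' in flags else ' ' if ' ' in flags else ''
            alt = '#' if '#' in flags else ''
            zero = '0' if '0' in flags and '-' not in flags else ''
            code = 'd' if kind == 'i' else kind
            spec = align + sign + alt + zero + width + prec + code
            conv = ''
        return '{' + expr + conv + (':' + spec if spec else '') + '}'

    # Literal braces must be doubled in an f-string; do this before
    # inserting the replacement fields.
    escaped = fmt.replace('{', '{{').replace('}', '}}')
    return SPEC.sub(replace, escaped)
```

For example, to_fstring_body('%s is %5.2f', ['name', 'x']) returns '{name} is {x:5.2f}', and to_fstring_body('%-10s|%r', ['k', 'v']) returns '{k!s:<10}|{v!r}'.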

*A bonus*

During my review of Leo's original beautifier, I discovered that the
"raw_token" field was badly misnamed. In fact, it contains the exact line
containing the token! This is exactly what is needed to do black-like line
breaking/joining! It should be so easy to do, if I ever get around to it.
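
The standard library's tokenizer offers something comparable: every tokenize.TokenInfo has a "line" attribute holding the physical line on which the token was found. The sketch below uses that attribute, not Leo's "raw_token" field, to find lines that a black-like pass might want to break. The name long_lines and the default limit of 88 are assumptions for illustration, and multi-line tokens are skipped.

```python
import io
import tokenize


def long_lines(source, limit=88):
    """Return the sorted line numbers whose physical line exceeds limit."""
    rows = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start[0] != tok.end[0]:
            continue  # Skip tokens that span several lines.
        # tok.line is the full source line, including any trailing newline.
        if len(tok.line.rstrip('\n')) > limit:
            rows.add(tok.start[0])
    return sorted(rows)
```

Because each token carries its own line, a handler that is already looking at a token can decide whether to break or join without re-reading the file.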

*Summary*

Parsing tokens is surprisingly easy. Token-based approaches naturally
retain the essential features of text, including original spellings of
strings, line breaks, and original whitespace. Imo, both black and
fstringify could be much improved and simplified by using a token-based
approach.

The new fstringify-file command will beautify as well as fstringify.
There's no easy way *not* to beautify the file.

This is one of those all-consuming projects. It should reach a stopping
point in a day or three. I shall then release 6.1b1.

Edward
