On Sat, Oct 26, 2019 at 5:02 PM Matt Wilkie <[email protected]> wrote:
>> I am ever more convinced that using tokens is superior to parse trees for
>> text munging.
>
> Text munging I understand, but not parse trees and tokens. Can you give a
> one or two sentence overview?

*Parse trees*

See Python's ast <https://docs.python.org/3/library/ast.html> module. A parse *tree* is a data structure representing the program's "abstract" structure. Parse trees allow easy analysis of a program's meaning.

To my knowledge, there is no simple, efficient way of recovering whitespace data from parse trees. I have given this question considerable attention. See the TokenSync class in leoAst.py. The ast.get_source_segment function (new in Python 3.8) is utterly feeble and mind-bogglingly slow.

*Tokens*

See Python's tokenize <https://docs.python.org/3/library/tokenize.html> module. A token list is a *linear* list of the tokens <https://docs.python.org/3/reference/lexical_analysis.html> that make up a program. Alas, tokens do not represent inter-token whitespace *directly*. Happily, the tokenize module's Untokenizer class shows how to recover inter-token whitespace.

*Summary*

For text munging tools like black and fstringify, my experience shows that it is easier to "parse" a list of tokens than to recover token-related data from parse trees.

IMO, devs typically overestimate the difficulties involved in using tokens, and underestimate the difficulties involved in using parse trees. The proof is in the source code for black (horrendous), the "real" fstringify (complex and still buggy), and my own fstringify, the FstringifyTokens class in leoBeautify.py (fstring branch), which works pretty well after two days of work.

Edward

-- 
You received this message because you are subscribed to the Google Groups "leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/leo-editor/CAMF8tS1cAQmEr01%3Dk0zUMC622V4-fm8nzfinVQK-1h8VfL%3D1AA%40mail.gmail.com.
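
A minimal Python sketch of the tokens vs. parse trees contrast discussed above. This is an illustration only, not code from leoAst.py or leoBeautify.py; the sample SOURCE string and the three helper functions are hypothetical. It rebuilds a small snippet once from its token list with tokenize.untokenize and once from its parse tree with ast.unparse (Python 3.9+), and shows that ast.get_source_segment recovers text only by slicing the original source with a node's positions:

```python
import ast
import io
import tokenize

# Hypothetical sample with irregular spacing and a trailing comment.
SOURCE = "x  =  1   # a comment\nprint( x )\n"


def roundtrip_via_tokens(source: str) -> str:
    """Rebuild source from its token list.

    Given full 5-tuple tokens, tokenize.untokenize uses each token's
    (row, col) start position to re-insert the gaps between tokens.
    """
    readline = io.StringIO(source).readline
    tokens = list(tokenize.generate_tokens(readline))
    return tokenize.untokenize(tokens)


def roundtrip_via_tree(source: str) -> str:
    """Rebuild source from its parse tree (ast.unparse needs Python 3.9+).

    The tree stores neither comments nor inter-token spacing, so the
    result is a normalized rewrite, not a copy of the input.
    """
    return ast.unparse(ast.parse(source))


def first_statement_text(source: str) -> str:
    """Recover the first statement's text via the tree (Python 3.8+).

    get_source_segment slices the *original* source using the node's
    lineno/col_offset/end_lineno/end_col_offset; the comment is not
    part of any node, so it is not included.
    """
    tree = ast.parse(source)
    return ast.get_source_segment(source, tree.body[0])


if __name__ == "__main__":
    print(repr(roundtrip_via_tokens(SOURCE)))  # identical to SOURCE
    print(repr(roundtrip_via_tree(SOURCE)))    # spacing and comment lost
    print(repr(first_statement_text(SOURCE)))  # 'x  =  1'
```

Note that untokenize fills each gap with spaces computed from column positions, so a tab between two tokens would come back as a single space; an exact copy in that case would need each token's `line` attribute.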
