On Sat, Oct 26, 2019 at 5:02 PM Matt Wilkie <[email protected]> wrote:

>> I am ever more convinced that using tokens is superior to parse trees for
>> text munging.
>
> Text munging I understand, but not parse trees and tokens. Can you give a
> one or two sentence overview?

*Parse trees*

See Python's ast <https://docs.python.org/3/library/ast.html> module. A
parse *tree* is a data structure representing the program's "abstract"
structure. Parse trees allow easy analysis of a program's meaning.
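As a rough illustration (the sample source string below is made up for this sketch), parsing a one-line program gives a tree that records the structure of the statement but says nothing about its spacing or its comment:

```python
import ast

# Hypothetical sample source: extra spaces and a trailing comment.
source = "x = 1   +   2  # a comment\n"
tree = ast.parse(source)

# The tree records structure: an assignment whose value is a binary operation.
node = tree.body[0]
print(type(node).__name__)        # Assign
print(type(node.value).__name__)  # BinOp

# The dump of the tree does not contain the comment text.
dumped = ast.dump(tree)
print("a comment" in dumped)      # False
```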

To my knowledge, there is no simple, efficient way of recovering
whitespace data from parse trees. I have given this question considerable
attention. See the TokenSync class in leoAst.py. The ast.get_source_segment
function (new in Python 3.8) is utterly feeble, and mind-bogglingly slow.
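For reference, here is a small sketch of what ast.get_source_segment returns (Python 3.8 or later; the sample source is invented for the example). It slices out the text belonging to one node, so anything that lies between or after nodes, such as a trailing comment, belongs to no segment:

```python
import ast

# Hypothetical sample source.
source = "total = a   +   b  # sum\n"
tree = ast.parse(source)
assign = tree.body[0]

# The segment for the right-hand side keeps its internal spacing...
value_segment = ast.get_source_segment(source, assign.value)
print(repr(value_segment))   # 'a   +   b'

# ...but even the segment for the whole statement stops before the comment.
stmt_segment = ast.get_source_segment(source, assign)
print(repr(stmt_segment))    # 'total = a   +   b'
```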

*Tokens*

See Python's tokenize <https://docs.python.org/3/library/tokenize.html>
module. A token list is a *linear* list of the tokens
<https://docs.python.org/3/reference/lexical_analysis.html> that make up a
program.
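A minimal sketch of what such a list looks like, using tokenize.generate_tokens on a made-up one-line program. Note that the comment shows up as an ordinary token:

```python
import io
import tokenize

# Hypothetical sample source.
source = "x = 1  # hi\n"

# generate_tokens wants a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Each token carries its type, its text, and its start/end (row, column).
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

names = [tokenize.tok_name[tok.type] for tok in tokens]
```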

Alas, tokens do not represent inter-token whitespace *directly*. Happily,
the tokenize module's Untokenizer class shows how to recover inter-token
whitespace.
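The idea can be sketched as follows. This is not the Untokenizer code itself, only a simplified illustration in the same spirit: the helper name tokens_with_whitespace is hypothetical, and it assumes every token sits on a single line (no triple-quoted strings, no backslash continuations). The gap between the end of one token and the start of the next, read from the token's own source line, is the inter-token whitespace.

```python
import io
import tokenize

def tokens_with_whitespace(source):
    """Yield (leading_whitespace, token_text) pairs.

    Hypothetical helper; assumes single-line tokens only.
    """
    prev_row, prev_col = 1, 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        row, col = tok.start
        if row != prev_row:
            # A new line starts: whitespace is measured from column 0.
            prev_col = 0
        # Text between the previous token's end and this token's start.
        ws = tok.line[prev_col:col]
        yield ws, tok.string
        prev_row, prev_col = tok.end

# Hypothetical sample source with irregular spacing.
source = "x  =  1   # hi\ny = x+ 2\n"
pairs = list(tokens_with_whitespace(source))

# Gluing whitespace and token text back together reproduces the input.
rebuilt = "".join(ws + text for ws, text in pairs)
print(rebuilt == source)   # True
```

With the whitespace attached to each token, a text-munging tool can change the tokens it cares about and write everything else back untouched.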

*Summary*

For text munging, like black and fstringify, my experience shows that it is
easier to "parse" a list of tokens than to recover token-related data from
parse trees.

Imo, devs typically overestimate the difficulties involved in using tokens,
and underestimate the difficulties involved in using parse trees. The proof
is in the source code for black (horrendous), the "real" fstringify
(complex and still buggy), and my own fstringify, the FstringifyTokens
class in leoBeautify.py (fstring branch), which works pretty well after two
days of work.

Edward
