On Tue, Jan 14, 2020 at 7:06 PM Brian Theado <brian.the...@gmail.com> wrote:

In the theory of operation:
>
> "The notion of a *token order traversal* of a parse tree is the
> foundation of this project"
>
> In contrast, what traversal order do parse trees provide?
>

None whatever. Traversals are defined by code, not by the tree itself.

The ast module is particularly deficient in this regard. The documentation
for ast.walk <https://docs.python.org/3/library/ast.html#ast.walk> is:

" Recursively yield all descendant nodes in the tree starting at *node*
(including *node* itself), in no specified order. This is useful if you
only want to modify nodes in place and don’t care about the context."

Hmm. This could be one motivating example. The TOG class inserts
parent/child links, and TOT.traverse(tree) *is* the token order traversal.
So a much more valuable version of ast.walk would be:

tog = TokenOrderGenerator()
tot = TokenOrderTraverser()
contents, encoding, tokens, tree = tog.init_from_file(filename)
tot.traverse(tree)
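For contrast, here is a small stdlib-only sketch (not part of leoAst.py) showing the order ast.walk actually delivers: breadth-first, which bears no relation to token order.

```python
import ast

tree = ast.parse('a = b + c\n')
kinds = [type(n).__name__ for n in ast.walk(tree)]
# Breadth-first: the Assign node is visited before any of its operands,
# and nodes are interleaved level by level, not in source order.
print(kinds)
```

Nothing in this order tells a tool where 'b' sits relative to '+' in the source text.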

How is token order different/better? What does it allow me to do that I
> can't otherwise do with parse trees?
>

Great question. Perhaps I won't need a separate post after all. Here is a
long answer which, boiled down, must become part of both the announcement
and the regular docs.

Recall that the python issue deals with deficiencies in ast-related tools.
The opening comment of that issue says: "the built-in AST does not
preserve comments or whitespace;"

This is only a small part of the problem facing anyone who wants to write a
program like fstringify or black:

1. The data in the parse tree does not preserve the *spelling* of comments
and strings. Why that is, I don't know, but it can't be helped: ast.parse
creates the initial parse trees, and ast.parse can't change in any way
because the ast module is cast in stone.

2. In contrast, the token list is what I have been calling the "ground
truth" of a program. Comment and string tokens *do* preserve spelling. It
is straightforward to recreate the program from the token list. That's what
the tokens_to_string function (in leoAst.py) does.
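Point 2 can be checked with the stdlib tokenize module alone; tokens_to_string in leoAst.py plays the analogous role for leoAst's token objects. A minimal round-trip sketch:

```python
import io
import tokenize

src = 'x = 1  # a comment\ns = "spelling preserved"\n'
tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
# Comment and string tokens keep their exact spelling...
comment = next(t for t in tokens if t.type == tokenize.COMMENT)
assert comment.string == '# a comment'
# ...so the token list is "ground truth": it recreates the program exactly.
assert tokenize.untokenize(tokens) == src
```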

3. There is, in principle, no *short*, *easy* way to associate tokens with
ast nodes. The TOG class does this in what I firmly believe is the
simplest, clearest possible code. But the TOG class is far from short and
easy.

So the *first* answer to your question is: a token order traversal is what
makes the TOG class possible.

But why is the TOG class *itself* valuable? What can devs do with it that
they can't already do?

The TOG class inserts links between ast nodes and between nodes and tokens.
Inserting these links is all the TOG class does, and nothing else.

But now you ask, what good are these links? This is what I've never
properly explained.

The injected links will be useful for any tool that wants to modify python
source code; fstringify and black are the two most prominent examples. Now
we come to the real motivation. This is the "hole" in the documentation I
have been talking about.

*Any* tool that wants to modify python text will benefit from having *both*
a token-level view of the text *and* a parse-tree level view of the text.
The asttokens package provides this dual view, but only for top-level
python statements. In effect, the TOG class is a much simpler
implementation of the asttokens package.

This suggests that some wrapper functions, similar/identical to those in
the asttokens package, would be useful.

But I digress. Let me explain why the dual view (tokens *and* ast nodes) is
useful. This is something I've never explained because I started the
project knowing the answer.

*Tokens preserve linear text order. Parse trees define the meaning of those
tokens.*

Mostly, tools like fstringify and black will want to work at the token
level, because that is, or *should be*, the most natural way to modify
text: just insert or delete the proper tokens.

Alas, at present, *the fstringify and black tools work at the parse tree
level*, despite *enormous* difficulties in doing so, because sometimes
those tools *must* have access to the meaning provided by the parse trees.

Example 1: (Fstringify) Potential f-strings are found by looking for an
ast.BinOp node of a special form: the LHS of the BinOp must be a string,
and the RHS must supply the values for the one or more % format specs in
that string.
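A hedged sketch of that detection idea, using only the stdlib; the helper name here is illustrative, not fstringify's actual code:

```python
import ast

def find_fstring_candidates(tree):
    """Yield BinOp nodes of the form <string literal> % <operand>."""
    for node in ast.walk(tree):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, ast.Mod)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.left.value, str)):
            yield node

tree = ast.parse('msg = "%s: %r" % (name, value)\n')
candidates = list(find_fstring_candidates(tree))
assert len(candidates) == 1
```

The hard part, which the parse tree alone cannot answer, is knowing the exact spelling and position of the tokens to rewrite. That is exactly what the injected links provide.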

Example 2: (Black) When splitting a long line, black must analyze the
*meaning* of the corresponding line in significant detail. It needs to do
this because in some cases black must insert parens that did not exist
previously in the program. For example:

a = << a very very long line, possibly continued by the backslash newline
convention >>

black will convert this to:

a = (
   line 1
   line 2...
)

where none of the lines (line 1, line 2, etc.) contains a backslash newline.

*At present, both the fstringify and black tools are stuck in the "ast
ghetto".*

Much of what these tools do would be much, much easier if the token view of
an ast node were available. For example, black uses a horrendously
complicated auxiliary traversal just to determine the required line length
of an ast node! But if token is the first token of a line, token.line is
the physical line containing that token. No problem at all!
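The stdlib tokenize module already demonstrates this property (a sketch, not black's code):

```python
import io
import tokenize

src = 'total = alpha + beta + gamma\n'
token = list(tokenize.generate_tokens(io.StringIO(src).readline))[0]
# token.line is the whole physical line containing the token, so the
# required line length falls out immediately: no tree traversal at all.
assert token.line == src
print(len(token.line.rstrip('\n')))  # the line's length
```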

*The TOG class would radically simplify both fstringify and black by
allowing those tools to use token views where appropriate.*

The Fstringify class in leoAst.py already demonstrates this. The Orange
class will demonstrate how to collapse the complexity of black.

"This help is essential, because the following two source files generate
> identical parse trees!"
>
> This implies to me that with a parse tree you can't do a round trip from
> source code, to a parse tree and back to the original source code. Is that
> correct?
>

In practice, you are correct. ast.get_source_segment
<https://docs.python.org/3/library/ast.html#ast.get_source_segment> is new
in Python 3.8, and there is also ast.fix_missing_locations
<https://docs.python.org/3/library/ast.html#ast.fix_missing_locations>. But
these are hacks, and they are very slow. Even with these functions,
associating tokens with ast nodes is messy.
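For reference, a small sketch of ast.get_source_segment (Python 3.8+). Note that it requires keeping the original source string at hand, returns plain text rather than tokens, and gives no help mapping individual tokens to nodes:

```python
import ast

src = 'x = (1 +\n     2)\n'
tree = ast.parse(src)
# Recover the text of the BinOp on the right-hand side of the assignment.
# The node's location spans the operands only, not the enclosing parens.
segment = ast.get_source_segment(src, tree.body[0].value)
assert segment == '1 +\n     2'
```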

In contrast, round-tripping is a snap with the TOG class.

If so is this just one externally visible benefit to your library or is it
> the main benefit? If it is the main benefit, then I think it should be made
> much more clear earlier in the document like in the Overview near here:
>
> "These links promise to collapse the complexity of any code that changes
> text, including the asttokens <https://pypi.org/project/asttokens/>,
> fstringify <https://pypi.org/project/fstringify/>, and black
> <https://pypi.org/project/black/> projects."
>

I agree. My long answer must be boiled down.

something like:
>
> "These links allow portions of python source code to be transformed while
> leaving the rest untouched. Much of the complexity of the asttokens
> <https://pypi.org/project/asttokens/>, fstringify
> <https://pypi.org/project/fstringify/>, and black
> <https://pypi.org/project/black/> projects comes from not being able to
> link between the text and structure of the Python code.  Using the code of
> this library can collapse the complexity of these and any projects which
> change Python text"
>

Not bad! My long answer explains in more detail why what you say is true.
I'll have to think about all this. There's no rush.

Many thanks for asking great questions.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/CAMF8tS2Jur%2BLAQB0JHid%3DYFB-wt9Z2rV5K6D9ds4SQTisspYVw%40mail.gmail.com.
