On Tue, Jan 14, 2020 at 7:06 PM Brian Theado <brian.the...@gmail.com> wrote:
> In the theory of operation:
>
> "The notion of a *token order traversal* of a parse tree is the
> foundation of this project"
>
> In contrast, what traversal order do parse trees provide?

None whatever. Traversals are defined by code, not by the tree itself. The
ast module is particularly deficient in this regard. The documentation for
ast.walk <https://docs.python.org/3/library/ast.html#ast.walk> is:

"Recursively yield all descendant nodes in the tree starting at *node*
(including *node* itself), in no specified order. This is useful if you
only want to modify nodes in place and don’t care about the context."

Hmm. This could be one motivating example. The TOG class inserts
parent/child links, and TOT.traverse(tree) *is* the token order traversal.
So a much more valuable version of ast.walk would be:

    tog = TokenOrderGenerator()
    tot = TokenOrderTraverser()
    contents, encoding, tokens, tree = tog.init_from_file(filename)
    tot.traverse(tree)

> How is token order different/better? What does it allow me to do that I
> can't otherwise do with parse trees?

Great question. Perhaps I won't need a separate post after all. Here is a
long answer, which, boiled down, must become part of both the announcement
and the regular docs.

Recall that the python issue deals with deficiencies in ast-related tools.
The opening comment of that issue says:

"the built-in AST does not preserve comments or whitespace;"

This is only a small part of the problem facing anyone who wants to write
a program like fstringify or black:

1. The data in the parse tree does not preserve the *spelling* of comments
and strings. Why, I don't know, but that can't be helped. ast.parse
creates the initial parse trees, and ast.parse can't change in any way
because the ast module is cast in stone.

2. In contrast, the token list is what I have been calling the "ground
truth" of a program. Comment and string tokens *do* preserve spelling. It
is straightforward to recreate the program from the token list.
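As an aside, the "no specified order" caveat in the ast.walk docs is easy to demonstrate with nothing but the stdlib. A minimal sketch (the variable names are illustrative only; in current CPython, ast.walk happens to traverse breadth-first):

```python
import ast
import io
import tokenize

source = "a * b + c\n"

# ast.walk is breadth-first in CPython, so for this expression it visits
# the Name nodes out of source order (c before a and b).
walk_names = [
    node.id for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.Name)
]

# The token stream, by contrast, always yields names left to right.
token_names = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NAME
]
assert token_names == ["a", "b", "c"]
assert walk_names != token_names
```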
Recreating the program from the token list is exactly what the
tokens_to_string function (in leoAst.py) does.

3. There is, in principle, no *short*, *easy* way to associate tokens with
ast nodes. The TOG class does this in what I firmly believe is the
simplest, clearest possible code. But the TOG class is far from short and
easy.

So the *first* answer to your question is: a token order traversal is what
makes the TOG class possible.

But why is the TOG class *itself* valuable? What can devs do with it that
they can't already do? The TOG class inserts links between ast nodes and
between nodes and tokens. Creating these links is all that TOG does, and
nothing else. But now you ask, what good are these links? This is what
I've never properly explained.

The injected links will be useful for any tool that wants to modify python
source code. fstringify and black are the two most prominent examples.

Now we come to the real motivation. This is the "hole" in the
documentation I have been talking about. *Any* tool that wants to modify
python text will benefit from having *both* a token-level view of the text
*and* a parse-tree-level view of the text. The asttokens package provides
this dual view, but only for top-level python statements. In effect, the
TOG class is a much simpler implementation of the asttokens package. This
suggests that some wrapper functions, similar or identical to those in the
asttokens package, would be useful. But I digress.

Let me explain why the dual view (tokens *and* ast nodes) is useful. This
is something I've never explained, because I started the project knowing
the answer.

*Tokens preserve linear text order. Parse trees define the meaning of
those tokens.*

Mostly, tools like fstringify and black will want to work at the token
level, because that is, or *should be*, the most natural way to modify
text: just insert or delete the proper tokens.
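The "ground truth" point can be sketched with the stdlib alone. Here tokenize.untokenize stands in for leoAst.py's tokens_to_string: the token list keeps the exact spelling of comments and strings, and round-trips back to the original text, while the parse tree has no record of the comment at all:

```python
import ast
import io
import tokenize

source = "x = 'hi'  # a comment\n"

# The parse tree drops the comment entirely.
assert "comment" not in ast.dump(ast.parse(source))

# The token list preserves the exact spelling of every token,
# comments and string quotes included.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
assert "# a comment" in [tok.string for tok in tokens]

# With full TokenInfo tuples, untokenize reproduces the text exactly,
# playing the role that tokens_to_string plays in leoAst.py.
assert tokenize.untokenize(tokens) == source
```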
Alas, at present, *the fstringify and black tools work at the parse tree
level*, despite *enormous* difficulties in doing so, because sometimes
those tools *must* have access to the meaning provided by the parse trees.

Example 1: (Fstringify) Potential f-strings are found by looking for an
ast.BinOp node of a special form: the LHS of the BinOp must be a string,
and the RHS of the BinOp must represent the one or more % specs in the LHS
string.

Example 2: (Black) When splitting a long line, black must analyze the
*meaning* of the corresponding line in significant detail. It needs to do
this because in some cases black must insert parens which did not exist
previously in the program. For example, given:

    a = << a very very long line, possibly continued by the
           backslash-newline convention >>

black will convert this to:

    a = (
        line 1
        line 2 ...
    )

where none of lines 1, 2, etc. contain backslash newlines.

*At present, both the fstringify and black tools are stuck in the "ast
ghetto".* Much of what these tools do would be much, much easier if the
token view of an ast node were available. For example, black uses a
horrendously complicated auxiliary traversal just to determine the
required line length of an ast node! But if token is the first token of
the line, then token.line is the physical line containing that token. No
problem at all!

*The TOG class would radically simplify both fstringify and black by
allowing those tools to use token views where appropriate.* The Fstringify
class in leoAst.py already demonstrates this. The Orange class will
demonstrate how to collapse the complexity of black.

> "This help is essential, because the following two source files generate
> identical parse trees!"
>
> This implies to me that with a parse tree you can't do a round trip from
> source code, to a parse tree and back to the original source code. Is
> that correct?

In practice, you are correct.
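The "identical parse trees" claim is easy to verify with the stdlib. A minimal sketch (the two sources are made up for illustration): two programs that differ in comments and whitespace produce byte-for-byte identical tree dumps, so no tool can round-trip from the tree alone back to the original text:

```python
import ast

# Two sources that differ only in comments and whitespace...
source1 = "x=1 # set x\n"
source2 = "x   =   1\n"

# ...produce identical parse trees, so the original text is
# unrecoverable from the tree alone.
assert ast.dump(ast.parse(source1)) == ast.dump(ast.parse(source2))
```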
ast.get_source_segment
<https://docs.python.org/3/library/ast.html#ast.get_source_segment> is new
in Python 3.8, and there is also ast.fix_missing_locations
<https://docs.python.org/3/library/ast.html#ast.fix_missing_locations>.
But these are hacks, and they are very slow. Even with these functions,
associating tokens with ast nodes is messy. In contrast, round-tripping is
a snap with the TOG class.

> If so, is this just one externally visible benefit of your library, or
> is it the main benefit? If it is the main benefit, then I think it
> should be made much more clear earlier in the document, like in the
> Overview near here:
>
> "These links promise to collapse the complexity of any code that changes
> text, including the asttokens <https://pypi.org/project/asttokens/>,
> fstringify <https://pypi.org/project/fstringify/>, and black
> <https://pypi.org/project/black/> projects."

I agree. My long answer must be boiled down.

> Something like:
>
> "These links allow portions of python source code to be transformed
> while leaving the rest untouched. Much of the complexity of the
> asttokens <https://pypi.org/project/asttokens/>, fstringify
> <https://pypi.org/project/fstringify/>, and black
> <https://pypi.org/project/black/> projects comes from not being able to
> link between the text and structure of the Python code. Using the code
> of this library can collapse the complexity of these and any projects
> which change Python text."

Not bad! My long answer explains in more detail why what you say is true.
I'll have to think about all this. There's no rush.

Many thanks for asking great questions.

Edward

-- 
You received this message because you are subscribed to the Google Groups
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/leo-editor/CAMF8tS2Jur%2BLAQB0JHid%3DYFB-wt9Z2rV5K6D9ds4SQTisspYVw%40mail.gmail.com.