A few days a new goal appeared: to define and create a *token-order tree traversal*.This would be yet another kind of tree traversal <https://www.geeksforgeeks.org/tree-traversals-inorder-preorder-and-postorder/>, whose purpose would be to unify the token and ast worlds. This *goal* started a chain of discovery and experimentation. Iirc, this is the first time I have stated this goal.
This post <https://groups.google.com/d/msg/leo-editor/ZSo8_fW_cmA/Pfvq5KV0AQAJ> is a primer of the token and ast based worlds. Please study that post if have no idea what "unifying the token and ast worlds" might mean :-) Yesterday I spent many happy hours exploring an excellent, elegant, asttokens <https://github.com/gristlabs/asttokens> package. At first I thought asttokens might be an elegant way to define token-order traversals. But then problems arose... Last night, in conversation with Rebecca, I had two Ahas, each of which will forever change my view of the world: *Aha 1*. Any token-order tree traversal *must* be isomorphic to Leo's AstFormatter class, in leoAst.py. At last, token-order tree traversals are well defined. *Aha 2*. Tree-based code could be an alternative front-end to Leo's token-based beautifier. I am writing this post now, before the old world view becomes completely inaccessible. This post is an important part of Leo's history. It will also be pre-writing for a new theory of operation. *Background* Three previous threads record the background of the Ahas: *Thread 1*: October 25: A small pause for a better fstringify <https://groups.google.com/forum/#!topic/leo-editor/ZSo8_fW_cmA> states: "It is my strong (informed) opinion that parse trees are inappropriate for text-based manipulations such as black <https://github.com/psf/black>and fstringify <https://github.com/jacktasia/fstringify>". The two new Ahas alter that opinion in large and small ways. *Thread 2*: November 1: ENB about tokens and related commands <https://groups.google.com/d/msg/leo-editor/aivhFnXW85Q/b2a8GHvEDwAJ> reiterates my contention that python programmers often downplay the significance of token-based code. That's still my opinion. *Thread 3*: November 3: ENB A much better untokenizer <https://groups.google.com/d/msg/leo-editor/DpZ2cMS03WE/VPqtB9lTEAAJ> discusses as spectacular replacement for the untokenize function in tokenize.py, part of Python's standard library. The new code is the foundation of Leo's fstringify commands, and all other token-based code in Leo. The python devs responsible for tokenize.py were underwhelmed by the new untokenize, but no matter :-) It was part of the background for the new Ahas. And new untokenize is the foundation of Leo's new fstringify commands. *About asttokens* The asttokens package embeds new data into the 5-tuples created by Pythons tokenize function. They become 8-tuples. The new data contain links to the ast nodes "responsible" for the tokens. Alas, the new data does not suffice to create a two way mapping between tokens and ast nodes. Rebecca asked whether such a two-way mapping was possible, and both Aha's appeared immediately! Any proper *tree-to-token mapping* must have two parts: 1. Links from each token to *exactly one* tree node, the node that "generates" the token. 2. Links from tree nodes to zero or more tokens, in the proper token order. *Aha 1: clever code has no chance of working* I saw that I was trying to "cheat" yesterday. That is, I was trying to make asttokens do more than it possibly could. This was becoming clearer as I wrestled with the new TokenOrderTraverser class in leoAst.py. The present version of this class will move to the attic. There is no real need to discuss the problems in detail. Instead, let's just consider the AstFormatter class in leoAst.py. Aha1 is simply the realization that the AstFormatter class *already* defines token-order! If I want a two-way mapping between tokens and tree nodes, something that works exactly like the visitors in AstFormatter is not only the *simplest *thing that could possibly work, it is the *only* thing that could possibly work. Indeed, there is *exactly one* traversal that will format text properly. But (Aha!) the formatted text must be (except for easily handled special cases) isomorphic to the stream of tokens! For the first time, token-order is well defined. *Aha 2: AstFormatter could be the front end to Leo's beautify commands* This is a corollary to the first Aha. If a tree traversal can produce tokens in the same order that tokenize does, then it could create the so-called input tokens used by Leo's beautify and fstringify commands. *Strategy* I'll modify the AstFormatter class so that it injects two-way links between tokens and nodes. Aha1 proves that this is possible! The AstFormatter class contains one "visitor" function for every single ast node that could possibly generate output text. Crucially, these visitors *must* call other visitors in the correct order. That order *defines* token order. Instead of creating output text, the rewritten TokenOrderTraverser class will insert links. So simple. The answer was staring at me all this time. There are a few complications that are easily handled. Commas after tuples with two or more elements are optional. Therefore, the do_Tuple visitor* must test the token stream*. The old TokenOrderTraverser class tried to use a dict to specify token order. But *cruft tokens* doomed that approach. Examples: - The trailing colon in class and def statements. - The commas in lists and tuples. Only explicit code, *exactly* as in the visitors in AstFormatter, could possibly inject links into cruft tokens. *Summary* It's getting very late, so I'll be brief here. I don't remember exactly how notion of a token-order traversal appeared. In retrospect, this was a crucial "invention". Initially, all details were fuzzy. It wasn't entirely clear whether the notion *could* be well defined, though I strongly suspected that it could be. Now, everything is clear. The code for a token-order traversal class must be isomorphic to the code in AstFormatter class. All the messy, picky, details of that class *define* token-order traversals! Lest anyone doubt the importance of these Ahas, consider the status quo ante: - The old (deprecated) TokenSync class in leoAst.py. - The horrendous code in the "real" black and fstringify tools. - The token-level parsing in Leo's fstringify commands. A proper tree-to-token mapping would be of great value to any tool that munges text. It allows tools to use both tree and token representations interchangeably *in the same program.* That's all for now. I've covered enough here to make sure the crucial details behind the Ahas don't fade away. Edward -- You received this message because you are subscribed to the Google Groups "leo-editor" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/leo-editor/f8468fdb-91ea-48da-b847-0f22a75ec89e%40googlegroups.com.
