I said that all the work on #1440 has been valuable, even though a simple
script might use asttokens to do everything that the code in leoAst.py does.
This Engineering Notebook post explains why deep knowledge of the problem
domain was needed to get to the surprising script. This post also explains
some parts of the script in detail. As with all ENB posts, feel free to
ignore it.
At no time was I upset by the surprise. I immediately treated it as *good*
news. asttokens now provides a valuable point of comparison and context.
The work I have done has given me deep insights into the subtle,
behind-the-scenes complications involved.
*Why did I, and black, and fstringify miss this possibility?*
In retrospect, it's clear why the Aha is easy to miss:
1. I didn't know until yesterday what data would be needed. It's impossible
to know what would work until you know exactly what data will be needed.
It's just all too confusing.
2. I have been assuming all along that *exact* traversal order would
(ultimately) be required. But that is not at all true. Indeed, in some cases
*random* traversal suffices.
The Fstringify code in leoAst.py is an example. Its ast.BinOp visitor would
work if nodes were visited in *any* order, because potential f-strings are
disjoint. However, we do want BinOp nodes to be visited in roughly the order
in which they appear in the source, because Fstringify produces log
messages, and we don't want *those* messages to be scrambled ;-)
3. [The big one]. I have been assuming that an exact, 1-to-1,
correspondence between tokens and ast nodes is needed. Wrong, wrong, wrong!
We can tolerate many-to-many links between tokens and nodes. That is, many
nodes might point at a single token, and a single token might point at many
nodes.
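Point 2 can be illustrated with a minimal sketch. This is not the actual Fstringify code: `find_fstring_candidates` is a hypothetical name, and treating only `'...' % ...` expressions as candidates is a simplifying assumption. Because each candidate is handled independently, ast.walk's breadth-first order is good enough for finding them; sorting by position afterwards only serves to keep the log messages in source order.

```python
import ast


def find_fstring_candidates(source):
    """Return (lineno, col_offset, text) for each `'...' % ...` BinOp.

    Hypothetical helper: a stand-in for the real Fstringify logic.
    """
    tree = ast.parse(source)
    found = []
    # ast.walk is breadth-first, not source order. That's fine here,
    # because each candidate is independent of all the others.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Mod)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.left.value, str)
        ):
            text = ast.get_source_segment(source, node)
            found.append((node.lineno, node.col_offset, text))
    # Sort only so that log messages appear in source order.
    found.sort()
    return found


source = (
    "def f(a, b):\n"
    "    x = g('%s' % a)\n"
    "    return '%s-%s' % (a, b)\n"
)
for lineno, col, text in find_fstring_candidates(source):
    print(f"line {lineno}: {text}")
```

The line-3 BinOp sits shallower in the tree, so ast.walk reaches it first; the final sort is what puts the line-2 message ahead of it.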
The many-to-many idea is what I saw yesterday while discussing links with
Rebecca. IIRC, I saw that the crucial test in o.colon would work just fine
with a many-to-many mapping between tokens and nodes. I've shown this
crucial code before. Here it is again:
```python
def colon(self, val):
    """Handle a colon."""
    node = self.token.node
    self.clean('blank')
    if not isinstance(node, ast.Slice):
        self.add_token('op', val)
        self.blank()
        return
    # A slice.
    # [snip]
```
The Aha: yesterday I saw that the test:

```python
if not isinstance(node, ast.Slice):
```

could be replaced by:

```python
if not any(isinstance(z, ast.Slice) for z in self.token.node_list):
```
Let's see how token.node_list can be computed...
*The asttokens script*
First, we create a list of *mutable* Token objects. asttokens uses only
the named tuples provided by tokenize.tokenize. Named tuples are immutable,
so the script must create an auxiliary list. The Token class is simple. No
need to show it here.
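```python
import asttokens

# source, Token, atok_name and atok_value are defined in the actual
# script (not shown here).
atok = asttokens.ASTTokens(source, parse=True)
tokens = [Token(atok_name(z), atok_value(z)) for z in atok.tokens]
```

Given this list of Token objects, it's a snap to compute each token's
node_list:

```python
# Link each token to every node whose token range includes it.
for node in asttokens.util.walk(atok.tree):
    for ast_token in atok.get_tokens(node, include_extra=True):
        i = ast_token.index
        token = tokens[i]
        token.node_list.append(node)
```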
That's all there is to it. It's also straightforward to inject parent/child
links into ast nodes. See the actual script for details.
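For readers without asttokens, here is a stdlib-only sketch of both steps. It is an approximation, not the actual script: it substitutes ast position attributes (Python 3.9+, so that ast.Slice carries positions) for asttokens' token ranges, giving each token every node whose source span encloses it. The `Token` dataclass, the `parent`/`children` attribute names, and all helper names are hypothetical. The last helper applies the many-to-many form of the o.colon test.

```python
import ast
import io
import token
import tokenize
from dataclasses import dataclass, field

# Contexts and operators may be shared instances across the whole tree,
# so giving them a parent would be misleading; skip them.
SHARED = (ast.expr_context, ast.boolop, ast.operator, ast.unaryop, ast.cmpop)


@dataclass
class Token:
    """Hypothetical mutable token; node_list starts empty."""
    kind: str
    value: str
    start: tuple  # (row, col), as reported by tokenize
    end: tuple
    node_list: list = field(default_factory=list)


def make_tokens(source):
    """Copy tokenize's immutable tuples into mutable Token objects."""
    readline = io.StringIO(source).readline
    return [
        Token(token.tok_name[t.type], t.string, t.start, t.end)
        for t in tokenize.generate_tokens(readline)
    ]


def link_tokens(source, tree):
    """Append to each token's node_list every node whose span encloses it.

    Assumes ASCII source: ast columns count UTF-8 bytes, while tokenize
    columns count characters. O(nodes * tokens), fine for a demo.
    """
    tokens = make_tokens(source)
    for node in ast.walk(tree):
        if getattr(node, "end_lineno", None) is None:
            continue  # Module, contexts, operators, ... have no positions.
        first = (node.lineno, node.col_offset)
        last = (node.end_lineno, node.end_col_offset)
        for tok in tokens:
            if first <= tok.start and tok.end <= last:
                tok.node_list.append(node)
    return tokens


def inject_parent_child_links(tree):
    """Give each non-shared node .parent and .children attributes."""
    tree.parent = None
    for node in ast.walk(tree):
        if isinstance(node, SHARED):
            continue
        node.children = [
            z for z in ast.iter_child_nodes(node) if not isinstance(z, SHARED)
        ]
        for child in node.children:
            child.parent = node
    return tree


def colon_is_slice(tok):
    """The many-to-many form of the o.colon test."""
    return any(isinstance(z, ast.Slice) for z in tok.node_list)


source = "d = {1: a[1:2]}\n"
tree = inject_parent_child_links(ast.parse(source))
for tok in link_tokens(source, tree):
    if tok.value == ":":
        names = [type(z).__name__ for z in tok.node_list]
        print(tok.start, colon_is_slice(tok), names)
```

The dict colon links only to Assign and Dict, so the test is False. The slice colon links to four nodes, one of which is a Slice, so the test is True. The test never needs to know which single node "owns" the colon.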
*Summary*
It takes deep insight to realize that asttokens could replace the TOG and
TOT classes. This is the reason I was happy to see this possibility.
In any event, the TOG and TOT classes are still valuable. They are faster
and clearer (in most ways) than the asttokens code. OTOH, the asttokens
code could be said to be more clever. The new insights promise new ways to
simplify the code in leoAst.py using clever asttokens code.
Edward
--
You received this message because you are subscribed to the Google Groups
"leo-editor" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/leo-editor/efae61fa-f3bb-4826-8f1f-446045545ae7%40googlegroups.com.