Neither the results list nor the Linker class is needed! This is a stunning
collapse in complexity.
The *tree generator* already exists. It consists of all the tree visitors
in the TOG class, and their helpers. The tree generator will stay exactly
as it is, except for slight mods to tog.do_If visitor. The tree generator
visits each tree node in token order.
The following will define the* token generator*:
self.token_gen = filter(is_significant, tokens)
*Aha*! The two generators are *completely parallel*. *At present*, the
tree generator calls tog.put to generate tokens in the results list. But, *at
present*, tog.put generates *significant *result tokens in *exactly* the
same order as does the token generator shown above.
This was staring me right in the face. Linker.assign_links contains:
sig_tokens = list(filter(is_significant, tokens))
sig_results = list(filter(is_significant, results))
for r, t in zip(sig_results, sig_tokens):
self.set_links(r, t, tokens)
I thought I needed to convert the generators to lists, but yesterday I saw
that it was hopeless to add or subtract tokens from the results list. (See
below). So the linker *could *have been:
sig_tokens = filter(is_significant, tokens)
sig_results = filter(is_significant, results)
for r, t in zip(sig_results, sig_tokens):
self.set_links(r, t, tokens)
Do you see? Linker.set_links *only sees significant tokens.* It could
migrate to tog.put! This means that tog.put can call a new* tog.sync*
method to the token generator whenever it sees a significant token.
Nothing else is needed! Furthermore, tog.sync will contain just a few
lines of code, very similar to Linker.set_links.
*History of the Aha*
Early yesterday I wrote: "I thought I had found a brilliant solution to
some [problems], but serious doubts arose while writing this post, so I'll
investigate further..."
The initial idea was to sync only 'string' tokens. This was to be done in
the do_Str visitor. The "serious doubts" were whether 'string' tokens
where significant. It turns out that 'string' tokens *are* significant.
In fact, only 'comment' tokens generate non-whitespace text but are missing
from the parse tree. Remember too that comma and parentheses must be
insignificant because the parse tree doesn't contain enough data to match
them exactly with the token list.
Two days ago I had already glimpsed the potential of this pattern. If we
can sync string tokens this way, why not do the same with all statement
visitors? And yes, I dared hope that the same could be done with *all*
visitors.
Yesterday afternoon I returned to the problem of distinguishing 'elif' from
'else' followed by 'if' in the do_If visitor. First, I convinced myself,
once again, that the parse trees are identical for these two examples:
if 1: if 1:
pass pass
else: elif 2:
if 2: pass
pass
Sad but true. So the do_If visitor needs help.
My first thought was to change 'else' to 'elif' as need in the Linker. But
this fails badly. If do_If mistakenly generates "else" tokens, it will put
a colon in the wrong place, namely *before* an arbitrarily long serious of
tokens for the "if" condition, rather than *after* the condition. This
pretty much dooms the Linker class, but I wasn't to know that immediately.
So it was time for the heavy guns. First, I added the following to
tog.create_links, the main line of the TOG class:
self.if_list = [z for z in self.tokens
if z.kind == 'name' and z.value in ('if', 'elif', 'else')]
self.if_list_index = 0
Second, I rewrote the do_If visitor as follows:
def do_If(self, node):
<< How to disambiguate between 'elif' and 'else' followed by 'if' >>
# If or elif line...
# if %s:\n
# elif %s: \n
if_value = self.if_list[self.if_list_index].value
self.if_list_index += 1
assert if_value in ('if', 'elif'), if_value
yield from self.gen_name(if_value)
yield from self.gen(node.test)
yield from self.gen_op(':')
yield from self.gen_newline()
# Body...
self.level += 1
yield from self.gen(node.body)
self.level -= 1
# Else and elif clauses...
if node.orelse:
self.level += 1
if_value = self.if_list[self.if_list_index].value
if if_value == 'else':
# Consume one if-list entry.
self.if_list_index += 1
yield from self.gen_name('else')
yield from self.gen_op(':')
yield from self.gen_newline()
yield from self.gen(node.orelse)
else:
# Sanity checks.
assert if_value in ('if', 'elif'), if_value
node1 = node.orelse[0]
assert isinstance(node1, ast.If), repr(node1)
# Don't consume an if-list entry here.
# Call ourselves recursively.
yield from self.gen(node.orelse)
self.level -= 1
Instantly, all my unit tests succeeded.
That was the end of a long day's work. However, as I was talking to
Rebecca about the success, all the pieces fell into place.
I'll never forget it. I was brushing my teeth and saying wow. A pause.
Another wow. Another pause. Another wow.
I realized that the "if-list" should be a generator, that a token generator
could *yield all* *significant *tokens, and that the tree and token
generators would work in parallel.
I explained it to Rebecca as follows. I held both arms over my head. My
right hand represented the tree generator. It moved *back and forth* as it
moved from top to bottom, simulating traversing the tree. My left hand
represented the token generator. It moved *straight dow*n, because it was
just moving from one token to the next in the token list. Crucially, the
*height
*of the two hands always matched. The generators work in parallel.
*Implementation details*
It's just a bit tricky to look ahead in a generator. So instead of the
code shown above in do_If, the visitor call self.peek() and
self.advance(), defined something like this (completely untested):
# Use lists for debugging, generators for production.
peek_list = [] # Must be a list.
def peek(self):
if instance(self.token_gen, list):
return self.token_gen[self.token_gen_index]
if not self.peek_list:
self.peek_list = [self.token_gen.next] # or is it next()?
return self.peek_list[-1]
def advance(self):
if instance(self.token_gen, list):
self.token_gen_index += 1
self.peek_list = []
You get the idea. Finally, tog.sync will maintain a list of *insignificant*
tokens it encounters while syncing to the next *significant* token. Should
be no problem.
Oops. tog.sync will need to see *all* the tokens in the token list, so it
will need a generator that yields all tokens. It's no big deal.
*Summary*
Ahas such as this happen rarely, and only as the result of many long hours
of experimentation.
Last night's Aha justifies all the work I have done on this project.
The tog.sync method could "reassign" parentheses as needed, and can assign
the MIA 'comment' tokens. Alternatively, a post-pass can do the same,
working *only* on the tokens list, which will have links from each
significant token to the corresponding tree node.
Besides the tree generator, *three* token generators will probably be
needed. The "if" visitor still needs an if generator, and token sync will
need a generator yielding *all* tokens. A generator that yields only
significant tokens will probably also be needed.
I've said all along that speed doesn't matter much, but the new code will
be spectacularly fast.
I'm way short of sleep. That's all for now.
Edward
--
You received this message because you are subscribed to the Google Groups
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/leo-editor/0e65df2a-9ff9-4854-bc00-7feb784829e3%40googlegroups.com.