Neither the results list nor the Linker class is needed! This is a stunning 
collapse in complexity.

The *tree generator* already exists.  It consists of all the tree visitors 
in the TOG class, and their helpers.  The tree generator will stay exactly 
as it is, except for slight mods to the tog.do_If visitor. The tree generator 
visits each tree node in token order.

The following will define the *token generator*:

self.token_gen = filter(is_significant, tokens)

*Aha*!  The two generators are *completely parallel*.  *At present*, the 
tree generator calls tog.put to generate tokens in the results list.  But, *at 
present*, tog.put generates *significant* result tokens in *exactly* the 
same order as does the token generator shown above.  
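For instance, here is a sketch of the idea using Python's own tokenize 
module (Leo's tokens use a different class, and for simplicity this sketch 
treats *all* operators as significant, but the lazy-filter idea is the same):

```python
import io
import tokenize

# A sketch only: Leo's real Token class and significance rules differ.
def is_significant(tok):
    return tok.type in (
        tokenize.NAME, tokenize.NUMBER, tokenize.STRING, tokenize.OP)

source = "if 1:\n    pass  # a comment\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
token_gen = filter(is_significant, tokens)  # Lazy: nothing consumed yet.
print([t.string for t in token_gen])  # ['if', '1', ':', 'pass']
```

Note that the comment, newline, indent and dedent tokens vanish from the 
filtered stream, while the significant tokens appear in token order.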

This was staring me right in the face.  Linker.assign_links contains:

sig_tokens = list(filter(is_significant, tokens))
sig_results = list(filter(is_significant, results))
for r, t in zip(sig_results, sig_tokens):
    self.set_links(r, t, tokens)

I thought I needed to convert the generators to lists, but yesterday I saw 
that it was hopeless to add or subtract tokens from the results list. (See 
below). So the linker *could* have been:

sig_tokens = filter(is_significant, tokens)
sig_results = filter(is_significant, results)
for r, t in zip(sig_results, sig_tokens):
    self.set_links(r, t, tokens)

Do you see?  Linker.set_links *only sees significant tokens.*  It could 
migrate to tog.put!  This means that tog.put can call a new *tog.sync* 
method to sync with the token generator whenever it sees a significant 
token.  Nothing else is needed!  Furthermore, tog.sync will contain just 
a few lines of code, very similar to Linker.set_links.


*History of the Aha*

Early yesterday I wrote: "I thought I had found a brilliant solution to 
some [problems], but serious doubts arose while writing this post, so I'll 
investigate further..."

The initial idea was to sync only 'string' tokens.  This was to be done in 
the do_Str visitor.  The "serious doubts" were whether 'string' tokens 
were significant.  It turns out that 'string' tokens *are* significant.  
In fact, only 'comment' tokens generate non-whitespace text but are missing 
from the parse tree.  Remember too that commas and parentheses must be 
insignificant because the parse tree doesn't contain enough data to match 
them exactly with the token list.
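In other words, a significance test could look something like this (a 
hypothetical sketch; the real Token class and the exact rules live in 
Leo's code):

```python
from collections import namedtuple

# Hypothetical token shape, for illustration only.
Token = namedtuple('Token', ['kind', 'value'])

def is_significant(tok):
    # A sketch of the rules above: 'string' and 'name' tokens are
    # significant; 'comment' tokens are not; commas and parentheses
    # are insignificant because the parse tree doesn't contain enough
    # data to match them exactly with the token list.
    if tok.kind == 'op':
        return tok.value not in (',', '(', ')')
    return tok.kind in ('name', 'number', 'string')
```

So is_significant(Token('string', "'abc'")) is True, while 
is_significant(Token('comment', '# note')) is False.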

Two days ago I had already glimpsed the potential of this pattern.  If we 
can sync string tokens this way, why not do the same with all statement 
visitors?  And yes, I dared hope that the same could be done with *all* 
visitors.

Yesterday afternoon I returned to the problem of distinguishing 'elif' from 
'else' followed by 'if' in the do_If visitor. First, I convinced myself, 
once again, that the parse trees are identical for these two examples:

if 1:           if 1:
    pass            pass
else:           elif 2:
    if 2:           pass
        pass

Sad but true. So the do_If visitor needs help.
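The claim is easy to verify with Python's ast module (ast.dump omits line 
and column numbers by default, so identical structure means identical 
dumps):

```python
import ast

# The parse trees for the two examples really are identical.
s1 = "if 1:\n    pass\nelse:\n    if 2:\n        pass\n"
s2 = "if 1:\n    pass\nelif 2:\n    pass\n"
assert ast.dump(ast.parse(s1)) == ast.dump(ast.parse(s2))
```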

My first thought was to change 'else' to 'elif' as needed in the Linker.  But 
this fails badly.  If do_If mistakenly generates "else" tokens, it will put 
a colon in the wrong place, namely *before* an arbitrarily long series of 
tokens for the "if" condition, rather than *after* the condition.  This 
pretty much dooms the Linker class, but I wasn't to know that immediately.

So it was time for the heavy guns.  First, I added the following to 
tog.create_links, the main line of the TOG class:

self.if_list = [z for z in self.tokens
    if z.kind == 'name' and z.value in ('if', 'elif', 'else')]
self.if_list_index = 0

Second, I rewrote the do_If visitor as follows:

def do_If(self, node):
    << How to disambiguate between 'elif' and 'else' followed by 'if' >>
    # If or elif line...
        # if %s:\n
        # elif %s: \n
    if_value = self.if_list[self.if_list_index].value
    self.if_list_index += 1
    assert if_value in ('if', 'elif'), if_value
    yield from self.gen_name(if_value)
    yield from self.gen(node.test)
    yield from self.gen_op(':')
    yield from self.gen_newline()
    # Body...
    self.level += 1
    yield from self.gen(node.body)
    self.level -= 1
    # Else and elif clauses...
    if node.orelse:
        self.level += 1
        if_value = self.if_list[self.if_list_index].value
        if if_value == 'else':
            # Consume one if-list entry.
            self.if_list_index += 1
            yield from self.gen_name('else')
            yield from self.gen_op(':')
            yield from self.gen_newline()
            yield from self.gen(node.orelse)
        else:
            # Sanity checks.
            assert if_value in ('if', 'elif'), if_value
            node1 = node.orelse[0]
            assert isinstance(node1, ast.If), repr(node1)
            # Don't consume an if-list entry here.
            # Call ourselves recursively.
            yield from self.gen(node.orelse)
        self.level -= 1

Instantly, all my unit tests succeeded.

That was the end of a long day's work.  However, as I was talking to 
Rebecca about the success, all the pieces fell into place.

I'll never forget it.  I was brushing my teeth and saying wow.  A pause.  
Another wow.  Another pause.  Another wow.

I realized that the "if-list" should be a generator, that a token generator 
could *yield all* *significant *tokens, and that the tree and token 
generators would work in parallel.

I explained it to Rebecca as follows. I held both arms over my head.  My 
right hand represented the tree generator.  It moved *back and forth* as it 
moved from top to bottom, simulating traversing the tree.  My left hand 
represented the token generator.  It moved *straight down*, because it was 
just moving from one token to the next in the token list. Crucially, the 
*height* of the two hands always matched. The generators work in parallel.

*Implementation details*

It's just a bit tricky to look ahead in a generator.  So instead of the 
code shown above in do_If, the visitor calls self.peek() and 
self.advance(), defined something like this (completely untested):

# Use lists for debugging, generators for production.

peek_list = []  # Must be a list.

def peek(self):
    if isinstance(self.token_gen, list):
        return self.token_gen[self.token_gen_index]
    if not self.peek_list:
        self.peek_list = [next(self.token_gen)]
    return self.peek_list[-1]

def advance(self):
    if isinstance(self.token_gen, list):
        self.token_gen_index += 1
    self.peek_list = []

You get the idea. Finally, tog.sync will maintain a list of *insignificant* 
tokens it encounters while syncing to the next *significant* token. Should 
be no problem.

Oops.  tog.sync will need to see *all* the tokens in the token list, so it 
will need a generator that yields all tokens.  It's no big deal.
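Putting the pieces together, tog.sync might look something like this 
sketch, written as a free function for clarity.  All the names here are 
hypothetical, and is_significant is deliberately simplified:

```python
from collections import namedtuple

Token = namedtuple('Token', ['kind', 'value'])  # Hypothetical shape.

def is_significant(tok):
    # Simplified for this sketch: only comments are insignificant.
    return tok.kind != 'comment'

def sync(token_gen, kind, value):
    """A sketch of tog.sync: consume tokens from a generator that
    yields *all* tokens, collecting insignificant tokens until the
    next significant token appears.  That token must match the
    (kind, value) that the tree generator expects."""
    insignificant = []
    for tok in token_gen:
        if is_significant(tok):
            # The two generators must stay in lockstep.
            assert (tok.kind, tok.value) == (kind, value), (tok, kind, value)
            return tok, insignificant
        insignificant.append(tok)
    raise AssertionError(f'Out of tokens: expected {kind} {value}')

tokens = iter([
    Token('comment', '# note'),
    Token('name', 'if'),
    Token('number', '1'),
])
tok, skipped = sync(tokens, 'name', 'if')
print(tok.value, [t.value for t in skipped])  # if ['# note']
```

The assert is the whole point: if the tree generator and the token 
generator ever disagree, the sync fails loudly at the exact token where 
they diverged.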

*Summary*

Ahas such as this happen rarely, and only as the result of many long hours 
of experimentation.

Last night's Aha justifies all the work I have done on this project.

The tog.sync method could "reassign" parentheses as needed, and could 
assign the MIA 'comment' tokens.  Alternatively, a post-pass could do the 
same, working *only* on the token list, which will have links from each 
significant token to the corresponding tree node.

Besides the tree generator, *three* token generators will probably be 
needed.  The "if" visitor still needs an if generator, and token sync will 
need a generator yielding *all* tokens. A generator that yields only 
significant tokens will probably also be needed.

I've said all along that speed doesn't matter much, but the new code will 
be spectacularly fast.

I'm way short of sleep.  That's all for now.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/0e65df2a-9ff9-4854-bc00-7feb784829e3%40googlegroups.com.