Re: ENB: The unification of the token and ast worlds

Edward K. Ream Mon, 11 Nov 2019 14:11:06 -0800

On Sun, Nov 10, 2019 at 3:37 AM Edward K. Ream <[email protected]> wrote:


> A few days ago a new goal appeared: to define and create a *token-order tree 
traversal*.

A status report.

I have been obsessed with this project.  Everything else is taking a back 
seat, including sleep :-)  I'll announce Leo 6.1 final soon, I promise :-)

This is the most consequential project I have done outside of Leo in the 
last 20 years.  It surely is feasible, but it is a juicy, non-trivial, 
fascinating programming problem.

After about 10 hours of work, starting very early this morning, I realized 
that my initial approach to "syncing" tokens with ast nodes needed a 
rethink. The initial idea was to *verify* that tokens matched ast nodes 
after the traversal.  But by then it's too late to fix a mismatch.

After a nap I saw that the ast visitors should use the "ground truth" (the 
tokens themselves) to ensure that tokens match the tree *when each ast node 
is visited.*  To do this, the ast visitors will call two *conditional* token 
generators, *put_conditional_comma* and *put_conditional_newline*. The 
former is needed because tuples with more than one element may be followed 
by an *optional* comma. The latter is needed because of difficulties 
handling "newline" and "indent" tokens. I anticipated both problems.
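
To illustrate the comma problem with nothing but the standard library (this 
is not leoAst code, and the helper name count_commas is made up for the 
example): the two sources below should yield the same ast dump, while the 
tokenizer still sees the extra comma. That is why the visitor has to consult 
the tokens.

```python
import ast
import io
import tokenize

def count_commas(source):
    """Count comma tokens in source, using the stdlib tokenizer."""
    readline = io.StringIO(source).readline
    return sum(
        1 for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP and tok.string == ','
    )

with_comma = "t = (1, 2,)\n"
without_comma = "t = (1, 2)\n"

# ast.dump omits line/column attributes by default, so only structure is compared.
same_tree = ast.dump(ast.parse(with_comma)) == ast.dump(ast.parse(without_comma))
comma_counts = (count_commas(with_comma), count_commas(without_comma))
print(same_tree, comma_counts)
```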

These conditional methods have access to *all* tokens in the list of input 
tokens, not just the "current" token, self.tokens[self.token_index].  So 
these conditional generators should never need to "guess", and neither will 
the ast node visitors that call them.
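
A rough sketch of what such a conditional generator might look like. 
Everything here is an assumption made for illustration: the class 
ConditionalPutSketch, the helper skip_past, and the use of the stdlib 
tokenize module stand in for whatever TokenOrderTraverser actually does. The 
point is only that looking ahead in the token list replaces guessing.

```python
import io
import tokenize

class ConditionalPutSketch:
    """Hypothetical stand-in; not Leo's TokenOrderTraverser."""

    def __init__(self, source):
        readline = io.StringIO(source).readline
        self.tokens = list(tokenize.generate_tokens(readline))
        self.token_index = 0
        self.results = []

    def skip_past(self, string):
        """Advance the index just past the next token whose text is string."""
        while self.tokens[self.token_index].string != string:
            self.token_index += 1
        self.token_index += 1

    def put_conditional_comma(self):
        """Emit a comma only if the token list says one comes next."""
        i = self.token_index
        # Look past comments and non-logical newlines; they carry no syntax.
        while i < len(self.tokens) and self.tokens[i].type in (
            tokenize.COMMENT, tokenize.NL
        ):
            i += 1
        if i < len(self.tokens) and self.tokens[i].exact_type == tokenize.COMMA:
            self.results.append(self.tokens[i].string)
            self.token_index = i + 1
            return True
        return False

# A visitor that has just handled the last tuple element asks the tokens.
trailing = ConditionalPutSketch("(1, 2,)\n")
trailing.skip_past("2")
plain = ConditionalPutSketch("(1, 2)\n")
plain.skip_past("2")
print(trailing.put_conditional_comma(), plain.put_conditional_comma())
```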

It is my present opinion that the tree must be traversed in a single pass.  
If it can't be done properly from the get go, it can't be done at all.  But 
surely it *can* be done.  All necessary data are present.
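
Here is a toy version of a single-pass, token-order traversal, again using 
only the standard library. It handles just names, integer constants and four 
binary operators, with no parentheses, so it is nowhere near the real 
problem. TinyTokenOrderVisitor, put and OPS are invented names. Each visitor 
emits its text in token order and checks it against the next significant 
token at the moment the node is visited.

```python
import ast
import io
import tokenize

# Hypothetical mapping from operator nodes to their token text.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

class TinyTokenOrderVisitor(ast.NodeVisitor):
    """Illustrative only: syncs tokens with nodes during one traversal."""

    def __init__(self, source):
        significant = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
        readline = io.StringIO(source).readline
        self.tokens = [
            tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in significant
        ]
        self.index = 0
        self.results = []

    def put(self, text):
        """Emit text, insisting that it matches the next input token."""
        expected = self.tokens[self.index]
        assert text == expected, f"expected {expected!r}, got {text!r}"
        self.results.append(text)
        self.index += 1

    def visit_Expression(self, node):
        self.visit(node.body)

    def visit_BinOp(self, node):
        # Left operand, operator, right operand: the order of the tokens.
        self.visit(node.left)
        self.put(OPS[type(node.op)])
        self.visit(node.right)

    def visit_Name(self, node):
        self.put(node.id)

    def visit_Constant(self, node):
        self.put(repr(node.value))

source = "a + b * 2\n"
visitor = TinyTokenOrderVisitor(source)
visitor.visit(ast.parse(source, mode="eval"))
print(visitor.results)
```

If a visitor emitted text out of token order, the assert in put would fire 
at that node instead of after the whole tree had been walked.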

This will be a spectacularly powerful tool.  For example, the test runner 
shows the results as follows:
 
print('Result...\n')
print(''.join([z.to_string() for z in tot.results]))

This suggests that we could define a TokenOrderFormatter class that 
*replaces* the existing AstFormatter class.  The code (completely untested) 
would look something like this:

class TokenOrderFormatter (TokenOrderTraverser):
    
    def format(self, contents):
        """
        Format the tree into a string guaranteed to be generated in token
        order.
        """
        self.tokens = self.make_tokens(contents)
        tree = parse_ast(contents)
        self.visit(tree)
        # The base class accumulates the output tokens in self.results.
        return ''.join([z.to_string() for z in self.results])

The base class will do *all* the work!  The converse is not true.  There is 
no way that the TokenOrderTraverser class could be a subclass of the 
present AstFormatter class.  

*Summary*

This is one of the best, juiciest, most consequential programming 
challenges ever.  Conditional token generators (the put* methods) are the 
next pieces of the puzzle. They might be the last pieces.  We shall see.

The TokenOrderTraverser class is a re-imagining of the AstFormatter class.  
A trivial, *rigorous* formatter could be built on top of 
TokenOrderTraverser.

As an experiment, I'll be rewriting Leo's fstringify commands, basing them 
on TokenOrderTraverser.  The idea is to avoid token-level "parsing" entirely.  
This experiment will show whether this class is truly as revolutionary as I 
think it is :-)

Edward

P.S.  Development has been very easy.  My test runner is an @command node. 
The contents, though short and simple, suffice to drive development. They 
have revealed lots of bugs and larger problems.  Here is the test runner:

# Run as a Leo script: g is predefined there.
import importlib
import leo.core.leoAst as leoAst
importlib.reload(leoAst)

def check(contents, tokens):
    result = ''.join([z.to_string() for z in tokens])
    ok = result == contents
    if not ok:
        g.printObj(result)
    return ok

# This is more than enough to test syncing.
contents = r'''
class TestClass:
    def test(a, b=2):
        if a:
            print('hi')
            pass
        print('done')
'''.strip() + '\n'

print('Ctrl-2: leoAst tests...\n')
print('Contents...\n')
print(contents)
tot = leoAst.TokenOrderTraverser()
tokens = tot.make_tokens(contents)
ok = check(contents, tokens)
if ok:
    tree = leoAst.parse_ast(contents)
    print('Tree...\n')
    print(leoAst.AstDumper().dump(tree))
    print('')
    try:
        tot.verify_token_order(tokens, tree)
    except AssertionError as e:
        print(e)
        ok = False  # Report FAIL below if the token order does not verify.
    print('Result...\n')
    print(''.join([z.to_string() for z in tot.results]))
    print('')
    # tot.report_coverage(report_missing=False)
print('Ctrl-2:', ('PASS' if ok else 'FAIL'))
print('')

EKR
