On Monday, November 11, 2019 at 4:10:39 PM UTC-6, Edward K. Ream wrote:

> After about 10 hours of work, starting very early this morning, I 
> realized that my initial approach to "syncing" tokens with ast nodes 
> needed a rethink. The initial idea was to *verify* that tokens matched 
> ast nodes.  But that's too late.
>

*Note*:  this continues an Engineering Notebook post.  Feel free to ignore.

This post is a milestone.  It records notes to myself.  It also celebrates 
a clearing of my mental fog.  I want to write this as an important 
historical note.  I also record it so I can go back to sleep ;-)

When I awoke very early this morning I realized that I had been making 
things much more complicated than they need to be, because I had forgotten 
what I was trying to do :-)

The present work is not supposed to generate *new* tokens, it is supposed 
to *annotate the existing* tokens created by make_all_tokens.  This 
collapses the complexity of the crucial code.  Here are some principles 
that I awoke with:

1. Add fields to Token class: index, level, node.

The purpose of the TokenOrderTraverser is to add these links.  Nothing more!
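A minimal sketch of the idea (the Token class below is a stand-in of my own; only the three new fields come from the notes above, everything else is assumed):

```python
import ast

class Token:
    """Stand-in for a token produced by make_all_tokens."""
    def __init__(self, kind, value):
        self.kind = kind    # e.g. 'name', 'ws', 'newline', 'op'
        self.value = value
        # The three fields the TokenOrderTraverser will inject...
        self.index = None   # position in the token list
        self.level = None   # indentation level when the token is eaten
        self.node = None    # the ast node that "ate" this token

# The traverser's only job is to fill in these links:
t = Token('name', 'spam')
t.index, t.level, t.node = 0, 0, ast.parse('spam').body[0]
```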
  
2. Improve the visit method

- assert isinstance(node, ast.AST): None, list, and tuple are not valid 
arguments.

Unlike all "elegant" traversal classes, the TokenOrderTraverser class *must* 
explicitly handle all fields that are lists or tuples, and must check for 
empty fields.  There is absolutely no choice about this.  It's the only way 
to retain the correct traversal order.

- Inject parent, ordered_children fields in ast nodes.

These are not needed to annotate the tokens, but they will be of great 
value for the clients of the TokenOrderTraverser class.

- compute max_level, max_stack_level.

These are important data for development.  max_level is the max indentation 
level of python blocks.  max_stack_level is the max recursion level in 
tot.visit.  The visit method can easily update these data.

The asttokens tool supposedly uses generators to avoid overflowing python's 
runtime stack.  I want to make sure we never come close to this.
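Item 2 could be sketched as follows.  This is only an illustration with assumed names: a real visit method would dispatch to per-node visitors rather than use generic ast.iter_fields.  But the explicit, item-by-item handling of list fields, the parent/ordered_children injection, and the max_stack_level bookkeeping are the points above:

```python
import ast

class TokenOrderTraverser:
    """Sketch only: the real class dispatches to per-node visitors."""
    def __init__(self):
        self.stack = []             # stack of enclosing ast nodes
        self.max_stack_level = 0    # max recursion level seen in visit

    def visit(self, node):
        # None, list and tuple are not valid arguments.
        assert isinstance(node, ast.AST), repr(node)
        # Inject parent and ordered_children for clients of this class.
        node.parent = self.stack[-1] if self.stack else None
        node.ordered_children = []
        self.stack.append(node)
        self.max_stack_level = max(self.max_stack_level, len(self.stack))
        for name, field in ast.iter_fields(node):
            # Handle list/tuple fields explicitly, item by item,
            # to retain the correct traversal order.
            items = field if isinstance(field, (list, tuple)) else [field]
            for item in items:
                if isinstance(item, ast.AST):
                    node.ordered_children.append(item)
                    self.visit(item)
        self.stack.pop()

tree = ast.parse("a = 1\nb = 2")
tot = TokenOrderTraverser()
tot.visit(tree)
```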

3. The following three items look innocuous.  In fact, they are supremely 
important. They arise because the task is now clearer:

- Remove all calls to put_indent and put_dedent.  Replace them with 
self.level += 1 and self.level -= 1.
- Remove all "speculative" calls to do_newline.
- Remove conditional_newline.

You could say that all of the code above is a brain spike :-)  Again, the 
code is *not* creating new tokens, it is annotating existing tokens.  This 
is actually a huge Aha:

   The put* methods simply "eat" zero or more tokens in the token *list*, 
adding fields to those tokens in the process.

Most put methods will eat "ws" tokens if they are next, and then eat the 
"matching" token.  The put_newline method will *also* eat any following 
"indent" token.  It's totally simple!  There should be no such thing as a 
conditional_newline!

I'm not sure how "dedent" tokens will be eaten, but it shouldn't be a big 
deal to do so.
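Concretely, eating looks something like this.  The names and the dict-based token representation are my own stand-ins, not Leo's actual API:

```python
class TokenEater:
    """Sketch of 'eating' tokens: advance a pointer into an existing
    token list, injecting fields into each eaten token as we go."""
    def __init__(self, tokens):
        self.tokens = tokens   # list of dicts: {'kind': ..., 'value': ...}
        self.i = 0             # pointer into the token list
        self.node = None       # current ast node, set by the traverser
        self.level = 0         # current indentation level

    def eat(self, kind):
        """Eat any leading 'ws' tokens, then the matching token."""
        while self.i < len(self.tokens) and self.tokens[self.i]['kind'] == 'ws':
            self.annotate()
        assert self.tokens[self.i]['kind'] == kind, (kind, self.tokens[self.i])
        self.annotate()

    def annotate(self):
        """Inject index, level and node into the next token; advance."""
        self.tokens[self.i].update(index=self.i, level=self.level, node=self.node)
        self.i += 1

    def put_newline(self):
        """Eat a 'newline' or 'nl' token and any following 'indent' token."""
        assert self.tokens[self.i]['kind'] in ('newline', 'nl')
        self.annotate()
        if self.i < len(self.tokens) and self.tokens[self.i]['kind'] == 'indent':
            self.annotate()

# Tokens for the line:  x = 1
tokens = [
    {'kind': 'name', 'value': 'x'}, {'kind': 'ws', 'value': ' '},
    {'kind': 'op', 'value': '='}, {'kind': 'ws', 'value': ' '},
    {'kind': 'number', 'value': '1'}, {'kind': 'newline', 'value': '\n'},
]
e = TokenEater(tokens)
e.eat('name'); e.eat('op'); e.eat('number'); e.put_newline()
```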

And now there is a second huge Aha:

    Eating a token naturally associates exactly one ast node with the token.

Indeed, the self.node (carefully set and restored in tot.visit, using a 
stack) will be injected into the token's node field.  That's all there is 
to it!

And one last Aha:

    Newlines are associated with *statements*, not blocks.

This ends some massive confusion, and will simplify the code considerably.

*Summary*

The task of the TokenOrderTraverser class is merely to annotate already 
existing tokens.

The put methods will "eat" zero or more tokens by advancing a pointer 
into the token list and by injecting data into the eaten tokens.  There 
is no need for complex synchronization!

put_newline will eat a "newline" or "nl" token and any following "indent" 
token, and probably any preceding "dedent" token.  The 
put_conditional_comma method is still required.  It will eat a comma if it 
exists, but will issue no complaint if it does not.
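For example, put_conditional_comma might look something like this.  A hypothetical sketch with stand-in names and a dict-based token representation, not the actual implementation:

```python
def put_conditional_comma(tokens, i, node):
    """Eat a ',' token if it is next, annotating it with its node;
    otherwise return the pointer unchanged, with no complaint."""
    if i < len(tokens) and tokens[i]['kind'] == 'op' and tokens[i]['value'] == ',':
        tokens[i].update(index=i, node=node)
        return i + 1
    return i

toks = [{'kind': 'op', 'value': ','}, {'kind': 'name', 'value': 'y'}]
i = put_conditional_comma(toks, 0, None)   # eats the comma
i2 = put_conditional_comma(toks, i, None)  # 'y' is next: a silent no-op
```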

Eating a token naturally associates a token with the correct ast node.  At 
last I clearly and fully understand the two-way correspondence.

The self.level ivar represents indentation level, and will be injected into 
all tokens. That's all that needs to be done regarding indentation! There 
is no need to generate "indent" and "dedent" tokens!

It is rare for a to-do list to have such import, but these are wonderful 
times :-)

Edward

P. S. Hehe. The TokenOrderFormatter is trivial because it doesn't do 
anything. True, a proper code beautifier or fstringifier would be a 
subclass of TokenOrderTraverser, but those tools would be anything but 
trivial.

EKR

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/cb73cb38-b404-4218-a18a-8605f70bce53%40googlegroups.com.