This Engineering Notebook post discusses how the Orange class might use the two-way links that tog.init_from_file creates between tokens (the list of input tokens) and tree (the tree of parse nodes).
This post is a bit short on explanation. It is primarily for my own use. Feel free to ignore.

*Background*

The Orange class implements Leo's new beautifier. (Orange is the new black.) The Orange class is based on the now-retired PythonTokenBeautifier class. At present, the code uses *no* tree-related data.

The Orange class is a stand-alone class. It does, however, *use* the TOG class as follows:

    tog = TokenOrderGenerator()
    contents, encoding, tokens, tree = tog.init_from_file(filename)

At present, the code uses only the encoding and tokens values.

*The legacy code almost suffices*

Just as with the Fstringify class, the present code could be used as it is. The present code already does a good-to-excellent job of regularizing whitespace. The only significant improvement would be to use the parse tree to analyze the context of tokens.

At present, the code uses *token-based state vars*. This looks like a dubious scheme. In fact, it is surprisingly sound. In particular, "name" tokens for keywords are guaranteed to *be* keywords. Ditto for op tokens representing parens, curly brackets, and square brackets. Tokens "hide" the contents of strings and comments, so there is no possibility of confusion.

*The big questions*

1. To what extent would using the parse tree simplify state analysis?

Four "input token handlers" contain "if" statements that depend on lexical/parse state. About 10 "output token handlers" contain similar tests. I'll investigate what the code would look like if the token-based state vars were replaced by an analysis of the parse tree corresponding to recent tokens.

2. Will token pointers be useful when analyzing the list of output tokens?

Unlike the TOG class, the Orange class uses *two* lists of tokens. TOG.init_from_file creates the *input token list*. The input token handlers then create a separate *output token list*. Having two token lists is convenient, because the output token handlers may delete or change output tokens after they are first created.
In essence, output token handlers form a very fast peephole optimizer. This peephole only looks backward, never forward.

Alas, pointers exist only between the tree and the *input* token list, so some new invention is needed.

1. Orange.add_token creates output tokens. It could copy the token.node field from input tokens to the output tokens.

2. Links from the tree to tokens might not be needed. If they are needed, it will probably be easy enough to get the required data either from the input token list (as at present) or in some other fairly straightforward way.

*Splitting and joining lines*

The only remaining task is to split and join lines, as black does. I plan to do this in a separate post pass on the output token list. This will simplify the code, provided that all needed data are available.

Black uses a horribly complex scheme to determine the length of lines. Instead, it will be much easier to call the global function tokens_to_string for the tokens comprising one output line. This will be straightforward.

The present code contains a prototype of splitting and joining tokens. It is probably necessary to split tokens based on data from the parse tree. Indeed, the old code will fail if the to-be-split lines do not lie between parens.

A parse-tree-based version could look up the tree, looking for top-level statements. Parens could then be inserted based on the type of statement. For example:

    a = << very long RHS >>

could be split into:

    a = (
        << lines split by meaning >>
    )

A similar analysis could be used for other kinds of statements.

*Summary*

Using two-way links in the Orange class presents new challenges because the Orange class creates *two* token lists.

Splitting lines properly requires a parse-tree-based analysis of the to-be-split lines. Joining lines is easier, but it probably also requires a proper analysis of the parse tree.

Completing the Orange class will provide the last necessary "road test" of the TOG class.
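
The following sketch illustrates three ideas from this post: copying the node link from input tokens to output tokens, a peephole that only looks backward at the output list, and measuring a line by joining its tokens. It is an illustration under stated assumptions, not Leo's code. Token, OutputList, join_tokens and line_too_long are hypothetical names. join_tokens merely stands in for tokens_to_string, the node values are plain strings rather than real parse nodes, and the 88-column limit is an assumed default.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Token:
    """Hypothetical stand-in for a token; not Leo's Token class."""
    kind: str
    value: str
    node: Optional[Any] = None  # Link to a parse-tree node, if any.


class OutputList:
    """Hypothetical output token list with a backward-looking peephole."""

    def __init__(self) -> None:
        self.tokens: List[Token] = []

    def add_token(self, kind: str, value: str,
                  source: Optional[Token] = None) -> Token:
        # Copy the node link from the input token (if given), so that
        # output tokens can also reach the parse tree.
        node = source.node if source is not None else None
        token = Token(kind, value, node)
        self.tokens.append(token)
        return token

    def clean(self, kind: str) -> None:
        # Peephole: inspect only the most recent output token and
        # drop it if it has the given kind. Never looks forward.
        if self.tokens and self.tokens[-1].kind == kind:
            self.tokens.pop()


def join_tokens(tokens: List[Token]) -> str:
    """Assumed stand-in for tokens_to_string: concatenate token values."""
    return ''.join(t.value for t in tokens)


def line_too_long(tokens: List[Token], max_len: int = 88) -> bool:
    """Return True if the tokens of one output line exceed max_len."""
    return len(join_tokens(tokens).rstrip('\n')) > max_len


# Input tokens; the string node values are placeholders for real nodes.
input_tokens = [
    Token('name', 'a', node='Assign'),
    Token('op', '=', node='Assign'),
    Token('number', '1', node='Constant'),
    Token('newline', '\n'),
]

out = OutputList()
for t in input_tokens:
    if t.kind == 'newline':
        out.clean('blank')  # Remove the trailing blank, if any.
        out.add_token('newline', '\n', t)
    else:
        out.add_token(t.kind, t.value, t)
        out.add_token('blank', ' ')  # Synthetic token: no node link.
```

In this sketch, blanks created by the beautifier carry no node link, while every output token derived from an input token keeps the link of its source. A post pass that splits or joins lines could then consult the tree through those links.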
Edward
