The git log will show that I have been working night and day for the past month on the fstrings branch.
Yesterday I thought I had completed the next phase of the work. All files but one processed without complaint, which is significant because very strong checks are always present. However, the one failure involved the most complicated code in the project. After several hours of work in the wee hours this morning I went back to bed. Lying in bed I had a momentous Aha that will eliminate *all* the hard parts of the code! Let me explain.

*Background*

The only truly difficult task is determining how many tokens correspond to ast.JoinedStr nodes. These parse tree nodes are quite a mishmash. They represent at least one f-string *and* all other *concatenated* strings, whether f-strings or plain strings.

The scheme that I have spent so much time on attempts to determine, by looking at the JoinedStr node, which tokens correspond to the JoinedStr. This involves an extremely messy process that I call *reconciliation*, which munges the tree data to put it into *exact* correspondence with the next 'string' tokens. The following difficult methods are involved: advance_str, adjust_str_token, get_string_parts, scan_fstring, scan_string and, most difficult of all, get_joined_tokens. All of this is about to go away!

*The Aha*

We can determine which 'string' tokens are concatenated just by looking at the token list! Indeed, 'string' tokens are concatenated if and *only* if there are no significant tokens (including parens) between them. So *none* of the old correspondence/reconciliation machinery is needed. We can *ignore* the component ast nodes of the JoinedStr nodes completely and just use the token data. (See the P.S. below for a minimal sketch of the idea.)

*Figures of merit*

The code is already very fast. For example, for leoGlobals.py:

    len(sources): 286901
    setup time:   0.61 sec.
    link time:    0.44 sec.

The setup time is the time to tokenize the file and compile it to a parse tree. This involves two calls to Python's standard library, so it is as fast as possible. The link time is the time to execute *all* the code in the TokenOrderGenerator class! It is already way faster than other tools, and it will get a tad faster.

Moreover, the TOG is both substantially simpler and more flexible than other tools. The Aha means that it will be very easy to debug and maintain.

Finally, the TOG makes no significant demands on the GC. There are *no* large data structures involved, aside from the token list and the parse tree. The *only* variable-length data is a token stack, which will typically have only a few hundred entries. Python's run-time stack will have only a few entries, because generators eliminate all significant recursion.

*Summary*

Today's Aha is a big deal. *All* of the difficult parts of the code are about to disappear! The TOG will be easy to understand and maintain. It can now be adapted easily to handle other kinds of parse trees, such as pgen2/lib2to3.

The TOG class is fast, simple, general and flexible. It promises to be an important tool in the python world. I'm proud of it.

The last month's work is as close as I have ever come to working on a significant mathematical theorem. I guess I'll have to stop thinking of myself as a failed mathematician :-)

Edward
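P.S. Here is a minimal sketch of the Aha, using only Python's standard tokenize module. This is *not* the TOG's actual code, just an illustration of the adjacency rule, and it assumes Pythons in which an f-string is a single 'string' token:

    import io
    import tokenize

    # Tokens that do *not* break an implicit concatenation:
    # comments, non-logical newlines, and indentation changes.
    INSIGNIFICANT = {
        tokenize.COMMENT, tokenize.NL,
        tokenize.INDENT, tokenize.DEDENT,
    }

    def string_runs(source):
        """Yield runs of adjacent 'string' tokens.

        A run of length > 1 is an implicit concatenation; a run
        containing an f-string corresponds to one ast.JoinedStr.
        """
        run = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.STRING:
                run.append(tok)
            elif tok.type in INSIGNIFICANT:
                continue  # These never separate concatenated strings.
            else:
                # Any significant token (parens, commas, NEWLINE, ...)
                # ends the current run.
                if run:
                    yield run
                run = []
        if run:
            yield run

    for run in string_runs("s = 'a' f'{b}' 'c'\nt = ('x'\n     'y')\n"):
        print([tok.string for tok in run])

The demo prints two runs, ["'a'", "f'{b}'", "'c'"] and ["'x'", "'y'"]. The second run shows that NL tokens inside parens do not break a concatenation, while the NEWLINE ending the first statement does.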
