The git logs will show that I have been working night and day for the past 
month on the fstrings branch.

Yesterday I thought I had completed the next phase of the work. All files 
but one processed without complaint, which is significant because very 
strong consistency checks are always in force.

However, the one failure involved the most complicated code in the project. 
After several hours of work in the wee hours of this morning, I went back 
to bed. Lying in bed, I had a momentous Aha that will eliminate *all* the 
hard parts of the code! Let me explain.

*Background*

The only truly difficult task is determining how many tokens correspond to 
ast.JoinedStr nodes. These parse tree nodes are quite a mishmash: they 
represent at least one f-string *and* all the other *concatenated* strings, 
whether f-strings or plain strings.
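For instance (a hypothetical snippet, not code from the project), implicitly 
concatenating an f-string with a plain string parses to a *single* JoinedStr 
node whose children interleave FormattedValue and Constant parts:

```python
import ast

# One f-string concatenated with one plain string parses to a *single*
# JoinedStr node: the tree no longer records where one source string
# ended and the next began.
tree = ast.parse("s = f'x = {x}' ' and more'")
node = tree.body[0].value
print(type(node).__name__)  # → JoinedStr
print(sorted({type(v).__name__ for v in node.values}))
# → ['Constant', 'FormattedValue']
```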

The scheme on which I have spent so much time attempts to determine, by 
looking at the JoinedStr node, which tokens correspond to it. This involves 
an extremely messy process that I call *reconciliation*, which munges the 
tree data to put it into *exact* correspondence with the next 'string' 
tokens. The following difficult methods are involved: advance_str, 
adjust_str_token, get_string_parts, scan_fstring, scan_string, and the most 
difficult of all, get_joined_tokens.

All of this is about to go away!

*The Aha*

We can determine which 'string' tokens are concatenated just by looking at 
the token list!!!

Indeed, 'string' tokens are concatenated if and *only* if there are no 
significant tokens (including parens) between them!

So *none* of the old correspondence/reconciliation machinery is needed. We 
can *ignore* the component ast nodes of the JoinedStr nodes completely and 
just use the token data.
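To make this concrete, here is a minimal sketch (a hypothetical helper, not 
code from the TOG) that groups 'string' tokens into concatenation runs using 
only the token list. It assumes the tokenization of the time, in which an 
entire f-string is a single STRING token, and it treats comments and 
non-logical newlines as the only insignificant tokens:

```python
import io
import tokenize

# Tokens that may appear between two implicitly concatenated strings:
# comments and newlines that do not end a logical line.
INSIGNIFICANT = (tokenize.COMMENT, tokenize.NL)

def concatenated_string_runs(source):
    """Group 'string' tokens into runs of implicitly concatenated strings.

    Two string tokens belong to the same run if and only if no
    significant token appears between them in the token list.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    runs, run = [], []
    for tok in tokens:
        if tok.type == tokenize.STRING:
            run.append(tok.string)
        elif tok.type in INSIGNIFICANT:
            continue  # Comments and intra-expression newlines don't end a run.
        elif run:
            runs.append(run)  # A significant token ends the current run.
            run = []
    if run:
        runs.append(run)
    return runs

source = "s = ('abc'  # a comment\n     'def')\nt = 'solo'\n"
print(concatenated_string_runs(source))
# → [["'abc'", "'def'"], ["'solo'"]]
```

Note that the parens themselves are significant: they correctly end a run, so 
adjacent parenthesized strings are never grouped by accident.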

*Figures of merit*

The code is already very fast. For example:

leoGlobals.py
len(sources): 286901
  setup time: 0.61 sec.
   link time: 0.44 sec.

The setup time is the time to tokenize the file and compile it to a parse 
tree. This involves two calls to Python's standard library, so it is as 
fast as possible.
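As a sketch of that setup step (with a hypothetical stand-in source; the real 
measurement used leoGlobals.py), the two standard-library calls are just 
tokenize and ast.parse:

```python
import ast
import io
import time
import tokenize

# Hypothetical stand-in source; the real measurement used leoGlobals.py.
source = "def f(a, b):\n    return f'{a} and {b}'\n" * 2000

t0 = time.perf_counter()
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))  # call 1
tree = ast.parse(source)                                               # call 2
t1 = time.perf_counter()
print(f"len(sources): {len(source)}")
print(f"  setup time: {t1 - t0:.2f} sec.")
```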

The link time is the time to execute *all* the code in the 
TokenOrderGenerator class! It is already way faster than other tools. It 
will get a tad faster.

Moreover, the TOG is both substantially simpler and more flexible than 
other tools.  The Aha means that it will be very easy to debug and maintain.

Finally, the TOG makes no significant demands on the GC. There are *no* 
large data structures involved, aside from the token list and the parse 
tree. The *only* variable-length data is a token stack. This will typically 
only have a few hundred entries. Python's run-time stack will have only a 
few entries, because generators eliminate all significant recursion.

*Summary*

Today's Aha is a big deal. *All* of the difficult parts of the code are 
about to disappear! The TOG will be easy to understand and maintain.  It 
can now be adapted easily to handle other kinds of parse trees, such as 
pgen2/lib2to3.

The TOG class is fast, simple, general and flexible. It promises to be an 
important tool in the python world. I'm proud of it.

The last month's work is as close as I have ever come to working on a 
significant mathematical theorem. I guess I'll have to stop thinking of 
myself as a failed mathematician :-)

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/b2a913fb-b429-4653-b8a2-e20d5e04a98d%40googlegroups.com.
