The git log will show that I have been working night and day for the past month on the fstrings branch.
Yesterday I thought I had completed the next phase of the work. All files but one processed without complaint, which is significant because very strong checks are always present. However, the one failure involved the most complicated code in the project. After several hours of work in the wee hours this morning I went back to bed. Lying in bed I had a momentous Aha that will eliminate *all* the hard parts of the code! Let me explain.

*Background*

The only truly difficult task is determining how many tokens correspond to ast.JoinedStr nodes. These parse tree nodes are quite a mishmash. They represent at least one f-string *and* all other *concatenated* strings, whether f-strings or plain strings.

The scheme that I have spent so much time on attempts to determine, by looking at the JoinedStr node, which tokens correspond to the JoinedStr. This involves an extremely messy process that I call *reconciliation*, which munges the tree data to put it into *exact* correspondence with the next 'string' tokens. The following difficult methods are involved: advance_str, adjust_str_token, get_string_parts, scan_fstring, scan_string and, most difficult of all, get_joined_tokens. All of this is about to go away!

*The Aha*

We can determine which 'string' tokens are concatenated just by looking at the token list! Indeed, 'string' tokens are concatenated if and *only* if there are no significant tokens (including parens) between them. So *none* of the old correspondence/reconciliation machinery is needed. We can *ignore* the component ast nodes of the JoinedStr nodes completely and just use the token data. (See the P.S. below for a minimal sketch of the idea.)

*Figures of merit*

The code is already very fast. For example, for leoGlobals.py:

    len(sources): 286901
    setup time:   0.61 sec.
    link time:    0.44 sec.

The setup time is the time to tokenize the file and compile it to a parse tree. This involves two calls to Python's standard library, so it is as fast as possible. The link time is the time to execute *all* the code in the TokenOrderGenerator class! It is already way faster than other tools, and it will get a tad faster.

Moreover, the TOG is both substantially simpler and more flexible than other tools. The Aha means that it will be very easy to debug and maintain.

Finally, the TOG makes no significant demands on the GC. There are *no* large data structures involved, aside from the token list and the parse tree. The *only* variable-length data is a token stack, which will typically have only a few hundred entries. Python's run-time stack will have only a few entries, because generators eliminate all significant recursion.

*Summary*

Today's Aha is a big deal. *All* of the difficult parts of the code are about to disappear! The TOG will be easy to understand and maintain. It can now be adapted easily to handle other kinds of parse trees, such as pgen2/lib2to3.

The TOG class is fast, simple, general and flexible. It promises to be an important tool in the python world. I'm proud of it.

The last month's work is as close as I have ever come to working on a significant mathematical theorem. I guess I'll have to stop thinking of myself as a failed mathematician :-)

Edward
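P.S. Here is a minimal sketch of the Aha, using only Python's standard tokenize module. This is *not* the TOG's actual code, just an illustration of the adjacency rule, and it assumes Pythons in which an f-string is a single 'string' token:

    import io
    import tokenize

    # Tokens that do *not* break an implicit concatenation:
    # comments, non-logical newlines, and indentation changes.
    INSIGNIFICANT = {
        tokenize.COMMENT, tokenize.NL,
        tokenize.INDENT, tokenize.DEDENT,
    }

    def string_runs(source):
        """Yield runs of adjacent 'string' tokens.

        A run of length > 1 is an implicit concatenation; a run
        containing an f-string corresponds to one ast.JoinedStr.
        """
        run = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.STRING:
                run.append(tok)
            elif tok.type in INSIGNIFICANT:
                continue  # These never separate concatenated strings.
            else:
                # Any significant token (parens, commas, NEWLINE, ...)
                # ends the current run.
                if run:
                    yield run
                run = []
        if run:
            yield run

    for run in string_runs("s = 'a' f'{b}' 'c'\nt = ('x'\n     'y')\n"):
        print([tok.string for tok in run])

The demo prints two runs, ["'a'", "f'{b}'", "'c'"] and ["'x'", "'y'"]. The second run shows that NL tokens inside parens do not break a concatenation, while the NEWLINE ending the first statement does.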
