According to PR #2331 <https://github.com/leo-editor/leo-editor/pull/2331>,
I started work on the new python importer 9 days ago. This Engineering
Notebook post will discuss what I have done and the remaining difficulties.
*vnode_info dictionary*
All importers now use a *vnode_info* dict instead of injecting the
*_import_lines* ivar into vnodes. Keys are vnodes; values are *inner
dictionaries*.
The inner dictionary contains at least one key/value pair:
"lines": <list of lines for the vnode>.
VNodes use slots <https://docs.python.org/3/reference/datamodel.html#slots>,
so the vnode_info dict *slightly* reduces the descriptor memory required in
all vnodes. More importantly, the vnode_info dict allows the python
importer to store other key/value pairs in the inner dictionaries.
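
A minimal sketch of the idea, not Leo's actual code: the VNode class below is a hypothetical stand-in that models only a headline, and the helper names new_vnode_info and add_lines are assumptions made for illustration.

```python
class VNode:
    """Hypothetical stand-in for Leo's VNode; only the headline is modeled."""
    __slots__ = ('h',)  # Slots: instances have no per-instance __dict__.

    def __init__(self, h):
        self.h = h


def new_vnode_info(root):
    """Return a dict whose keys are vnodes and whose values are inner dicts."""
    return {root: {'lines': []}}


def add_lines(vnode_info, v, lines):
    """Append lines to v's inner dict, creating the inner dict if needed."""
    vnode_info.setdefault(v, {'lines': []})['lines'].extend(lines)


root = VNode('@file example.py')
vnode_info = new_vnode_info(root)
add_lines(vnode_info, root, ['import os\n', '@others\n'])

# Any importer may add its own keys to the inner dict.
vnode_info[root]['kind'] = 'outer'
```

Because the data lives in the dict rather than in an injected ivar, the vnodes themselves need no extra attributes.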
*Stackless python importer*
Previously, all importers, including the python importer, used a stack that
mirrored the structure of the imported nodes that the importers created.
Keeping the stack in sync with created nodes is tricky. Aha! Maybe the
stack isn't needed! The vnode_info dict may suffice. The python importer
uses an inner dict with these keys:
{
    '@others': <True: lines contains @others>,
    'indent': <The node's indentation, see below>,
    'kind': <one of 'outer', 'org', 'class', 'def'>,
    'lines': <list of lines for the vnode>,
}
Instead of getting these values from the stack, the importer will get them
from the generated nodes. For example, in the main importer loop the *p*
var points at the node being generated. So vnode_info[p.parent().v]
contains the data for p's parent and vnode_info[p.back().v] contains the
data for p's previous sibling, if any.
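
The lookups just described might be sketched as follows. The Node class is a hypothetical stand-in for a Leo position (it models only parent(), back(), and a v attribute), and new_node is an assumed helper, so this illustrates the idea rather than the importer's real code.

```python
class Node:
    """Hypothetical stand-in for a Leo position."""

    def __init__(self, h, parent=None):
        self.h = h
        self.v = self  # Assumption: each node acts as its own vnode.
        self._parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def parent(self):
        return self._parent

    def back(self):
        """Return the previous sibling, or None."""
        if self._parent is None:
            return None
        siblings = self._parent.children
        i = siblings.index(self)
        return siblings[i - 1] if i > 0 else None


vnode_info = {}


def new_node(h, kind, indent, parent=None):
    """Create a node and its inner dict (keys as listed in the post)."""
    p = Node(h, parent)
    vnode_info[p.v] = {'@others': False, 'indent': indent, 'kind': kind, 'lines': []}
    return p


root = new_node('@file x.py', 'outer', 0)
cls = new_node('class A', 'class', 0, root)
m1 = new_node('def f', 'def', 4, cls)
m2 = new_node('def g', 'def', 4, cls)

# No stack is needed: the tree itself answers "who is my parent/sibling?"
p = m2
parent_info = vnode_info[p.parent().v]
back = p.back()
back_info = vnode_info[back.v] if back is not None else None
```

Note that p.back() may be None for a first child, so real code would need the same guard before indexing the dict.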
I *think* this new organization will work, but there are no guarantees. If
necessary, I'll revert to the old stack-based architecture, with all of its
complexities.
*The python importer is inherently complex*
Aha! The python importer is intrinsically at least as complex as the
javascript importer, and perhaps more so! This complexity has been quite a
shock!
How can this be? Doesn't python impose strict standards for indentation and
structure?
*Strangely indented lines*
Alas, the answer is "yes and no." :-) *Most* of the time python classes,
methods, and functions follow a simple format. But not always! For
example, the following is a valid python program! Try it!
if 1:
 print('indent 1')
if 2:
  print('indent 2')
if 3:
   print('indent 3')
if 4:
    print('indent 4')
if 5:
     print('indent 5')
Who would do such a thing, you ask? Well, mypy unit tests, for one. Those
unit tests contain other strange (valid!) constructions.
Furthermore, one could replace the "print" statements above with "class" or
"def" statements, and one could imagine similar strange "if" statements
*within* the range of a class definition!
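
The claim is easy to check with the built-in compile function. In this sketch the particular indentation widths, and the names C and f, are illustrative assumptions; the point is only that each compound statement may choose its own indentation.

```python
# Each "if" body uses a different indentation, and two bodies contain
# "class" and "def" statements whose own bodies are indented oddly.
source = (
    "if 1:\n"
    " print('indent 1')\n"
    "if 2:\n"
    "  print('indent 2')\n"
    "if 3:\n"
    "   class C:\n"
    "         pass\n"
    "if 4:\n"
    "    def f():\n"
    "      return 'indent 4'\n"
)

# compile raises SyntaxError (or IndentationError) for invalid programs.
code = compile(source, '<strange>', 'exec')

namespace = {}
exec(code, namespace)  # Runs the two print calls and defines C and f.
```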
*Important*: strangely-indented lines can only happen within the range of
compound statements such as "if", "for", "while", and "with". But
"class" and "def" statements are also compound statements in this sense!
It's quite a mess.
*Keeping track of indentation*
In short, the python importer cannot assume *anything* about what
indentation may be in effect in the range of a class definition!
As noted above, the python importer assigns a *vnode kind* for each
generated vnode. The valid (string) values are outer, org, class, and def.
Hmm. As I write this, perhaps the importer should use "method" and
"function" kinds instead of the generic "def" kind.
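
As a rough sketch of how "method" and "function" kinds might be told apart, one could look at the kind of the enclosing node. Everything here is hypothetical: the pattern, the classify_line helper, and the rule that a def directly inside a class node is a method are assumptions, and the sketch ignores "async def" and decorators.

```python
import re

# Hypothetical pattern: leading whitespace, then "class" or "def", then a name.
DEF_PATTERN = re.compile(r'^([ \t]*)(class|def)\s+(\w+)')


def classify_line(line, parent_kind):
    """
    Return (kind, indent, name) for a class/def line, or None otherwise.

    parent_kind is the 'kind' value of the enclosing node.
    """
    m = DEF_PATTERN.match(line)
    if not m:
        return None
    ws, keyword, name = m.groups()
    indent = len(ws.expandtabs())  # Tabs expand to 8-column tab stops.
    if keyword == 'class':
        kind = 'class'
    elif parent_kind == 'class':
        kind = 'method'
    else:
        kind = 'function'
    return kind, indent, name
```

For example, classify_line("    def spam(self):\n", 'class') yields ('method', 4, 'spam').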
The "org" kind should allow the python importer to handle
strangely-indented lines. Indeed, python does not allow *complete* chaos!
For example, the following is a syntax error:
class Class1:
    def method1(): # 4-space indentation
        pass # 8-space indentation.
      def method2(): # 6-space indentation.
        pass
Python gives this error:
      def method2(): # 6-space indentation.
                                           ^
IndentationError: unindent does not match any outer indentation level
That is, the first statement in the range of the class determines the *allowed
indentation* for all other statements of the class, including compound
statements. Presumably, the 'indent' value for "class" nodes will be the
allowed indentation, but perhaps the vnode_info dict should contain *two*
indent-related keys. See below.
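
The syntax error described above can be reproduced without running anything, again using compile. This is only a checking sketch; the source string restores the indentation stated in the example's comments.

```python
source = (
    "class Class1:\n"
    "    def method1():  # 4-space indentation\n"
    "        pass  # 8-space indentation.\n"
    "      def method2():  # 6-space indentation.\n"
    "        pass\n"
)

try:
    compile(source, '<bad>', 'exec')
    error = None
except IndentationError as e:
    # 6 spaces matches neither the class level (0) nor the method level (4).
    error = e
```

By contrast, indenting method2 by 4 spaces (the level set by the first statement of the class) compiles cleanly.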
*Underindented lines*
A further complication involves so-called underindented lines, that is,
lines that Leo cannot represent properly using the natural node
structure. Leo uses an ugly *escape convention* to represent such lines.
Most Leonistas probably have never seen the escape convention, but Leo does
support it.
At present, the python importer's perfect-import check allows leading
whitespace to be added to otherwise underindented *comment* lines (only).
Imo, adding this extra whitespace is preferable to using the underindented
convention, but I might change my mind.
*Removing common leading whitespace*
*Importer.undent* removes leading whitespace from generated nodes.
i.undent calculates the *greatest* leading whitespace in the entire node
and removes this whitespace from *all* lines of the node, inserting the
underindented escape sequence as necessary!
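
To make the problem concrete, here is a small sketch that finds the lines that would become underindented if a given amount of leading whitespace were removed. It does not reproduce Importer.undent or Leo's escape convention; the helper names and the choice to pass the amount as a parameter are assumptions.

```python
def leading_ws(line):
    """Return the number of leading spaces in line."""
    return len(line) - len(line.lstrip(' '))


def find_underindented(lines, amount):
    """
    Return the indices of non-blank lines having fewer than `amount`
    leading spaces. Removing `amount` columns from these lines would
    lose information unless some escape is inserted.
    """
    return [
        i for i, line in enumerate(lines)
        if line.strip() and leading_ws(line) < amount
    ]


lines = [
    "    def f():\n",
    "        pass\n",
    "  # A comment with only 2 leading spaces.\n",
    "\n",
]
underindented = find_underindented(lines, 4)
```

Here only the comment line is reported; the blank line is ignored because it carries no indentation to preserve.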
The python importer will likely override i.undent (*python_i.undent*) so as
to never insert the underindented escape sequence. Perhaps textwrap.dedent
*can* be used, but that assumes that all strangely-indented nodes are under
the range of an `@others` directive that is indented by exactly the amount
that textwrap.dedent will (eventually) remove!
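
For reference, here is what textwrap.dedent does with a strangely-indented body. The sample lines are made up for illustration. dedent removes only the whitespace common to all non-blank lines, so it never needs an escape, but the result is correct only if the enclosing @others is indented by exactly that common amount (4 spaces here).

```python
import textwrap

body = (
    "    def method1(self):\n"
    "        pass\n"
    "      # A strangely-indented comment.\n"
    "    def method2(self):\n"
    "        pass\n"
)

# The common leading whitespace is 4 spaces, so 4 columns are removed
# from every line; the comment keeps its extra 2 spaces.
dedented = textwrap.dedent(body)
```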
So there are a lot of constraints involved in generating nodes!
*Aha! The post pass can use the vnode_info dict*
As I write this, I see that the vnode_info dict has another advantage over
the stack-based architecture. The vnode_info dict is available to (the
possibly overridden) undent method. Perhaps the vnode_info dict might have
two indentation-related keys. We shall see.
*Summary*
Surprisingly, the python importer is inherently the most complex importer
of all.
Organizer nodes will allow the importer to handle even the most bizarre
strangely-indented nodes. However, generating the necessary organizer nodes
has stumped me for several days. The task is far from easy.
The base Importer class defines the architecture of all importers. There is
no need to improve this architecture! In particular, the line-by-line
nature of the gen_lines method ensures that all importers, including the
python importer, will be close to as fast as possible. There is no need to
worry about the speed of the python importer!
To sum up: the task is to ensure the perfect import of *all valid python
programs*, regardless of indentation quirks.
Edward
P.S. As I write this I see that the underindented escape convention seems
not to be documented. Searching for "underindentEscapeString" in leoPy.leo
will show the relevant code.
EKR