ENB: About the python importer

Edward K. Ream Tue, 23 Nov 2021 03:52:24 -0800

According to PR #2331 <https://github.com/leo-editor/leo-editor/pull/2331>, 
I started work on the new python importer 9 days ago.  This Engineering 
Notebook post will discuss what I have done and the remaining difficulties.



*vnode_info dictionary*

All importers now use a *vnode_info* dict instead of injecting the 
*_import_lines* ivar into vnodes.  Keys are vnodes; values are *inner 
dictionaries*.

The inner dictionary contains at least one key/value pair:

    "lines": <list of lines for the vnode>.

VNodes use slots <https://docs.python.org/3/reference/datamodel.html#slots>, 
so the vnode_info dict *slightly* reduces the descriptor memory required in 
all vnodes. More importantly, the vnode_info dict allows the python 
importer to store other key/value pairs for each vnode.
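
As a rough illustration of the idea, here is a minimal sketch. The VNode 
class below is a hypothetical stand-in, not Leo's real class; it exists only 
to show a slotted class used as a dict key:

```python
class VNode:
    """Hypothetical stand-in for Leo's VNode; not the real class."""
    __slots__ = ('h',)  # No per-instance __dict__, so ad-hoc ivars are rejected.

    def __init__(self, h):
        self.h = h  # The headline.

# Keys are vnodes; values are inner dictionaries.
vnode_info = {}

root = VNode('outer')
vnode_info[root] = {'lines': ['import os\n', '\n']}

# Importer-specific data sits beside 'lines' without touching the VNode class.
vnode_info[root]['kind'] = 'outer'
vnode_info[root]['lines'].append('x = 1\n')

# Injecting an ivar into a slotted class fails, which is one reason to
# keep the data in a side table instead.
try:
    root._import_lines = []
    injected = True
except AttributeError:
    injected = False
```

Plain objects hash by identity, so each vnode gets its own entry even when 
two vnodes have the same headline.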

*Stackless python importer*

Previously, all importers, including the python importer, used a stack that 
mirrored the structure of the imported nodes that the importers created.  
Keeping the stack in sync with created nodes is tricky. Aha! Maybe the 
stack isn't needed! The vnode_info dict may suffice.  The python importer 
uses an inner dict with these keys:

{
     '@others': <True: lines contains @others>,
     'indent': <The node's indentation, see below>,
     'kind': <one of 'outer', 'org', 'class', 'def'>,
     'lines': < list of lines for the vnode>,
}

Instead of getting these values from the stack, the importer will get these 
values from the generated nodes.  For example, in the main importer loop 
the *p* var points at the node being generated. So info_dict[p.parent().v] 
contains the data for p's parent and info_dict[p.back().v] contains the 
data for p's previous sibling, if any.
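
The lookups described above might look like the following sketch. The VNode 
and Position classes and the add_child helper are hypothetical, heavily 
simplified stand-ins (Leo's real classes are much richer); only the shape of 
the two dictionary lookups is the point:

```python
class VNode:
    """Hypothetical stand-in for a Leo vnode."""
    def __init__(self, h):
        self.h = h
        self.children = []
        self.parent_v = None

class Position:
    """Hypothetical, much-simplified stand-in for a Leo position."""
    def __init__(self, v):
        self.v = v

    def parent(self):
        pv = self.v.parent_v
        return Position(pv) if pv is not None else None

    def back(self):
        """Return the previous sibling, or None if there is none."""
        pv = self.v.parent_v
        if pv is None:
            return None
        i = pv.children.index(self.v)
        return Position(pv.children[i - 1]) if i > 0 else None

info_dict = {}

def add_child(parent_p, h, kind, indent):
    """Create a node and register its inner dict (illustrative helper)."""
    v = VNode(h)
    if parent_p is not None:
        v.parent_v = parent_p.v
        parent_p.v.children.append(v)
    info_dict[v] = {'@others': False, 'indent': indent, 'kind': kind, 'lines': []}
    return Position(v)

root = add_child(None, 'outer', 'outer', 0)
cls = add_child(root, 'class A', 'class', 0)
first = add_child(cls, 'def f', 'def', 4)
p = add_child(cls, 'def g', 'def', 4)  # The node being generated.

parent_info = info_dict[p.parent().v]  # Data for p's parent.
back_info = info_dict[p.back().v]      # Data for p's previous sibling.
```

No stack appears anywhere: the outline structure itself answers the 
questions the stack used to answer.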

I *think* this new organization will work, but there are no guarantees. If 
necessary, I'll revert to the old stack-based architecture, with all of its 
complexities.


*The python importer is inherently complex*

Aha! The python importer is intrinsically at least as complex as the 
javascript importer, and perhaps more so! This complexity has been quite a 
shock!

How can this be? Doesn't python impose strict standards for indentation and 
structure?

*Strangely indented lines*

Alas, the answer is "yes and no." :-)  *Most* of the time python classes, 
methods, and functions follow a simple format.  But not always!  For 
example, the following is a valid python program! Try it! 

if 1:
 print('indent 1')
if 2:
  print('indent 2')
if 3:
   print('indent 3')
if 4:
    print('indent 4')
if 5:
     print('indent 5')

Who would do such a thing, you ask?  Well, mypy unit tests, for one. Those 
unit tests contain other strange (valid!) constructions.

Furthermore, one could replace the "print" statements above with "class" or 
"def" statements, and one could imagine similar strange "if" statements 
*within* the range of a class definition!
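
The claim that the five-statement program shown earlier is valid can be 
checked without running it. This sketch rebuilds the same text and uses the 
standard ast module to report the column at which each print statement 
starts:

```python
import ast

# Rebuild the program: the body of "if n:" is indented by n spaces.
lines = []
for n in range(1, 6):
    lines.append(f"if {n}:\n")
    lines.append(" " * n + f"print('indent {n}')\n")
source = "".join(lines)

tree = ast.parse(source)  # Raises SyntaxError if the program is invalid.

# The column offset of the first statement in each "if" body.
indents = [node.body[0].col_offset for node in tree.body]
print(indents)  # [1, 2, 3, 4, 5]
```

Each top-level compound statement is free to pick its own indentation for 
its body, which is why an importer cannot rely on a single indentation width.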

*Important*: strangely-indented lines can only happen within the range of 
compound statements such as "if", "for", "while", and "with".  But 
"class" and "def" statements are also compound statements in this sense!  
It's quite a mess. 

*Keeping track of indentation*

In short, the python importer cannot assume *anything* about what 
indentation may be in effect in the range of a class definition!

As noted above, the python importer assigns a *vnode kind* for each 
generated vnode. The valid (string) values are outer, org, class, and def. 
Hmm. As I write this, perhaps the importer should use "method" and 
"function" kinds instead of the generic "def" kind.

The "org" kind should allow the python importer to handle 
strangely-indented lines. Indeed, python does not allow *complete* chaos! 
For example, the following is a syntax error:

class Class1:
    def method1():  # 4-space indentation
        pass  # 8-space indentation.
      def method2():  # 6-space indentation.
          pass

Python gives this error:

    def method2():  # 6-space indentation.
                                          ^
IndentationError: unindent does not match any outer indentation level

That is, the first statement in the range of the class determines the 
*allowed indentation* for all other statements of the class, including 
compound statements.  Presumably, the 'indent' value for "class" nodes will 
be the allowed indentation, but perhaps the vnode_info dict should contain 
*two* indent-related keys.  See below.
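
The error can be reproduced with the built-in compile function. This is only 
a sketch for checking the rule; the '<example>' filename is arbitrary:

```python
bad_source = (
    "class Class1:\n"
    "    def method1():\n"       # 4-space indentation.
    "        pass\n"             # 8-space indentation.
    "      def method2():\n"     # 6-space indentation: matches no outer level.
    "          pass\n"
)

try:
    compile(bad_source, '<example>', 'exec')
    error = None
except IndentationError as e:
    error = e

# The same class compiles once method2 uses the allowed (4-space) indentation.
good_source = bad_source.replace("      def method2", "    def method2")
good_code = compile(good_source, '<example>', 'exec')
```

In good_source the body of method2 is still indented by 10 spaces, which 
python accepts: only the "def" line itself must match the allowed 
indentation of the class.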

*Underindented lines*

A further complication involves so-called underindented lines, that is, 
lines that Leo cannot represent properly using the natural node 
structure.  Leo uses an ugly *escape convention* to represent such lines.  
Most Leonistas probably have never seen the escape convention, but Leo does 
support it.

At present, the python importer's perfect-import check allows leading 
whitespace to be added to otherwise underindented *comment* lines (only). 
Imo, adding this extra whitespace is preferable to using the underindented 
convention, but I might change my mind.

*Removing common leading whitespace*

*Importer.undent* removes leading whitespace from generated nodes.  
i.undent calculates the *greatest* leading whitespace in the entire node 
and removes this whitespace from *all* lines of the node, inserting the 
underindented escape sequence as necessary!

The python importer will likely override i.undent (*python_i.undent*) so as 
to never insert the underindented escape sequence. Perhaps textwrap.dedent 
*can* be used, but that assumes that all strangely-indented nodes are under 
the range of an `@others` directive that is indented by exactly the amount 
that textwrap.dedent will (eventually) remove!
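
For reference, here is what textwrap.dedent actually does. It removes only 
the whitespace common to *all* non-blank lines, so it never needs an escape 
sequence, but a single underindented line reduces the amount removed from 
every other line. The sample text is made up for illustration:

```python
import textwrap

# A node body with one strangely-indented line.
body = (
    "    def spam(self):\n"
    "        if 1:\n"
    "          print('odd indent')\n"
    "        return 1\n"
)
dedented = textwrap.dedent(body)  # Removes the common 4 spaces.

# An underindented comment shrinks the common margin to 2 spaces.
body2 = (
    "    def spam(self):\n"
    "  # underindented comment\n"
    "        pass\n"
)
dedented2 = textwrap.dedent(body2)  # The def keeps 2 leading spaces.
```

In the second case the "def" line is left with two leading spaces, so the 
result only round-trips if the enclosing `@others` line supplies exactly the 
whitespace that was removed.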

So there are a lot of constraints involved in generating nodes!

*Aha! The post pass can use the vnode_info dict*

As I write this, I see that the vnode_info dict has another advantage over 
the stack-based architecture. The vnode_info dict is available to (the 
possibly overridden) undent method. Perhaps the vnode_info dict might have 
two indentation-related keys. We shall see.

*Summary*

Surprisingly, the python importer is inherently the most complex importer 
of all.

Organizer nodes will allow the importer to handle even the most bizarre 
strangely-indented nodes.  However, generating the necessary organizer nodes 
has stumped me for several days. The task is far from easy.

The base Importer class defines the architecture of all importers. There is 
no need to improve this architecture! In particular, the line-by-line 
nature of the gen_lines method ensures that all importers, including the 
python importer, will be close to as fast as possible. There is no need to 
worry about the speed of the python importer!

To sum up: the task is to ensure the perfect import of *all valid python 
programs*, regardless of indentation quirks.

Edward

P.S. As I write this I see that the underindented escape convention seems 
not to be documented.  Searching for "underindentEscapeString" in leoPy.leo 
will show the relevant code.

EKR
