ENB: About python importers

Edward K. Ream Fri, 10 Dec 2021 04:03:00 -0800

This Engineering Notebook post will discuss the difficulties that *any* 
python importer must face. To state my conclusions first:


1. Generating the correct whitespace before @others in *all* cases 
requires:

A: Some form of look-ahead, or equivalently, delayed code generation.
B: What amounts to a full *parse* of def and class lines.

2. I am willing to let the importer assume 4-space indentation for @others 
in class nodes. In effect, this is what the legacy Py_Importer class does!

*Background*

Vitalije's new importer has trouble importing 
mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py. The file *is* imported 
perfectly, but many nodes are over-indented due to missing indentation in 
`@others` directives in the class nodes. 

The relevant code in the mknode function is:

o = indent('@others\n', ind-l_ind)
...    
p.b = f'{b1}{o}{b2}'

Alas, the value ind-l_ind won't work in all cases!  Instead, I suggest 
using the value 4 for all classes :-)  That's exactly what the legacy 
importer does!

Yes, this would break the strangely-indented unit tests, but I'm willing to 
live with that.
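
A toy model, not Leo's actual code, may help show what is at stake. It assumes that writing a node re-indents each child body by the leading whitespace of the @others line. The helper name expand_others and the sample text are hypothetical, chosen only for illustration.

```python
import textwrap


def expand_others(others_indent, child_bodies):
    """Toy model (an assumption, not Leo's writer): re-indent every
    child body by the leading whitespace of the @others line."""
    return ''.join(textwrap.indent(body, others_indent) for body in child_bodies)


# A method from a class whose body uses 2-space indentation.
original = "  def f(self):\n    return 1\n"

# The child node stores the method with its common indentation removed.
child = textwrap.dedent(original)

# With the true body indentation, the text round-trips exactly.
round_trip_ok = expand_others('  ', [child]) == original

# With an assumed 4-space @others, the written text is over-indented.
assumed_four = expand_others(' ' * 4, [child])
```

In this model, assuming 4 spaces is harmless for conventionally indented files and changes the output only for files that indent class bodies some other way.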

*The heroic alternative*

Generating the correct indentation for @others in *all* cases is much more 
difficult. Indeed, the indentation of the @others line must be the 
indentation of the *first significant line* following the class or def 
line. The first significant line is the first line that is not:

- A blank line or a comment.
- In a string.
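
As a rough sketch of the rule above, Python's standard tokenize module can do the look-ahead. This is neither the legacy importer's nor Vitalije's approach. The function name body_indentation is hypothetical. The sketch makes one interpretive assumption: a line that *opens* a docstring counts as significant, while lines that merely continue a multi-line string do not.

```python
import io
import tokenize


def body_indentation(source, header_lineno):
    """Return the leading whitespace of the first significant line after
    the class/def header that starts on header_lineno (1-based).
    Return None if the header has no indented body (e.g. a one-liner)."""
    seen_header_end = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start[0] < header_lineno:
            continue  # Not yet at the header.
        if not seen_header_end:
            # NEWLINE ends the *logical* header line, so headers that
            # span several physical lines are handled too.
            if tok.type == tokenize.NEWLINE:
                seen_header_end = True
            continue
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue  # Blank lines and comment-only lines.
        if tok.type == tokenize.INDENT:
            return tok.string  # Whitespace of the first significant line.
        return None  # Next statement is not indented: no body block.
    return None


sample = (
    "class A:\n"
    "\n"
    "      # oddly indented comment\n"
    "  '''doc\n"
    "continues at column 0\n"
    "  '''\n"
    "  def f(self):\n"
    "        return 1\n"
)
class_indent = body_indentation(sample, 1)  # Header 'class A:' on line 1.
def_indent = body_indentation(sample, 7)    # Header 'def f(self):' on line 7.
```

The tokenizer already tracks strings, brackets and backslash continuations, which is why the sketch stays short. The price is a complete tokenization of the source, which is the kind of work the rest of this post argues cannot be avoided.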

The legacy Py_Importer class detects such lines fairly easily.  It is the 
first non-blank, non-comment line for which Python_ScanState.in_context 
returns False:

def in_context(self):
    """True if in a special context."""
    return (
        self.context or
        self.curlies > 0 or  # Open curly brackets
        self.parens > 0 or  # Open parentheses.
        self.squares > 0 or  # Open square brackets
        self.bs_nl  # In backslash/newline.
    )

Ironically, having gone to all this trouble, my legacy importer *still* 
assumes 4-space indentation! In theory, the importer *could* get the 
indentation right. In practice, it's dashed difficult to do so!

The split_root function (or its helpers) would *also* have to find the 
first significant line of a class! In effect, the new importer would have 
to do a full parse of the entire class or def line.

*Summary*

The python importer contains analogs of all the phases of an optimizing 
compiler. The incoming code must be tokenized and maybe even parsed. Code 
generation will never be easy.

In class or def nodes, the leading whitespace of the @others directive should 
be the leading whitespace of the first significant line of the class or 
def. Finding the first significant line of a class or def requires a full 
parse.

Importers can avoid the parse phase only if they assume 4-space 
indentation! I am willing to make this concession, and I am willing to 
abandon (parts of) the unit tests for strangely-indented code.

Edward
