As I understand it, the Python tokenizer keeps two stacks of indents. In
one, each tab advances the column to the next multiple of 8. In the other,
a tab counts as one space. The two stacks have to agree on the indentation
level at every stage, or the tokenizer reports inconsistent use of tabs
and spaces.
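To make the two-measure idea concrete, here is a rough sketch in Python. It is my own simplification, not the tokenizer's code: the names indent_columns and consistent are made up for illustration, and I am assuming that "agree" means both measures must move in the same direction from one line to the next.

```python
def indent_columns(prefix, tabsize=8):
    """Return (col, altcol) for a run of leading whitespace.

    col:    each tab advances to the next multiple of tabsize.
    altcol: each tab counts as a single column.
    """
    col = altcol = 0
    for ch in prefix:
        if ch == ' ':
            col += 1
            altcol += 1
        elif ch == '\t':
            col = (col // tabsize + 1) * tabsize
            altcol += 1
        else:
            break  # Stop at the first non-whitespace character.
    return col, altcol


def sign(n):
    """Return -1, 0 or 1 according to the sign of n."""
    return (n > 0) - (n < 0)


def consistent(prev_prefix, cur_prefix):
    """True if both measures agree on indent, dedent or no change."""
    col1, alt1 = indent_columns(prev_prefix)
    col2, alt2 = indent_columns(cur_prefix)
    return sign(col2 - col1) == sign(alt2 - alt1)
```

With this sketch, a line indented with one tab followed by a line indented with eight spaces is flagged: the first measure says the indent is unchanged, the second says it grew.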
When I did a similar job in the past, I didn't need to tokenize or parse
everything the way an importer has to. To determine the indentation level,
I counted the number of tabs and spaces without regard to order. That gives
an unambiguous indent level without depending on invisible details of how
tabs and spaces are ordered and expanded. It worked well.
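A minimal sketch of what I mean, reconstructed from memory rather than copied from the old code. The function names are hypothetical, and using the (tabs, spaces) pair as the key for an indent is one reading of "without regard to order":

```python
def whitespace_counts(line):
    """Return (tabs, spaces) in the leading whitespace, ignoring order."""
    prefix = line[:len(line) - len(line.lstrip(' \t'))]
    return prefix.count('\t'), prefix.count(' ')


def indent_levels(lines):
    """Assign a nesting level to each line; blank lines get None.

    Assumption: an indent is identified by its (tabs, spaces) counts,
    and a line closes every open indent that has at least as many
    whitespace characters as the line itself.
    """
    stack = [(0, 0)]
    levels = []
    for line in lines:
        if not line.strip():
            levels.append(None)
            continue
        key = whitespace_counts(line)
        # Pop back to the enclosing indent, if any.
        while len(stack) > 1 and key != stack[-1] and sum(key) <= sum(stack[-1]):
            stack.pop()
        if key != stack[-1]:
            stack.append(key)  # A new, deeper indent.
        levels.append(len(stack) - 1)
    return levels
```

Note that " \tx" and "\t x" produce the same key, so the result does not depend on how the editor happened to order the tab and the space.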
On output, of course, the tabs could be replaced with four spaces. No
problem there. What I dislike is assuming that tabs are always four spaces
in the input. It would be easy for someone to set their editor to use, say,
three spaces per tab to get slightly more compact lines. We don't know
how often that happens. And there could still be a few legacy files
around that use all tabs. I have found them from time to time.
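For the output side, something as small as this would do. It is only a sketch: the function name is made up, and the width of 4 is the output convention under discussion, not something read from the input file.

```python
def normalize_leading_tabs(line, width=4):
    """Replace each tab in the leading whitespace with `width` spaces.

    Tabs after the first non-whitespace character are left alone.
    """
    stripped = line.lstrip(' \t')
    prefix = line[:len(line) - len(stripped)]
    return prefix.replace('\t', ' ' * width) + stripped
```

This touches only the indentation, so tabs inside strings or trailing comments survive unchanged.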
On Friday, December 10, 2021 at 7:02:54 AM UTC-5 Edward K. Ream wrote:
> This Engineering Notebook post will discuss the difficulties that *any*
> python importer must face. To state my conclusions first:
>
> 1. Generating the proper whitespace before @others in *all* cases
> requires:
>
> A: Some form of look-ahead, or equivalently, delayed code generation.
> B: What amounts to a full *parse* of def and class lines.
>
> 2. I am willing to let the importer assume 4-space indentation for @others
> in class nodes. In effect, this is what the legacy Py_Importer class does!
>
> *Background*
>
> Vitalije's new importer has trouble importing
> mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py. The file *is*
> imported perfectly, but many nodes are over-indented due to missing
> indentation in `@others` directives in the class nodes.
>
> The relevant code in the mknode function is:
>
>     o = indent('@others\n', ind-l_ind)
>     ...
>     p.b = f'{b1}{o}{b2}'
>
> Alas, the value ind-l_ind won't work in all cases! Instead, I suggest
> using the value 4 for all classes :-) That's exactly what the legacy
> importer does!
>
> Yes, this would break the strangely-indented unit tests, but I'm willing
> to live with that.
>
> *The heroic alternative*
>
> Generating the correct indentation for @others in *all* cases is much
> more difficult. Indeed, the indentation of the @others line must be the
> indentation of the *first significant line* following the class or def
> line. The first significant line is the first line that is not:
>
> - A blank or a comment.
> - In a string.
>
> The legacy Py_Importer class detects such lines fairly easily. It is the
> first non-blank, non-comment line for which Python_ScanState.in_context
> returns False:
>
>     def in_context(self):
>         """True if in a special context."""
>         return (
>             self.context or
>             self.curlies > 0 or  # Open curly brackets.
>             self.parens > 0 or  # Open parentheses.
>             self.squares > 0 or  # Open square brackets.
>             self.bs_nl  # In backslash/newline.
>         )
>
> Ironically, having gone through all this trouble, my legacy importer
> *still* assumes 4-space indentation! In theory, the importer *could* get
> the indentation right. In practice, it's dashed difficult to do so!
>
> The split_root function (or its helpers) would *also* have to find the
> first significant line of a class! In effect, the new importer would have
> to do a full parse of the entire class or def line.
>
> *Summary*
>
> The python importer contains analogs of all the phases of an optimizing
> compiler. The incoming code must be tokenized and maybe even parsed. Code
> generation will never be easy.
>
> In class or def nodes, the leading whitespace of the @others directive should
> be the leading whitespace of the first significant line of the class or
> def. Finding the first significant line of a class or def requires a full
> parse.
>
> Importers can avoid the parse phase only if they assume 4-space
> indentation! I am willing to make this concession, and I am willing to
> abandon (parts of) the unit tests for strangely-indented code.
>
> Edward
>