As I understand it, the Python tokenizer keeps two stacks of indents.  In 
one, each tab advances to the next multiple of 8 columns.  In the other, a 
tab counts for one space.  Both stacks have to agree on the indentation 
level at every stage.
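
A minimal sketch of that double bookkeeping, as I understand it.  The 
names TABSIZE, ALTTABSIZE, indent_columns and consistent are mine, not 
the tokenizer's, and comparing only two adjacent lines is a 
simplification of what a real stack-based check would do:

```python
TABSIZE = 8      # assumption: a tab advances to the next multiple of 8 columns
ALTTABSIZE = 1   # assumption: in the alternate measure a tab is one column


def indent_columns(line):
    """Return (col, altcol) for the leading whitespace of line."""
    col = altcol = 0
    for ch in line:
        if ch == " ":
            col += 1
            altcol += 1
        elif ch == "\t":
            col = (col // TABSIZE + 1) * TABSIZE
            altcol = (altcol // ALTTABSIZE + 1) * ALTTABSIZE
        else:
            break
    return col, altcol


def consistent(prev_line, line):
    """True if both measures agree on indent, dedent, or same level."""
    col0, alt0 = indent_columns(prev_line)
    col1, alt1 = indent_columns(line)

    def sign(a, b):
        return (a > b) - (a < b)

    return sign(col1, col0) == sign(alt1, alt0)
```

With these definitions, a line indented with one tab and a following line 
indented with eight spaces disagree: the first measure says "same level", 
the second says "deeper".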

When I have done the same job in the past, I did not need to tokenize or 
parse everything the way an importer has to.  To determine the indentation 
level, I counted the number of tabs and spaces without regard to order.  
That gives an unambiguous indent level without needing to depend on 
invisible details of the permutations and expansions of tabs and spaces.  
It worked well.
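
Roughly, the counting looked like the sketch below.  The function name 
indent_key is made up for this post, and how the resulting keys get 
ranked into nesting levels is left open here:

```python
def indent_key(line):
    """Return (tab_count, space_count) for the leading whitespace of line.

    The order in which tabs and spaces appear is deliberately ignored.
    """
    stripped = line.lstrip(" \t")
    leading = line[:len(line) - len(stripped)]
    return leading.count("\t"), leading.count(" ")
```

Two lines whose leading whitespace is a tab followed by a space, and a 
space followed by a tab, get the same key, so they land on the same level 
no matter how an editor would display them.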

Then on output, of course, the tabs could be replaced with four spaces.  No 
problem there.  But I dislike assuming that tabs are always four spaces in 
the input.  It would be easy for someone to set their editor to emit, say, 
three spaces per tab to get slightly more compact lines.  We don't know 
how often that would happen.  And there could still be a few legacy files 
around that use all tabs.  I have found them from time to time.
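
For the output side, something like this sketch would do.  The name 
normalize_indent and its spaces_per_tab parameter are illustrative only; 
the point is that the width is chosen on output, not assumed on input:

```python
def normalize_indent(line, spaces_per_tab=4):
    """Replace each tab in the leading whitespace with spaces.

    Tabs that appear after the first non-whitespace character are kept.
    """
    body = line.lstrip(" \t")
    leading = line[:len(line) - len(body)]
    return leading.replace("\t", " " * spaces_per_tab) + body
```

For example, normalize_indent("\t\tx = 1") gives the same text indented by 
eight spaces.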

On Friday, December 10, 2021 at 7:02:54 AM UTC-5 Edward K. Ream wrote:

> This Engineering Notebook post will discuss the difficulties that *any* 
> python importer must face. To state my conclusions first:
>
> 1. Generating the proper whitespace before @others correctly in *all* 
> cases requires:
>
> A: Some form of look-ahead, or equivalently, delayed code generation.
> B: What amounts to a full *parse* of def and class lines.
>
> 2. I am willing to let the importer assume 4-space indentation for @others 
> in class nodes. In effect, this is what the legacy Py_Importer class does!
>
> *Background*
>
> Vitalije's new importer has trouble importing 
> mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py. The file *is* 
> imported perfectly, but many nodes are over-indented due to missing 
> indentation in `@others` directives in the class nodes.
>
> The relevant code in the mknode function is:
>
> o = indent('@others\n', ind-l_ind)
> ...    
> p.b = f'{b1}{o}{b2}'
>
> Alas, the value ind-l_ind won't work in all cases!  Instead, I suggest 
> using the value 4 for all classes :-)  That's exactly what the legacy 
> importer does!
>
> Yes, this would break the strangely-indented unit tests, but I'm willing 
> to live with that.
>
> *The heroic alternative*
>
> Generating the correct indentation for @others in *all* cases is much 
> more difficult. Indeed, the indentation of the @others line must be the 
> indentation of the *first significant line* following the class or def 
> line. The first significant line is the first line that is not:
>
> - A blank or a comment.
> - In a string.
>
> The legacy Py_Importer class detects such lines fairly easily.  It is the 
> first non-blank, non-comment line for which Python_ScanState.in_context 
> returns False:
>
> def in_context(self):
>     """True if in a special context."""
>     return (
>         self.context or
>         self.curlies > 0 or  # Open curly brackets
>         self.parens > 0 or  # Open parentheses.
>         self.squares > 0 or  # Open square brackets
>         self.bs_nl  # In backslash/newline.
>     )
>
> Ironically, having gone through all this trouble, my legacy importer 
> *still* assumes 4-space indentation! In theory, the importer *could* get 
> the indentation right. In practice, it's dashed difficult to do so! 
>
> The split_root function (or its helpers) would *also* have to find the 
> first significant line of a class! In effect, the new importer would have 
> to do a full parse of the entire class or def line.
>
> *Summary*
>
> The python importer contains analogs of all the phases of an optimizing 
> compiler. The incoming code must be tokenized and maybe even parsed. Code 
> generation will never be easy.
>
> In class or def nodes, the leading whitespace of the @others directive should 
> be the leading whitespace of the first significant line of the class or 
> def. Finding the first significant line of a class or def requires a full 
> parse.
>
> Importers can avoid the parse phase only if they assume 4-space 
> indentation! I am willing to make this concession, and I am willing to 
> abandon (parts of) the unit tests for strangely-indented code.
>
> Edward
>
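
For what it's worth, the "first significant line" rule in the quoted post 
might be approximated with the standard tokenize module instead of a 
hand-written scanner.  This is only a sketch: body_indent is a 
hypothetical helper, not part of either importer, and it assumes the 
class or def is passed in as a separate string.  It relies on tokenize 
emitting no INDENT token for blank lines, comment lines, or continuation 
lines inside brackets and strings:

```python
import io
import tokenize


def body_indent(source):
    """Return the leading whitespace of the first indented logical line.

    Returns None if the source contains no indented block.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.INDENT:
            # The INDENT token's string is the whitespace that opened the block.
            return tok.string
    return None
```

Called on a class whose body is indented three spaces, with a differently 
indented comment before the first method, this returns the three-space 
string, which is what the @others line would need.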
