Re: ENB: About python importers

[email protected] Fri, 10 Dec 2021 06:45:54 -0800

Where is this test file  
mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py? I don't see it in 
the devel or import branch, and I don't see it in the mypy package either.


On Friday, December 10, 2021 at 9:36:22 AM UTC-5 [email protected] wrote:

> This link may be of interest.  It is about reconstructing a python file 
> from its parse tree.  Maybe a few changes to the code generator would do 
> the job: 
>
> Reconstruct Python 
> <https://lark-parser.readthedocs.io/en/latest/examples/advanced/reconstruct_python.html>
>
> On Friday, December 10, 2021 at 9:29:46 AM UTC-5 [email protected] wrote:
>
>> As I understand it, the Python tokenizer keeps two stacks of indents.  In 
>> one, each tab advances to the next multiple of 8 columns.  In the other, a 
>> tab counts for one space.  Both stacks have to agree on the indentation 
>> level at every stage.
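A minimal sketch of the two measurements described above, assuming a tab size of 8 for the first and 1 for the second. The names `indent_columns` and `consistent` are hypothetical, and comparing two lines directly is a simplification of the stack comparison the tokenizer performs:

```python
def indent_columns(line, tabsize=8, alt_tabsize=1):
    """Return (col, altcol) for the leading whitespace of line.

    Hypothetical helper: col advances each tab to the next multiple of
    tabsize; altcol does the same with alt_tabsize, so with the default
    of 1 a tab counts as a single column.
    """
    col = altcol = 0
    for ch in line:
        if ch == ' ':
            col += 1
            altcol += 1
        elif ch == '\t':
            col = (col // tabsize + 1) * tabsize
            altcol = (altcol // alt_tabsize + 1) * alt_tabsize
        else:
            break  # first non-whitespace character ends the indentation
    return col, altcol


def consistent(line_a, line_b):
    """True if both measures order the two indentations the same way."""
    col_a, alt_a = indent_columns(line_a)
    col_b, alt_b = indent_columns(line_b)
    cmp_col = (col_a > col_b) - (col_a < col_b)
    cmp_alt = (alt_a > alt_b) - (alt_a < alt_b)
    return cmp_col == cmp_alt
```

Under these assumptions a line indented with one tab and a line indented with eight spaces are flagged as inconsistent: the first measure says they are equal, the second says they differ.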
>>
>> When I have done the same job in the past (except that I didn't need to 
>> tokenize or parse everything the way an importer has to), I determined the 
>> indentation level by counting the number of tabs and spaces without regard 
>> to order.  That gives an unambiguous indent level without needing to depend 
>> on invisible details of the permutations and expansions of tabs and 
>> spaces.  It worked well.
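One possible reading of the counting approach above, as a rough sketch. The helpers `indent_key` and `indent_levels` are hypothetical names, and the sketch assumes every dedent returns to an indentation already seen and that continuation lines inside brackets have been filtered out beforehand:

```python
def indent_key(line):
    """Return (tabs, spaces) counted in the leading whitespace, order ignored."""
    stripped = line.lstrip(' \t')
    lws = line[:len(line) - len(stripped)]
    return lws.count('\t'), lws.count(' ')


def indent_levels(lines):
    """Assign a nesting level to each line; blank lines get None.

    Assumption: a dedent always returns to a key that is still on the stack.
    """
    stack = []   # keys of the currently open indentation levels
    levels = []
    for line in lines:
        if not line.strip():
            levels.append(None)  # blank lines carry no indentation information
            continue
        key = indent_key(line)
        if key in stack:
            del stack[stack.index(key) + 1:]  # dedent back to a known level
        else:
            stack.append(key)                 # a new, deeper level
        levels.append(len(stack) - 1)
    return levels
```

With this key, a space followed by a tab and a tab followed by a space land on the same level, which is the order-independence the paragraph describes.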
>>
>> Then on output of course the tabs could be replaced with four spaces.  No 
>> problem there.  I dislike assuming tabs are always four spaces in the 
>> input.  It would be easy for someone to set their editor to emit, say, 
>> three spaces per tab to get slightly more compact lines.  We don't know 
>> how often that would happen.  And there could still be a few legacy files 
>> around that use all tabs.  I have found them from time to time.
>>
>> On Friday, December 10, 2021 at 7:02:54 AM UTC-5 Edward K. Ream wrote:
>>
>>> This Engineering Notebook post will discuss the difficulties that *any* 
>>> python importer must face. To state my conclusions first:
>>>
>>> 1. Generating the proper whitespace before @others in *all* cases 
>>> requires:
>>>
>>> A: Some form of look-ahead, or equivalently, delayed code generation.
>>> B: What amounts to a full *parse* of def and class lines.
>>>
>>> 2. I am willing to let the importer assume 4-space indentation for 
>>> @others in class nodes. In effect, this is what the legacy Py_Importer 
>>> class does!
>>>
>>> *Background*
>>>
>>> Vitalije's new importer has trouble importing 
>>> mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py. The file *is* 
>>> imported perfectly, but many nodes are over-indented due to missing 
>>> indentation in `@others` directives in the class nodes. 
>>>
>>> The relevant code in the mknode function is:
>>>
>>> o = indent('@others\n', ind-l_ind)
>>> ...    
>>> p.b = f'{b1}{o}{b2}'
>>>
>>> Alas, the value ind-l_ind won't work in all cases!  Instead, I suggest 
>>> using the value 4 for all classes :-)  That's exactly what the legacy 
>>> importer does!
>>>
>>> Yes, this would break the strangely-indented unit tests, but I'm willing 
>>> to live with that.
>>>
>>> *The heroic alternative*
>>>
>>> Generating the correct indentation for @others in *all* cases is much 
>>> more difficult. Indeed, the indentation of the @others line must be the 
>>> indentation of the *first significant line* following the class or def 
>>> line. The first significant line is the first line that is not:
>>>
>>> - A blank or a comment.
>>> - In a string.
>>>
>>> The legacy Py_Importer class detects such lines fairly easily.  It is 
>>> the first non-blank, non-comment line for which Python_ScanState.in_context 
>>> returns False:
>>>
>>> def in_context(self):
>>>     """True if in a special context."""
>>>     return (
>>>         self.context or
>>>         self.curlies > 0 or  # Open curly brackets
>>>         self.parens > 0 or  # Open parentheses.
>>>         self.squares > 0 or  # Open square brackets
>>>         self.bs_nl  # In backslash/newline.
>>>     )
>>>
>>> Ironically, having gone through all this trouble, my legacy importer 
>>> *still* assumes 4-space indentation! In theory, the importer *could* get 
>>> the indentation right. In practice, it's dashed difficult to do so! 
>>>
>>> The split_root function (or its helpers) would *also* have to find the 
>>> first significant line of a class! In effect, the new importer would have 
>>> to do a full parse of the entire class or def line.
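As a rough illustration of what finding the first significant line involves, here is a sketch that leans on the standard library's `tokenize` module instead of a hand-written scanner. The function name `others_indent` is hypothetical, and the sketch assumes the source text starts at the class or def line itself (no decorators before it):

```python
import io
import tokenize


def others_indent(source):
    """Return the leading whitespace of the first significant body line.

    Hypothetical helper. Assumes source begins with the class or def line.
    Returns None when the body is not on its own indented line.
    """
    seen_header = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if not seen_header:
            # NEWLINE ends the logical header line, even when the header
            # spans several physical lines inside parentheses.
            if tok.type == tokenize.NEWLINE:
                seen_header = True
        elif tok.type == tokenize.INDENT:
            return tok.string  # whitespace of the first real body line
        elif tok.type not in (tokenize.NL, tokenize.COMMENT):
            return None        # e.g. a one-line def: no indented body
    return None
```

Blank lines and comment-only lines produce only NL and COMMENT tokens, so they are skipped, and a multi-line signature is consumed as part of the header. For example, `others_indent("class A:\n\n  # note\n  x = 1\n")` returns two spaces.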
>>>
>>> *Summary*
>>>
>>> The python importer contains analogs of all the phases of an optimizing 
>>> compiler. The incoming code must be tokenized and maybe even parsed. 
>>> Code generation will never be easy.
>>>
>>> In class or def nodes, the leading whitespace of the @others directive 
>>> should be the leading whitespace of the first significant line of the class 
>>> or def. Finding the first significant line of a class or def requires a 
>>> full parse.
>>>
>>> Importers can avoid the parse phase only if they assume 4-space 
>>> indentation! I am willing to make this concession, and I am willing to 
>>> abandon (parts of) the unit tests for strangely-indented code.
>>>
>>> Edward
>>>
>>

