This link may be of interest. It is about reconstructing a Python file 
from its parse tree. Maybe a few changes to the code generator would do 
the job: 

Reconstruct Python 
<https://lark-parser.readthedocs.io/en/latest/examples/advanced/reconstruct_python.html>

On Friday, December 10, 2021 at 9:29:46 AM UTC-5 [email protected] wrote:

> As I understand it, the Python tokenizer keeps two stacks of indents. In 
> one, each tab advances the column to the next multiple of 8. In the other, 
> a tab counts as one column. Both stacks have to agree on the indentation 
> level at every stage.
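
A minimal sketch of the two-stack bookkeeping described above. It is modeled on CPython's tokenizer, not copied from it. It assumes a tab size of 8 for the first column and 1 for the alternate column. It treats every non-blank, non-comment line as the start of a logical line, so it does no bracket or string tracking. The helper names `indent_columns` and `indent_levels` are hypothetical:

```python
from typing import List, Tuple


def indent_columns(line: str, tabsize: int = 8) -> Tuple[int, int]:
    """Measure leading whitespace two ways: tabs to the next tab stop, and tabs as 1."""
    col = altcol = 0
    for ch in line:
        if ch == " ":
            col += 1
            altcol += 1
        elif ch == "\t":
            col = (col // tabsize + 1) * tabsize  # Advance to the next tab stop.
            altcol += 1                           # Alternate rule: a tab is one column.
        elif ch == "\f":
            col = altcol = 0                      # Form feed resets both counts.
        else:
            break
    return col, altcol


def indent_levels(lines: List[str]) -> List[int]:
    """Return the block depth of each significant line, using two indent stacks."""
    stack, altstack = [0], [0]
    levels = []
    for line in lines:
        stripped = line.lstrip(" \t\f")
        if not stripped.strip() or stripped.startswith("#"):
            continue  # Blank and comment-only lines never change the indentation.
        col, altcol = indent_columns(line)
        if col > stack[-1]:
            # Indent: the alternate column must also increase.
            if altcol <= altstack[-1]:
                raise TabError("inconsistent use of tabs and spaces")
            stack.append(col)
            altstack.append(altcol)
        else:
            # Same level or dedent: pop until the columns match an open level.
            while col < stack[-1]:
                stack.pop()
                altstack.pop()
            if col != stack[-1]:
                raise IndentationError("dedent does not match any outer level")
            if altcol != altstack[-1]:
                raise TabError("inconsistent use of tabs and spaces")
        levels.append(len(stack) - 1)
    return levels
```

The classic failure is eight spaces on one line and a single tab on the next. Both reach column 8, but the alternate columns (8 vs. 1) disagree, so the second line is rejected.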
>
> When I have done this job in the past (except that I didn't need to 
> tokenize or parse everything the way an importer has to), I determined the 
> indentation level by counting the number of tabs and spaces without regard 
> to order. That gives an unambiguous indent level without depending on 
> invisible details of how tabs and spaces are ordered and expanded. It 
> worked well.
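
A sketch of the order-insensitive count, assuming a line's indentation "signature" is simply its (tabs, spaces) pair. The message above does not say how an unseen signature maps to a level. Here it is assumed to open a deeper block, which a real importer would need to validate. `leading_counts` and `count_based_levels` are hypothetical names:

```python
from typing import List, Tuple


def leading_counts(line: str) -> Tuple[int, int]:
    """Return (tabs, spaces) found in the leading whitespace, ignoring their order."""
    prefix = line[: len(line) - len(line.lstrip(" \t"))]
    return prefix.count("\t"), prefix.count(" ")


def count_based_levels(lines: List[str]) -> List[int]:
    """Assign block depths by matching (tabs, spaces) signatures against a stack."""
    stack = [(0, 0)]
    levels = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # Skip blank and comment-only lines.
        sig = leading_counts(line)
        if sig in stack:
            del stack[stack.index(sig) + 1:]  # Matches an open block: close deeper ones.
        else:
            stack.append(sig)  # Assumption: an unseen signature opens a deeper block.
        levels.append(len(stack) - 1)
    return levels
```

Here `"\t    "` and `"    \t"` get the same level. CPython's 8-column rule puts them at columns 12 and 8.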
>
> On output, of course, the tabs could be replaced with four spaces. No 
> problem there. But I dislike assuming tabs are always four spaces in the 
> input. It would be easy for someone to set their editor to emit, say, 
> three spaces per tab to get slightly more compact lines. We don't know 
> how often that happens. And there could still be a few legacy files 
> around that use all tabs; I have found them from time to time.
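
A sketch of the output step, assuming each leading tab becomes exactly four spaces regardless of where it sits. The helper name `normalize_leading` is hypothetical. This differs from `str.expandtabs(4)`, which advances to the next multiple of 4 instead:

```python
def normalize_leading(line: str, spaces_per_tab: int = 4) -> str:
    """Replace each tab in the leading whitespace with a fixed number of spaces.

    Tabs after the first non-whitespace character (e.g. inside strings) are kept.
    """
    body = line.lstrip(" \t")
    prefix = line[: len(line) - len(body)]
    return prefix.replace("\t", " " * spaces_per_tab) + body
```

For example, `normalize_leading("  \tx\n")` yields six leading spaces, while `"  \tx\n".expandtabs(4)` yields four.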
>
> On Friday, December 10, 2021 at 7:02:54 AM UTC-5 Edward K. Ream wrote:
>
>> This Engineering Notebook post will discuss the difficulties that *any* 
>> Python importer must face. To state my conclusions first:
>>
>> 1. Generating the proper whitespace before @others in *all* 
>> cases requires:
>>
>> A: Some form of look-ahead, or equivalently, delayed code generation.
>> B: What amounts to a full *parse* of def and class lines.
>>
>> 2. I am willing to let the importer assume 4-space indentation for 
>> @others in class nodes. In effect, this is what the legacy Py_Importer 
>> class does!
>>
>> *Background*
>>
>> Vitalije's new importer has trouble importing 
>> mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py. The file *is* 
>> imported perfectly, but many nodes are over-indented due to missing 
>> indentation in `@others` directives in the class nodes.
>>
>> The relevant code in the mknode function is:
>>
>> o = indent('@others\n', ind-l_ind)
>> ...    
>> p.b = f'{b1}{o}{b2}'
>>
>> Alas, the value ind-l_ind won't work in all cases!  Instead, I suggest 
>> using the value 4 for all classes :-)  That's exactly what the legacy 
>> importer does!
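
A sketch of the proposed fixed-width @others line. `b1` and `b2` are assumed to be the text before and after @others, as in the snippet above. `textwrap.indent` stands in for the importer's own `indent` helper, whose exact signature is not shown here. `class_node_body` is a hypothetical name:

```python
import textwrap


def class_node_body(b1: str, b2: str, width: int = 4) -> str:
    """Build a class node's body with @others indented by a fixed width.

    Hypothetical stand-in for the relevant part of mknode: the width no
    longer depends on ind - l_ind; it is simply 4 unless overridden.
    """
    o = textwrap.indent("@others\n", " " * width)
    return f"{b1}{o}{b2}"
```

A class whose body is indented by, say, 2 spaces still gets a 4-space @others. That is the strangely-indented case the proposal is willing to give up.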
>>
>> Yes, this would break the strangely-indented unit tests, but I'm willing 
>> to live with that.
>>
>> *The heroic alternative*
>>
>> Generating the correct indentation for @others in *all* cases is much 
>> more difficult. Indeed, the indentation of the @others line must be the 
>> indentation of the *first significant line* following the class or def 
>> line. The first significant line is the first line that is not:
>>
>> - A blank or a comment.
>> - In a string.
>>
>> The legacy Py_Importer class detects such lines fairly easily.  It is the 
>> first non-blank, non-comment line for which Python_ScanState.in_context 
>> returns False:
>>
>> def in_context(self):
>>     """True if in a special context."""
>>     return (
>>         self.context or
>>         self.curlies > 0 or  # Open curly brackets
>>         self.parens > 0 or  # Open parentheses.
>>         self.squares > 0 or  # Open square brackets
>>         self.bs_nl  # In backslash/newline.
>>     )
>>
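A toy, self-contained version of the same idea, for illustration only. It is not the legacy Py_Importer code, and `ScanState`, `scan_line`, and `first_body_line` are hypothetical names. It tracks strings (including triple-quoted ones), brackets, comments, and backslash/newline. It ignores f-string nesting and other corner cases, and it assumes the header really has an indented body:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScanState:
    """Toy scanner state, loosely modeled on the in_context idea above."""
    context: str = ""    # Open string delimiter: "", "'", '"', "'''" or '"""'.
    curlies: int = 0     # Open curly brackets.
    parens: int = 0      # Open parentheses.
    squares: int = 0     # Open square brackets.
    bs_nl: bool = False  # Last scanned line ended with backslash/newline.

    def in_context(self) -> bool:
        """True if in a special context."""
        return bool(self.context or self.curlies > 0 or self.parens > 0
                    or self.squares > 0 or self.bs_nl)

    def scan_line(self, line: str) -> None:
        """Update the state to reflect the end of one physical line."""
        self.bs_nl = False
        i = 0
        while i < len(line):
            ch = line[i]
            if self.context:  # Inside a string: skip escapes, look for the closer.
                if ch == "\\":
                    i += 2
                elif line.startswith(self.context, i):
                    i += len(self.context)
                    self.context = ""
                else:
                    i += 1
                continue
            if ch == "#":
                break  # The rest of the line is a comment.
            if ch in "'\"":
                triple = ch * 3
                self.context = triple if line.startswith(triple, i) else ch
                i += len(self.context)
                continue
            if ch == "\\" and line[i + 1:] in ("\n", "\r\n", ""):
                self.bs_nl = True
                break
            self.parens += {"(": 1, ")": -1}.get(ch, 0)
            self.squares += {"[": 1, "]": -1}.get(ch, 0)
            self.curlies += {"{": 1, "}": -1}.get(ch, 0)
            i += 1


def first_body_line(lines: List[str], header: int) -> Optional[int]:
    """Index of the first significant line after the class/def at lines[header]."""
    state = ScanState()
    seen = 0
    for i in range(header, len(lines)):
        stripped = lines[i].strip()
        significant = (not state.in_context() and bool(stripped)
                       and not stripped.startswith("#"))
        state.scan_line(lines[i])
        if significant:
            seen += 1
            if seen == 2:  # The first significant line is the header itself.
                return i
    return None
```
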
>> Ironically, having gone through all this trouble, my legacy importer 
>> *still* assumes 4-space indentation! In theory, the importer *could* get 
>> the indentation right. In practice, it's dashed difficult to do so! 
>>
>> The split_root function (or its helpers) would *also* have to find the 
>> first significant line of a class! In effect, the new importer would have 
>> to do a full parse of the entire class or def line.
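
For comparison, a sketch that leans on Python's standard `tokenize` module instead of a hand-written scanner. `tokenize` already tracks strings, brackets, comments, and backslash continuations. The INDENT token that opens a block carries, as its string, the leading whitespace of that block's first significant line. This assumes the whole file tokenizes cleanly. `body_indent` is a hypothetical name, and `header_line` is assumed to be the 1-based line on which the class or def statement begins:

```python
import io
import tokenize
from typing import Optional


def body_indent(source: str, header_line: int) -> Optional[str]:
    """Leading whitespace of the first significant line of the block whose
    class/def statement begins on header_line (1-based), or None if the
    statement has no indented body."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    header_done = False
    for tok in tokens:
        if not header_done:
            # NEWLINE (not NL) ends a logical line, so a multi-line header is fine.
            if tok.type == tokenize.NEWLINE and tok.start[0] >= header_line:
                header_done = True
            continue
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue  # Blank and comment-only lines come before the INDENT token.
        if tok.type == tokenize.INDENT:
            return tok.string
        return None  # e.g. "class A: pass" has no INDENT.
    return None
```

A one-line class such as `class A: pass` has no INDENT token, so the sketch returns None for it.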
>>
>> *Summary*
>>
>> The Python importer contains analogs of all the phases of an optimizing 
>> compiler. The incoming code must be tokenized and maybe even parsed. 
>> Code generation will never be easy.
>>
>> In class or def nodes, the leading whitespace of the @others directive 
>> should be the leading whitespace of the first significant line of the 
>> class or def. Finding the first significant line of a class or def 
>> requires a full parse.
>>
>> Importers can avoid the parse phase only if they assume 4-space 
>> indentation! I am willing to make this concession, and I am willing to 
>> abandon (parts of) the unit tests for strangely-indented code.
>>
>> Edward
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/10ede53f-5594-4a7d-97f9-b7d851de27d7n%40googlegroups.com.
