Where is the test file mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py? I don't see it in the devel or import branch, and I don't see it in the mypy package either.
On Friday, December 10, 2021 at 9:36:22 AM UTC-5 [email protected] wrote:

> This link may be of interest. It is about reconstructing a Python file
> from its parse tree. Maybe a few changes to the code generator would do
> the job:
>
> Reconstruct Python
> <https://lark-parser.readthedocs.io/en/latest/examples/advanced/reconstruct_python.html>
>
> On Friday, December 10, 2021 at 9:29:46 AM UTC-5 [email protected] wrote:
>
>> As I understand it, the Python tokenizer keeps two stacks of indents. In
>> one, each tab advances to the next multiple of 8 columns. In the other, a
>> tab counts for one space. Both stacks have to agree on the indentation
>> level at every stage.
>>
>> When I have done the same job in the past (without needing to tokenize
>> or parse everything the way an importer has to), I determined the
>> indentation level by counting the number of tabs and spaces without regard
>> to order. That gives an unambiguous indent level without needing to depend
>> on invisible details of the permutations and expansions of tabs and
>> spaces. It worked well.
>>
>> Then on output of course the tabs could be replaced with four spaces. No
>> problem there. I dislike assuming tabs are always four spaces in the
>> input. It would be easy for someone to set their editor to emit, say,
>> three spaces per tab to get slightly more compact lines. We don't know
>> how often that would happen. And there could still be a few legacy files
>> around that use all tabs. I have found them from time to time.
>>
>> On Friday, December 10, 2021 at 7:02:54 AM UTC-5 Edward K. Ream wrote:
>>
>>> This Engineering Notebook post will discuss the difficulties that *any*
>>> Python importer must face. To state my conclusions first:
>>>
>>> 1. Generating the proper whitespace before @others in *all* cases
>>> requires:
>>>
>>> A: Some form of look-ahead, or equivalently, delayed code generation.
>>> B: What amounts to a full *parse* of def and class lines.
>>>
>>> 2. I am willing to let the importer assume 4-space indentation for
>>> @others in class nodes. In effect, this is what the legacy Py_Importer
>>> class does!
>>>
>>> *Background*
>>>
>>> Vitalije's new importer has trouble importing
>>> mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py. The file *is*
>>> imported perfectly, but many nodes are over-indented due to missing
>>> indentation in `@others` directives in the class nodes.
>>>
>>> The relevant code in the mknode function is:
>>>
>>>     o = indent('@others\n', ind-l_ind)
>>>     ...
>>>     p.b = f'{b1}{o}{b2}'
>>>
>>> Alas, the value ind-l_ind won't work in all cases! Instead, I suggest
>>> using the value 4 for all classes :-) That's exactly what the legacy
>>> importer does!
>>>
>>> Yes, this would break the strangely-indented unit tests, but I'm willing
>>> to live with that.
>>>
>>> *The heroic alternative*
>>>
>>> Generating the correct indentation for @others in *all* cases is much
>>> more difficult. Indeed, the indentation of the @others line must be the
>>> indentation of the *first significant line* following the class or def
>>> line. The first significant line is the first line that is not:
>>>
>>> - A blank or a comment.
>>> - In a string.
>>>
>>> The legacy Py_Importer class detects such lines fairly easily. Such a
>>> line is the first non-blank, non-comment line for which
>>> Python_ScanState.in_context returns False:
>>>
>>>     def in_context(self):
>>>         """True if in a special context."""
>>>         return (
>>>             self.context or
>>>             self.curlies > 0 or  # Open curly brackets.
>>>             self.parens > 0 or  # Open parentheses.
>>>             self.squares > 0 or  # Open square brackets.
>>>             self.bs_nl  # In backslash/newline.
>>>         )
>>>
>>> Ironically, having gone through all this trouble, my legacy importer
>>> *still* assumes 4-space indentation! In theory, the importer *could* get
>>> the indentation right. In practice, it's dashed difficult to do so!
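For what it's worth, here is a minimal sketch of the "first significant line" idea, assuming the standard-library tokenize module is acceptable for the job. It is not code from either importer. The names body_indent, others_line, and sample are hypothetical, and the sketch assumes the source tokenizes cleanly. The tokenizer emits an INDENT token only for the first real statement of a block, so blank lines, comment-only lines, and the inside of multi-line strings are skipped without any hand-written scan state:

```python
import io
import tokenize


def body_indent(source, header_line):
    """Return the leading whitespace of the first significant line after the
    class/def statement that starts on 1-based line `header_line`.

    Returns None when the body sits on the header line (`class A: pass`).
    Hypothetical helper: a sketch, not part of Leo's importers.
    """
    header_done = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start[0] < header_line:
            continue  # Still before the class/def statement.
        if not header_done:
            # NEWLINE ends the logical header line, even if it spans
            # several physical lines inside parentheses.
            header_done = tok.type == tokenize.NEWLINE
            continue
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # Blank lines and comment-only lines are not significant.
        # The first significant token is INDENT when the body is a block.
        return tok.string if tok.type == tokenize.INDENT else None
    return None


def others_line(source, header_line):
    """Return '@others' indented relative to the class/def line, or None.

    Assumes the body's whitespace starts with the header's whitespace.
    """
    header = source.splitlines()[header_line - 1]
    header_ws = header[:len(header) - len(header.lstrip())]
    body_ws = body_indent(source, header_line)
    if body_ws is None or not body_ws.startswith(header_ws):
        return None
    return body_ws[len(header_ws):] + '@others\n'


# A strangely-indented sample: 2-space class body, 4-space method body
# (relative to the def), and a comment at a misleading column.
sample = (
    "class A:\n"
    "\n"
    "        # a comment at a misleading column\n"
    "  '''Docstring.'''\n"
    "  def f(self,\n"
    "        x):\n"
    "      return x\n"
)

if __name__ == "__main__":
    print(repr(others_line(sample, 1)))
    print(repr(others_line(sample, 5)))
```

This still requires tokenizing everything from the top of the file, which is the look-ahead cost described above. The only thing it saves is writing the scanner by hand.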
>>>
>>> The split_root function (or its helpers) would *also* have to find the
>>> first significant line of a class! In effect, the new importer would have
>>> to do a full parse of the entire class or def line.
>>>
>>> *Summary*
>>>
>>> The Python importer contains analogs of all the phases of an optimizing
>>> compiler. The incoming code must be tokenized and maybe even parsed.
>>> Code generation will never be easy.
>>>
>>> In class or def nodes, the leading whitespace of the @others directive
>>> should be the leading whitespace of the first significant line of the class
>>> or def. Finding the first significant line of a class or def requires a
>>> full parse.
>>>
>>> Importers can avoid the parse phase only if they assume 4-space
>>> indentation! I am willing to make this concession, and I am willing to
>>> abandon (parts of) the unit tests for strangely-indented code.
>>>
>>> Edward
