On Wednesday, May 20, 2015 at 10:35:11 AM UTC-5, Edward K. Ream wrote: This thread continues the thread: The problem with "improving" code, and a solution.
> I am going to cut my losses and declare that the port of PythonTidy.py to Python 3 has failed.

I have changed my mind. Leo needs fast beautification code that works with both Python 2 and 3.

*Executive Summary*

A simple port of PythonTidy to Python 3 seems out of the question.

autopep8 is too slow to be Leo's default beautifier. It is about 10x slower than PythonTidy.

A major project beckons. I am prepared to spend at least several weeks on it. The goal is to create a beautifier that is 4x faster than PythonTidy. Imo, this goal is feasible.

The rest of this post consists of notes in an Engineering Notebook. Feel free to ignore. All comments welcome.

*Overview of the problem*

Beautification is important and *seemingly* straightforward. In fact, it is difficult and complex. I have been fascinated with various related projects in the past, probably because I enjoy collapsing complexity.

This problem could be solved using either *tokens* or *asts* (parse trees). I've explored both approaches. Each has a fundamental limitation:

- Tokens know nothing about the structure of text (parse trees).
- Parse trees know (almost) nothing about tokens (white space, comments and the "spelling" of strings).

The ast-based approach seems more natural. It *would* be more natural if tokens could be associated with ast statement nodes. This problem is crucial. I'll investigate it first, in part by studying pep8.py and autopep8.py.

PythonTidy contains complex code to merge tokens into the final output. This complexity bleeds into the rest of the code in unwelcome ways.

Imo, the token-based approach is still worth serious consideration. I threw up my hands recently not because of "parsing" issues, but because the output-forming logic exploded in complexity.

*Tools*

I have created several tools that apply to this problem:

1. The string composition functions in leoGlobals.py-->...-->g.List composition.
When I awoke this morning I remembered these functions and had the feeling that they (or something like them) might form the foundation of "smart concatenation" of output strings. This could simplify either token-based or ast-based code.

2. Token-based pretty-printing code.

I am going to take the token-based code out of the attic to see if its output logic can be simplified.

3. PythonTidy and the AstFormatter class in leoAst.py.

I spent yesterday using a variant (not a subclass) of the AstFormatter class as the basis for a tree-based approach to beautification. In the process, I found many bugs in the AstFormatter class. Corrected code will be pushed soon.

The plan is to fold some PythonTidy code into the AstFormatter code:

- The PythonTidy parsing code looks like it is tied to Python 2. In any event, it's too ugly to tolerate. As a result, parsing will be based on the AstFormatter class. This implies a major revision of the PythonTidy code.
- The PythonTidy output code is extremely complex, but it works. I'll investigate simplifying the code with the g.List composition code...

*Summary*

Leo's existing beautification code using PythonTidy is fast and works well, but only for Python 2.

Both ast-based and token-based approaches are worth a look. The g.List composition code might simplify either approach.

Porting PythonTidy to Python 3 would, in fact, be a major rewrite. This rewrite promises perhaps a 4x increase in speed.

First, I'll look for a simple way to associate tokens with ast statement nodes.

Edward
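
The first step named above, associating tokens with ast statement nodes, can be sketched roughly as follows. This is not Leo's code and it does not use the AstFormatter class: it relies only on the standard `ast` and `tokenize` modules, runs on Python 3 only, and the names `associate_tokens`, `SKIPPED` and `SOURCE` are hypothetical. The assumed heuristic is that each token belongs to the statement that started most recently before it, so a trailing comment stays with its statement even though the parse tree never sees it.

```python
import ast
import bisect
import io
import tokenize

# Token types that carry no text a beautifier would need to re-emit.
SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER}


def associate_tokens(source):
    """Return (leading, groups) for the given source string.

    groups is a list of (statement_node, tokens) pairs in source order.
    leading holds tokens that appear before the first statement.
    Assumption: a token belongs to the statement that started most
    recently at or before the token's own start position.
    """
    tree = ast.parse(source)
    # Every statement node, nested ones included, ordered by start position.
    stmts = sorted(
        (node for node in ast.walk(tree) if isinstance(node, ast.stmt)),
        key=lambda node: (node.lineno, node.col_offset),
    )
    # Note: col_offset counts UTF-8 bytes while tokenize counts characters,
    # so this comparison is only exact for ASCII source.
    starts = [(node.lineno, node.col_offset) for node in stmts]
    groups = [(node, []) for node in stmts]
    leading = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        # Index of the last statement starting at or before this token.
        i = bisect.bisect_right(starts, tok.start) - 1
        if i < 0:
            leading.append(tok)
        else:
            groups[i][1].append(tok)
    return leading, groups


SOURCE = (
    "x = 1  # set x\n"
    "if x:\n"
    "    y = 'a'  # set y\n"
)

if __name__ == "__main__":
    _, groups = associate_tokens(SOURCE)
    for node, toks in groups:
        print(type(node).__name__, [t.string for t in toks])
```

The heuristic is deliberately crude: a comment on its own line before a statement is attached to the preceding statement, and tokens that follow the end of an indented block are attached to the last statement inside it. A real beautifier would need rules for both cases.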
