On Wednesday, May 20, 2015 at 10:35:11 AM UTC-5, Edward K. Ream wrote:

This thread continues the thread: The problem with "improving" code, and a 
solution.

> I am going to cut my losses and declare that the port of PythonTidy.py to 
> Python 3 has failed.

I have changed my mind. Leo needs fast beautification code that works with 
both Python 2 and 3.

*Executive Summary*

A simple port of PythonTidy to Python 3 seems out of the question.

autopep8 is too slow to be Leo's default beautifier. It is about 10x slower 
than PythonTidy.

A major project beckons. I am prepared to spend at least several weeks on 
it.

The goal is to create a beautifier that is 4x faster than PythonTidy.  Imo, 
this goal is feasible.

The rest of this post is notes in an Engineering Notebook.  Feel free to 
ignore it. All comments welcome.

*Overview of the problem*

Beautification is important and *seemingly* straightforward.  In fact, it 
is difficult and complex.

I have been fascinated with various related projects in the past, probably 
because I enjoy collapsing complexity.

This problem could be solved using either *tokens* or *asts* (parse 
trees). I've explored both approaches. Each has a fundamental limitation:

- Tokens know nothing about the structure of text (parse trees).

- Parse trees know (almost) nothing about tokens (white space, comments and 
the "spelling" of strings).
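The two limitations can be seen directly with Python's standard ast and tokenize modules. What follows is a minimal sketch for Python 3, not code from Leo or PythonTidy; the helper names (token_kinds, ast_summary, same_tree) are made up for illustration.

```python
import ast
import io
import tokenize

SOURCE = "x = 'a'  # set x\n"

def token_kinds(source):
    """Return (token-type-name, text) pairs: a flat list with no structure."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]

def ast_summary(source):
    """Return the class names of all nodes in the parse tree."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def same_tree(source1, source2):
    """True if both sources parse to the same tree (ignoring positions)."""
    return ast.dump(ast.parse(source1)) == ast.dump(ast.parse(source2))

if __name__ == '__main__':
    print(token_kinds(SOURCE))   # includes the comment and the quoted string
    print(ast_summary(SOURCE))   # no node for the comment
    print(same_tree("x = 'a'", 'x = "a"'))  # quote style is lost in the tree
```

The token list keeps the comment and the exact quoting but says nothing about which statement the tokens belong to. The tree knows there is an assignment but has discarded the comment and the string's spelling.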

The ast-based approach seems more natural. It *would* be more natural 
if tokens could be associated with ast statement nodes. This problem is 
crucial. I'll investigate it first, in part by studying pep8.py and 
autopep8.py.
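One possible way to associate tokens with statement nodes is by line number: every ast statement node has a lineno, and every token has a starting row. The sketch below is only an illustration under that assumption, written for Python 3; it is not how pep8.py or autopep8.py do it, and tokens_by_statement is a hypothetical name.

```python
import ast
import bisect
import io
import tokenize

# Tokens that carry layout only; they are ignored in this sketch.
LAYOUT = (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER)

def tokens_by_statement(source):
    """Return a dict mapping each statement's starting line to its tokens.

    A token is assigned to the last statement that starts at or before
    the token's own starting line.
    """
    tree = ast.parse(source)
    starts = sorted({node.lineno for node in ast.walk(tree)
                     if isinstance(node, ast.stmt)})
    result = {line: [] for line in starts}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        i = bisect.bisect_right(starts, tok.start[0]) - 1
        if i >= 0:  # tokens before the first statement are dropped here
            result[starts[i]].append(tok.string)
    return result

if __name__ == '__main__':
    source = "a = 1  # first\nif a:\n    b = 'two'\n"
    for line, toks in sorted(tokens_by_statement(source).items()):
        print(line, toks)
```

This is deliberately crude. A comment on its own line before a statement is attached to the previous statement (or dropped if there is none), and several statements on one line share a single entry. A real beautifier would need to handle those cases.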

PythonTidy contains complex code to merge tokens into the final output.  
This complexity bleeds into the rest of the code in unwelcome ways.

Imo, the token-based approach is still worth serious consideration.  I 
threw up my hands recently not because of "parsing" issues, but because the 
output-forming logic exploded in complexity.

*Tools*

I have created several tools that apply to this problem:

1. The string composition functions in leoGlobals.py-->...-->g.List 
composition.

When I awoke this morning I remembered these functions and had the feeling 
that they (or something like them) might form the foundation of "smart 
concatenation" of output strings.  This could simplify either token-based 
or ast-based code.
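To make the idea concrete, here is one reading of "smart concatenation": formatting code returns nested lists of string fragments, and a single helper flattens and joins them at the very end. This is an assumption about the general technique, not a copy of the functions in leoGlobals.py; flatten and compose are hypothetical names.

```python
def flatten(fragments):
    """Yield the strings in an arbitrarily nested list, skipping None."""
    for item in fragments:
        if item is None:
            continue  # lets callers write optional pieces without tests
        if isinstance(item, (list, tuple)):
            for s in flatten(item):
                yield s
        else:
            yield item

def compose(fragments):
    """Join all fragments into one string, in order."""
    return ''.join(flatten(fragments))

if __name__ == '__main__':
    condition = ['a', ' == ', 'b']
    comment = None  # an optional trailing comment that is absent
    print(compose(['if ', condition, ':', comment, '\n']))
```

The point of the design is that each visitor or token handler can return whatever nested pieces are convenient, and nothing is concatenated until the whole result is known.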

2. Token-based pretty-printing code.

I am going to take the token-based code out of the attic to see if its 
output logic can be simplified.

3. PythonTidy and the AstFormatter class in leoAst.py.

I spent yesterday using a variant (not a subclass) of the AstFormatter 
class as the basis for a tree-based approach to beautification.  In the 
process, I found many bugs in the AstFormatter class.  Corrected code will 
be pushed soon.

The plan is to fold some PythonTidy code into the AstFormatter code:

- The PythonTidy parsing code looks like it is tied to Python 2.  In any 
event, it's too ugly to tolerate. As a result, parsing will be based on the 
AstFormatter class.  This implies a major revision of the PythonTidy code.

- The PythonTidy output code is extremely complex, but it works.  I'll 
investigate simplifying the code with the g.List composition code...

*Summary*

Leo's existing beautification code using PythonTidy is fast and works well, 
but only for Python 2.

Both ast-based and token-based approaches are worth a look.  The g.List 
composition code might simplify either approach.

Porting PythonTidy to Python 3 would, in fact, be a major rewrite.  This 
rewrite promises perhaps a 4x increase in speed.

First, I'll look for a simple way to associate tokens with ast statement 
nodes.

Edward
