Re: The design of a pep8 fixer

Edward K. Ream Tue, 02 Mar 2010 06:49:30 -0800

On Feb 26, 4:06 pm, "Edward K. Ream" <[email protected]> wrote:
> I am fairly confident that I can write a suite of scripts that will
> fix all major problems discussed in pep8 in about a week.
>
> I say this because imo a pep8 fixer can be much simpler than either
> pylint or 2to3.


Time for a status report.  Ref 23 demonstrates a significant
milestone, and reveals a significant problem.  Happily, there is an
excellent solution, one that might have important implications for
pylint.

The milestone: Leo now munges class, method and function names
correctly, even if the class names cross file boundaries.  To do this,
Leo now uses two symbol tables.  Routine details omitted.

The significant problem: the pep8 fixer might want to munge ivars and
module-level vars and constants.  To do this, the code must know the
text range of classes, functions and methods.  Happily, this can be
done in a very fast prepass. Here is the rough design:

1. Modify the PythonScanner so it will support a pass zero (pass_n in
(0,1,2)).  In pass zero, scan_or_fix_word will call a fixer method
that starts a scan if the id being scanned is 'class' or 'def'.
Starting a scan for a class or def pushes the starting indentation on
a pass0_stack.

When the scanner sees a newline, it will call another fixer method
that will see if the present line ends one or more classes, methods or
functions.  It's easy to do this: just compare the indentation of the
present line to the indentations on the pass0_stack.

2. The result of the pass 0 scan will be a groupRangeDict.  Keys are
the text indices of the start of a class, method or function.  Values
are the text indices of the end of the item.

3. During the last pass of name processing, we can determine the range
of any name by looking at a pass2_stack.  Pass 2 puts entries on the
stack when seeing the start of a class, method or function.  Pass 2
need not actually go to the bother of popping items at the end of an
item.  Instead, a pass 2 function, say id_kind, will compare the
present text index, say i, with the items on the pass2_stack, popping
items off the stack as needed until it finds a range that covers i.
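
Here is a minimal sketch of steps 1-3, not the actual pep8.py code.  It
assumes a line-by-line scan rather than the real PythonScanner, and it
ignores strings, comments, continuation lines and tabs.  The names
find_group_ranges, RangeTracker, enclosing and SAMPLE are hypothetical;
only the ideas of an indentation stack, a dict of start -> end indices
and lazy popping come from the design above.

```python
def find_group_ranges(source):
    """Pass zero (simplified): return {start_index: end_index} for each
    class, method or function found in source."""
    ranges = {}
    stack = []  # (indentation, start_index) of the currently open groups
    i = 0       # text index of the start of the current line
    for line in source.splitlines(keepends=True):
        stripped = line.strip()
        if stripped:  # blank lines never end a group
            indent = len(line) - len(line.lstrip())
            # A line indented no deeper than an open group ends that group.
            while stack and indent <= stack[-1][0]:
                _, start = stack.pop()
                ranges[start] = i
            if stripped.split()[0] in ('class', 'def'):
                stack.append((indent, i + indent))
        i += len(line)
    # The end of the text closes whatever is still open.
    for _, start in stack:
        ranges[start] = i
    return ranges


class RangeTracker:
    """Pass two (simplified): answer 'which group covers index i?' for
    non-decreasing values of i, popping finished groups lazily."""

    def __init__(self, ranges):
        self.ranges = ranges
        self.starts = sorted(ranges)
        self.next_start = 0
        self.stack = []  # start indices of groups seen so far

    def enclosing(self, i):
        # Push every group that starts at or before i.
        while (self.next_start < len(self.starts)
               and self.starts[self.next_start] <= i):
            self.stack.append(self.starts[self.next_start])
            self.next_start += 1
        # Pop groups that ended at or before i.
        while self.stack and self.ranges[self.stack[-1]] <= i:
            self.stack.pop()
        return self.stack[-1] if self.stack else None


SAMPLE = (
    "class A:\n"
    "    def f(self):\n"
    "        pass\n"
    "    def g(self):\n"
    "        pass\n"
    "x = 1\n"
)
```

With SAMPLE, find_group_ranges(SAMPLE) maps the index of each 'class'
or 'def' keyword to the index of the first line that ends it, and
RangeTracker(ranges).enclosing(i) returns the start of the innermost
group covering i, or None at module level.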

This scheme is excellent in all respects.  It is simple, robust,
self-contained and extremely fast.

The implications:  pylint does a huge amount of manipulation
(traversing) of parse trees (ast's).  I am beginning to wonder whether
this "elegant" approach is really so good after all.  Indeed, here is
the essence of the situation as far as pylint goes.

1. The fundamental pylint problem is to make as many deep inferences
about names as possible.  In this sense, a name is a sequence w.x.y.z
and the inferences that apply depend on the chain of inferences from w
to x to y to z.  Parsing is not the issue!  The only things that
matter are the symbol tables that can be constructed.

2. The fundamental fact about python is that it has very simple
context.  There are modules (files), classes, methods and functions.
That's *all*.  We have already seen that it is easy (one could say
trivial) to discover the range of all these contexts.
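
To make point 1 concrete, here is a rough sketch of following a chain
w.x.y.z through nested symbol tables.  This is only an illustration of
the idea, not pylint's design or the pep8.py tables: the Symbol class,
its define method and the resolve function are all hypothetical names.

```python
class Symbol:
    """One symbol-table entry: a kind plus the names defined inside it."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind    # e.g. 'module', 'class', 'def' or 'var'
        self.members = {}   # name -> Symbol

    def define(self, name, kind):
        """Add (or fetch) a member symbol and return it."""
        if name not in self.members:
            self.members[name] = Symbol(name, kind)
        return self.members[name]


def resolve(table, dotted_name):
    """Follow w.x.y.z one link at a time.

    Returns the Symbols found so far; the list is shorter than the
    chain if some link is unknown, which is where inference stops.
    """
    found = []
    scope = table
    for part in dotted_name.split('.'):
        scope = scope.members.get(part)
        if scope is None:
            break
        found.append(scope)
    return found
```

Note that nothing here touches a parse tree: once the tables exist,
the inference about z depends only on what the tables say about w, x
and y.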

So the question is, what, exactly, is gained by using parse trees?  I
confess that as far as the truly interesting problems in pylint go, I
can see *no* advantage to using ast's!

But aren't there plenty of drawbacks to the endless focus on ast
trees?  I think there are.  As I have implied above, interesting
inferences are *semantic* properties, usually about identifiers.  This
suggests that a better approach to pylint is simply this:

    **Convert syntax to semantics-oriented data structures as early
    as possible**

In other words, the most effective approach will be to create rich
symbol tables in early passes, and then exploit those tables to the
fullest in later passes.  In this master scheme, **it doesn't matter**
how we parse code: we are free to use the simplest thing that could
possibly work.  The PythonParser class in pep8.py is that way.

Edward

P.S. Before somebody jumps all over me, there may be places in the
pylint code where using the ast's could be considered the elegant
way.  Two cases come immediately to mind:  import statements and the
determination of unreachable code.  For import statements, the
complexities reside mostly in the semantics, but using the ast might
reduce errors.  Another example: pylint finds unreachable code
elegantly.  But imo it would be foolish to let these small tails wag
the dog.

EKR

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/leo-editor?hl=en.
