The design of a pep8 fixer

Edward K. Ream Fri, 26 Feb 2010 14:07:01 -0800

I am fairly confident that I can write a suite of scripts that will
fix all major problems discussed in pep8 in about a week.


I say this because imo a pep8 fixer can be much simpler than either
pylint or 2to3.  Indeed, there are three possible ways to write a code
munger:

1. Use ast trees.  This is the way followed by pylint and 2to3.  The
advantage of this approach is that the ast (parse) trees make
discovering the detailed structure of a program very easy.  The
disadvantage is that character-level information is (very) difficult
to obtain.

2. Use tokens.  It's harder to obtain structural (parser) information,
but easier to obtain character data.  indent.py uses this way, iirc.
So does Leo's pretty printer.

3. Use strings for (almost) everything.  This is the way I shall
adopt.  There is no doubt in my mind that this is the easiest way.  It
may seem counter-intuitive: scripts must do extra work to discover the
structure of the program being munged.  The great advantage of this
way is the scripts always have the actual strings of the text to work
on.  Thus, replacing one string by another is trivial.  And it is this
string replacement that is the essential operation.
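To make the trade-off between the three approaches concrete, here is a
small sketch using only the standard library.  It is an illustration,
not code from pylint, 2to3 or Leo: the sample line and the variable
names are made up.  It shows that the ast drops the comment, the token
stream keeps it, and the string approach replaces text directly.

```python
import ast
import io
import tokenize

# A made-up one-line program with a comment.
src = "x = 1  # keep me\n"

# 1. ast: the structure is easy to get, but the comment is gone.
tree_dump = ast.dump(ast.parse(src))

# 2. tokens: the comment survives, with its exact text.
tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
comments = [t.string for t in tokens if t.type == tokenize.COMMENT]

# 3. strings: the replacement itself is a one-liner.
fixed = src.replace("x = 1", "y = 1")
```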

As it turns out, I have lots of experience with this strategy.  It
underlies all of Leo's importers.  It's simple, it works, and I am
comfortable with it.  Also, Leo's importers already provide methods
that will discover the range of text covered by a class or def, which
is really all that needs to be done.

The pep8 fixer will consist of a series of simple, self-contained
scripts.  Each script will apply one particular fix to a string,
possibly using global context.

Each script will have the same basic organization.  It will be a
character-by-character **scanner** that understands Python strings,
comments and (for some scanners) classes and defs (functions or
methods).  And maybe also Leo's doc parts.
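A minimal sketch of such a scanner follows.  It is only an
illustration under stated assumptions: the names scan and skip_string
are hypothetical, string prefixes such as r or b are left in the
preceding code segment, and classes, defs and Leo's doc parts are not
handled at all.

```python
def skip_string(s, i):
    """Return the index just past the string literal starting at s[i]."""
    quote = s[i]
    delim = quote * 3 if s.startswith(quote * 3, i) else quote
    i += len(delim)
    while i < len(s):
        if s[i] == '\\':
            i += 2  # Skip the backslash and the escaped character.
        elif s.startswith(delim, i):
            return i + len(delim)
        else:
            i += 1
    return len(s)  # Unterminated string: consume the rest.


def scan(s):
    """Yield (kind, text) pairs; kind is 'code', 'string' or 'comment'."""
    i, n, start = 0, len(s), 0
    while i < n:
        ch = s[i]
        if ch == '#':
            if start < i:
                yield 'code', s[start:i]
            j = s.find('\n', i)
            j = n if j == -1 else j
            yield 'comment', s[i:j]  # The newline stays with the code.
            i = start = j
        elif ch in '"\'':
            if start < i:
                yield 'code', s[start:i]
            j = skip_string(s, i)
            yield 'string', s[i:j]
            i = start = j
        else:
            i += 1
    if start < n:
        yield 'code', s[start:n]
```

Joining the yielded text reproduces the input exactly, so a fixer
built on this sketch could rewrite only the 'code' segments and leave
strings and comments untouched.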

Writing such scanners is second nature to me.  Conceptually they are
very simple.

There are two kinds of pep8 fixers: local fixers and global fixers.
Local fixers require no context.  Local fixers will clean blank lines,
replace tabs by spaces, and split long lines into shorter lines.
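Two local fixers might look like the sketch below.  The function names
and the tab_width default are assumptions, and "clean blank lines" is
read here as "strip trailing whitespace so that blank lines are truly
empty".  For brevity these versions work line by line and would also
touch the insides of multi-line strings; a real fixer would need to
understand strings to avoid that.

```python
def expand_leading_tabs(s, tab_width=4):
    """Replace tabs in each line's leading whitespace with spaces."""
    result = []
    for line in s.splitlines(True):  # Keep the line endings.
        body = line.lstrip(' \t')
        lead = line[:len(line) - len(body)]
        result.append(lead.expandtabs(tab_width) + body)
    return ''.join(result)


def clean_blank_lines(s):
    """Strip trailing whitespace, so blank lines become truly empty."""
    return '\n'.join(line.rstrip() for line in s.split('\n'))
```

Both functions take a string and return a string, and neither needs
any context beyond the text it is given.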

Global fixers work (conceptually) on a list of files.  For example,
the fixer that changes a class name from xxxYyy to XxxYyy should work
on all the files of a project, such as all the files in Leo's core.
That allows the fixer to pre-scan for conflicts before making any
changes.

Global fixers will likely have two passes.  The first pass will
construct a global symbol table.  The fixer will likely abort if the
fix might map distinct input symbols into the same output symbol.
Assuming there are no collisions, the second pass will substitute the
approved spelling for the dubious spelling.
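The two passes could be sketched as follows for the class-name fixer.
Everything here is hypothetical: the function names are invented, and
regular expressions stand in for a real scanner, so this version would
also rename matches inside strings and comments.

```python
import re

# Matches "class xxxYyy" where the name starts with a lowercase letter.
CLASS_DEF = re.compile(r'^[ \t]*class[ \t]+([a-z]\w*)', re.MULTILINE)


def build_rename_table(sources):
    """Pass 1: map each lowercase-initial class name to its new name."""
    table = {}
    for s in sources:
        for name in CLASS_DEF.findall(s):
            table[name] = name[0].upper() + name[1:]
    return table


def find_collisions(table, sources):
    """Return the new names that already occur as identifiers."""
    existing = set()
    for s in sources:
        existing.update(re.findall(r'\b\w+\b', s))
    return sorted(new for new in table.values() if new in existing)


def fix_class_names(sources):
    """Pass 2: rename everywhere, or abort if any new name is taken."""
    table = build_rename_table(sources)
    collisions = find_collisions(table, sources)
    if collisions:
        raise ValueError('name collisions: %s' % ', '.join(collisions))
    if not table:
        return list(sources)
    pattern = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, table)))
    return [pattern.sub(lambda m: table[m.group(1)], s) for s in sources]
```

Here sources is a list of strings, one per file.  Nothing is changed
unless the whole list passes the collision check.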

I shall definitely write many unit tests first as a way of addressing
various design issues.  I know from experience (and from my initial
ruminations) that designing the unit tests will uncover design
questions in the easiest way possible.

Packaging will be interesting.  Obviously, I want to run the fixer as
a Leo script, but I shall also want to package it for use by those who
do not use Leo.  This suggests that each fixer will convert a string to
a string.  Wrapper functions will allow the primary fixers to work in
various contexts.
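One possible shape for such a wrapper is sketched below.  The names
fix_trailing_whitespace and fix_file are made up for illustration, and
the fixer shown is only a stand-in for a real primary fixer.  A wrapper
for use inside Leo would follow the same pattern but is not shown,
since it depends on Leo's own API.

```python
def fix_trailing_whitespace(s):
    """A stand-in primary fixer: string in, string out."""
    return '\n'.join(line.rstrip() for line in s.split('\n'))


def fix_file(path, fixer):
    """Apply a string-to-string fixer to a file; return True if it changed."""
    with open(path) as f:
        old = f.read()
    new = fixer(old)
    if new != old:
        # Rewrite the file only when the fixer changed something.
        with open(path, 'w') as f:
            f.write(new)
    return new != old
```

Because the primary fixer never touches files, the same function can
be unit-tested on plain strings and reused unchanged by any wrapper.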

That's about it.  I have studied 2to3 and pylint in enough detail to
be quite confident that my approach is fundamentally simpler than
using asts or streams of tokens.

It would also be reasonable to convert 2to3 into a pep8 fixer.  After
all, both rewrite code.  But the mechanics behind 2to3 are
horrendously complex.  I want to base my code on something dirt
simple: a generic python scanner.  It's the way I think.  More
importantly, many fixers are fundamentally involved with characters.
Trying to "abstract" characters away actually makes things harder.

Edward

P.S. Speed is completely irrelevant here.  Or rather, the only thing
that matters is how fast I can write and debug these scripts :-)

P.P.S. For reasons that are not completely clear to me, I have always
loved writing this kind of code.  I'm pumped about this project.  I'm
going to make a bit of a race out of this. The goal: to complete this
project using TDD in less than a week.

EKR
