Re: The design of a pep8 fixer

Edward K. Ream Sun, 28 Feb 2010 10:03:11 -0800

On Feb 27, 2:10 pm, "Edward K. Ream" <[email protected]> wrote:


> it is now clear that scanning is, by far, the simplest and most flexible
> way to munge code.

Now it's time to examine in more detail what the munges are :-) This
will be the to-do list for the rest of the project.

1. Tabs.  Discussed in the "Code Layout" section of pep 8. This has
great practical significance for Leo's importers.  There are two
possible (command-line) options.

A. Tab width.  4 will be the default.

B. Hard vs. soft.  Soft tabs (the default) replace tabs with enough
spaces to get to the next tab stop.  Hard tabs replace tabs by n tabs,
where n is the tab width.  Probably all tabs in the leading whitespace
(lws) of a line should be soft.  There is more doubt about what to do
with tab characters in strings.

Outside of lws, comments and strings, the question is moot, because
all whitespace will be converted to zero or one blanks as recommended
by pep8.  There will be a separate pass for what I call "op-spacing".
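A minimal sketch of the soft-tab case for leading whitespace, assuming
the fixer works one line at a time. The name soften_leading_tabs and its
tab_width parameter are hypothetical, not part of the existing code; the
only library call is Python's built-in str.expandtabs.

```python
def soften_leading_tabs(line, tab_width=4):
    """Expand tabs in the leading whitespace (lws) of one line to spaces.

    Hypothetical helper: tabs after the lws (for example, inside strings)
    are deliberately left alone.
    """
    body = line.lstrip(' \t')
    lws = line[:len(line) - len(body)]
    # str.expandtabs advances each tab to the next multiple of tab_width.
    return lws.expandtabs(tab_width) + body
```

Because only the lws is expanded, the open question about tabs inside
strings is untouched by this helper.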

2. Op-spacing.  This is the white space section of pep-8.  Perhaps tab
munging will be done in this pass.  On re-reading this part of pep 8,
I am surprised how closely my taste matches Guido's. There are lots of
recommendations in the pep, and perhaps each should get its own option,
but initially the fixer will follow the pep closely.

BTW, Leo's pretty-print commands do this, but they use tokens rather
than strings.  I shall probably sack the token-oriented code: it's
difficult to understand because looking ahead and behind is *much*
harder in the token world.

Also, the "two-blank lines before classes" recommendation does not
necessarily make sense in Leo, for obvious reasons.

3. Long lines. Discussed in the "Code Layout" section of pep 8.  The
recommendations are excellent.  However, the harder case is how to
deal with long *comment* lines.  There are several special
considerations for Leo:

A. We must be sure never to split sentinel lines, no matter how long
they are ;-)

B. Sequences of comment lines should use a paragraph-filling
algorithm.  That is, it would be ugly to create lines that look like:

    # full line -------------------------------
    # short continuation line
    # full line -------------------------------
    # another short line

We want instead:

    # full line ------------------------------
    # full line ------------------------------
    # last line.

C. We might want to replace a sequence of lines by an @doc part.

D. For long comments following code I prefer to replace:

    code # comment

by:

    code
        # comment

or

    code
        # comment 1
        # comment 2
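Items A, B and D can be sketched with a small scanner plus the standard
textwrap module. This is only an illustration under stated assumptions:
the function names, the max_len, extra_indent and width parameters are
all hypothetical, sentinel lines are assumed to be comment lines that
start with "#@", and strings that span several lines are not handled.

```python
import textwrap

def find_comment(line):
    """Return the index of the '#' that starts a comment, or -1."""
    quote = None  # Delimiter of the string we are inside, if any.
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == '\\':
                i += 1  # Skip the escaped character.
            elif line.startswith(quote, i):
                i += len(quote) - 1
                quote = None
        elif ch in '"\'':
            quote = ch * 3 if line.startswith(ch * 3, i) else ch
            i += len(quote) - 1
        elif ch == '#':
            return i
        i += 1
    return -1

def split_trailing_comment(line, max_len=79, extra_indent=4):
    """Item D: move a long line's trailing comment to its own line."""
    i = find_comment(line)
    code = line[:i].rstrip() if i >= 0 else line
    # Comment-only lines (sentinels included) are never split: item A.
    if len(line) <= max_len or i < 0 or not code.strip():
        return [line]
    indent = len(code) - len(code.lstrip())
    return [code, ' ' * (indent + extra_indent) + line[i:]]

def refill_comments(lines, width=79):
    """Item B: re-fill a run of comment lines sharing one indentation."""
    # Assumption: a sentinel starts with '#@'; leave such runs alone.
    if any(s.lstrip().startswith('#@') for s in lines):
        return list(lines)
    first = lines[0]
    prefix = first[:len(first) - len(first.lstrip())] + '# '
    text = ' '.join(s.strip().lstrip('#').strip() for s in lines)
    return textwrap.wrap(text, width=width, initial_indent=prefix,
                         subsequent_indent=prefix, break_long_words=False)
```

Note that refill_comments joins the whole run into one paragraph, so a
real version would need to treat empty comment lines as paragraph
breaks.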

4. Identifier munging. The pep allows some leeway for existing code
bases.  Conceivably, all of Leo's code base could be "grandfathered".
However, I do want Leo to use the recommended style.

As I read the pep, the lower_case_with_underscores style is
recommended for all identifiers except class names and imported
names.  This makes semantic processing easy:

A.  We *never* change names appearing in any part of an import
statement.

B. We use the CapWords for all class names, that is, all names
following "class" (outside of comments and strings, of course).

C. We use the lower_case_with_underscores for *everything* else.

The present code does all this easily already.  A first pass fills in
a global symbol table, a second pass does the munging.  I haven't
written the actual munging code.  The only complication is that a munge
must not happen if the new name conflicts with a symbol already
existing in the symbol table.
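Since the munging code is not written yet, here is only a rough sketch
of how the second pass might plan its renames. The names to_snake_case
and plan_renames are hypothetical, the symbol table is modelled as a
plain set of names, and rule B is simplified: class names are merely
left alone rather than converted to CapWords. Rule C is applied
literally, so an all-caps name would be lower-cased too.

```python
import re

def to_snake_case(name):
    """Convert camelCase or CapWords to lower_case_with_underscores."""
    # 'tabWidth' -> 'tab_Width'
    s = re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', name)
    # 'HTMLString' -> 'HTML_String'
    s = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', s)
    return s.lower()

def plan_renames(symbols, class_names=(), imported=()):
    """Map old names to new names, skipping any rename that conflicts."""
    renames = {}
    taken = set(symbols)  # Every name already in the symbol table.
    for name in sorted(symbols):
        if name in imported or name in class_names:
            continue  # Rule A, and the simplified rule B.
        new = to_snake_case(name)
        if new != name and new not in taken:
            renames[name] = new
            taken.add(new)  # Later renames must not collide with this one.
    return renames
```

For example, with both tabWidth and tab_width in the table, tabWidth is
left unchanged because its munged name is already taken.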

That's about it.  Each of the items 1 through 4 above is relatively
straightforward.  They will all get done eventually, in an order to be
determined...

Edward

P.S. Practically speaking, packaging issues will be significant,
including command-line args.  However, the task will be
straightforward thanks to the simple and flexible nature of the
top-level fix() method.  This method is the perfect boundary between internal
code and packaging code.  It could hardly be simpler conceptually or
practically.

EKR
