A design for a better pylint

Edward K. Ream Mon, 15 Feb 2010 07:59:51 -0800

As I said in the post "The new me: active vs passive", the following
design is active preparation for studying the pylint code.


The problem I face is that the pylint code doesn't actually seem to
make inferences about the code that it is analyzing.  It all seems
like preparation :-)

I am writing this after briefly studying the astng code in
site-packages\logilab_astng-0.19.3-py2.6.egg\logilab\astng
and the pylint checker code in pylint-0.19.0\checkers.

--- Goals

The primary goal, indeed the only goal, for the "new pylint" is to
make it the best possible platform for analysis of Python code.
Pylint, pychecker and similar tools are arguably the most important
tools in the Python world other than Python itself.

By "best possible" I mean the following:

1. It should be as fast as possible.
2. It should be as simple and easy-to-understand as possible.
3. It should be as flexible and extensible as possible.

It should be easy for other people to write their own checkers.  That
means that the code must be well documented and packaged for easy use.

--- The grand strategy

1. One of the biggest surprises of my programming life was the
realization that multi-pass compiler algorithms can be significantly
*faster* than single-pass algorithms.  At the time, that was a huge
shock.  Now, it seems second nature to me.  So the first part of the
strategy is that the new pylint will consist of multiple passes,
largely independent of each other.  This will have salutary effects
everywhere.

2. In contrast to pylint, the new pylint will treat ast trees as read
only.  Early passes will convert the ast trees of the modules being
analyzed into self-contained data structures.  This seems like
avoidable overhead, but it should greatly simplify later passes.
Prepasses that simplify later passes pay huge dividends.

3. At present, pylint analyzes only one file at a time.  The new
pylint will handle lists of files.  This will save a lot of work.
The new pylint will preprocess each file only once.

--- The passes

The new pylint will process files in approximately this way:

Pass 1: read all file-level ast's.

For each file in the list of files passed to the (new) pylint, pylint
will read the ast for that file.  It takes one line of code:

    tree = ast.parse(s,filename=fn,mode='exec')

This pass creates file2astDict, a dictionary associating filenames
with ast's.
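
A minimal sketch of this pass, assuming the sources are already in
memory.  The helper name read_asts and the `sources` dict (filename to
source text) are illustrative assumptions, chosen so the example runs
without touching the disk:

```python
import ast

def read_asts(sources):
    """Pass 1 sketch: map each filename to its parsed ast.

    `sources` is a hypothetical dict of filename -> source text.
    """
    file2astDict = {}
    for fn, s in sources.items():
        # The one-line parse from the text above.
        file2astDict[fn] = ast.parse(s, filename=fn, mode='exec')
    return file2astDict

file2astDict = read_asts({
    'a.py': 'import b\nx = 1\n',
    'b.py': 'y = 2\n',
})
```

Each file is parsed exactly once, and every later pass can look the
tree up by filename.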

Pass 2: import all files.

This pass traverses all the ast's in file2astDict, looking for import
statements.  If the imported file does not already exist in
file2astDict.keys(), this pass will read the ast for that file and
create an entry in file2astDict.
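
One possible shape for this pass is sketched below.  How a module name
is mapped to a file is left open in this design, so the sketch takes a
hypothetical find_source callback (module name to a (filename, source)
pair, or None if the module cannot be located).  Relative imports are
ignored for brevity:

```python
import ast

def imported_names(tree):
    """Return the module names imported anywhere in an ast (read-only walk)."""
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return names

def import_pass(file2astDict, find_source):
    """Pass 2 sketch: add asts for imported files until nothing new appears.

    find_source is a hypothetical callback, not part of any real library.
    """
    pending = list(file2astDict.values())
    while pending:
        tree = pending.pop()
        for name in imported_names(tree):
            found = find_source(name)
            if found is None:
                continue  # module not located: skip it
            fn, s = found
            if fn not in file2astDict:
                file2astDict[fn] = ast.parse(s, filename=fn, mode='exec')
                pending.append(file2astDict[fn])  # scan its imports too
    return file2astDict

# Usage with an in-memory stand-in for the file system.
known = {'b': ('b.py', 'import c\n'), 'c': ('c.py', 'z = 3\n')}
d = {'a.py': ast.parse('import b\nimport os\n', filename='a.py')}
import_pass(d, known.get)
```

Because newly read files are pushed back onto the work list, imports
of imports are followed as well, and no file is parsed twice.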

Pass 3: ast to internal data

For every ast in file2astDict.values(), this pass will create internal
data structures that
a) contain all the info in the ast but
b) are optimized for the task of data analysis.

This pass will create **pylint nodes** corresponding to ast nodes.  I
don't want to monkey-patch the ast tree: I want a new tree that does
*exactly* what I want.
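
To make the idea concrete, here is a rough sketch of what a pylint
node might look like.  The class PylintNode, its attributes and the
convert function are all assumptions made for illustration; the real
layout would depend on what the later passes need:

```python
import ast

class PylintNode:
    """Hypothetical 'pylint node': a self-contained copy of one ast node."""
    def __init__(self, kind, lineno, fields, children):
        self.kind = kind          # name of the ast class, e.g. 'Assign'
        self.lineno = lineno      # None for nodes without a line number
        self.fields = fields      # non-node data, e.g. {'id': 'x'}
        self.children = children  # list of PylintNode
        self.parent = None        # set by convert()

def _holds_nodes(value):
    """True if value is an ast node or a list containing ast nodes."""
    if isinstance(value, list):
        return any(isinstance(item, ast.AST) for item in value)
    return isinstance(value, ast.AST)

def convert(node):
    """Build a PylintNode tree from an ast without modifying the ast."""
    children = [convert(child) for child in ast.iter_child_nodes(node)]
    fields = {name: value for name, value in ast.iter_fields(node)
              if not _holds_nodes(value)}
    result = PylintNode(type(node).__name__,
                        getattr(node, 'lineno', None), fields, children)
    for child in children:
        child.parent = result  # parent links live in the new tree only
    return result

tree = ast.parse('x = 1\n')
root = convert(tree)
```

The parent links are the kind of extra data that would otherwise have
to be patched onto the ast nodes themselves.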

Pass 4: post-process pylint nodes

Details hazy at present, but this pass would perform preliminary
analysis common to all inferences.  It might massage the data in
pylint nodes to make the data easier to use later.

Pass 5: dynamic analysis

This is the fun/hard part.  We want to infer the types that variables
can have, and the types that functions (and classes with __call__
members) can have.  Analyzing assignment statements is the big one.
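
As a deliberately tiny illustration of inference from assignments, the
sketch below records, for each simple variable name, the set of types
it is assigned.  It handles only literal right-hand sides and marks
everything else 'unknown'.  For brevity it walks the ast directly
rather than pylint nodes, it assumes a Python version that has
ast.Constant, and the function name is hypothetical:

```python
import ast

def infer_assigned_types(tree):
    """Map each simple variable name to the set of type names assigned to it."""
    types = {}
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        if isinstance(node.value, ast.Constant):
            # Literal such as 1, 'a' or None: use the type of its value.
            kind = type(node.value.value).__name__
        elif isinstance(node.value, (ast.List, ast.Dict, ast.Tuple, ast.Set)):
            # Container display such as [] or {}: 'list', 'dict', ...
            kind = type(node.value).__name__.lower()
        else:
            kind = 'unknown'  # calls, names, expressions: not analyzed here
        for target in node.targets:
            if isinstance(target, ast.Name):
                types.setdefault(target.id, set()).add(kind)
    return types

result = infer_assigned_types(ast.parse("x = 1\nx = 'a'\ny = []\nz = f()\n"))
```

A real pass would have to resolve the 'unknown' cases by following
names, calls and attribute accesses, which is where the hard work
lies.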

Pass 6: apply checkers

When all the preliminary work is complete, checking the code in
various ways should be simply a matter of checking data in the pylint
trees.
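
Under that assumption a checker can be a plain function from
already-computed data to a list of messages.  In the sketch below the
checker, the driver and the shape of the `inferred` dict (variable
name to a set of type names) are all hypothetical:

```python
def check_mixed_types(inferred):
    """Hypothetical checker: report names assigned more than one type."""
    messages = []
    for name in sorted(inferred):
        kinds = inferred[name]
        if len(kinds) > 1:
            messages.append('%s: assigned multiple types: %s'
                            % (name, ', '.join(sorted(kinds))))
    return messages

def run_checkers(checkers, inferred):
    """Apply every checker to the same data and collect the messages."""
    messages = []
    for checker in checkers:
        messages.extend(checker(inferred))
    return messages

messages = run_checkers([check_mixed_types],
                        {'x': {'int', 'str'}, 'y': {'list'}})
```

Writing a new checker then means writing one more function and adding
it to the list.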

As a background requirement, I see no reason for any of these passes,
including the checkers, ever to raise an exception.  The present
pylint code is almost impossible to understand because one is never
sure what exceptions are being tossed around. Exceptions have global
effect.  They are not a great idea when trying to simplify code. In
contrast, a multi-pass scheme does clearly defined tasks on clearly-
defined data.  In such an environment, there should be no surprises.
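
One way to keep exceptions from crossing pass boundaries is to convert
them to data at the point where they arise.  The sketch below does
this for parsing, the one step above that is known to raise
(ast.parse raises SyntaxError on bad input).  The function name and
the (tree, problems) return shape are assumptions:

```python
import ast

def parse_file(fn, s):
    """Return (tree, problems); tree is None when s cannot be parsed."""
    try:
        return ast.parse(s, filename=fn, mode='exec'), []
    except SyntaxError as e:
        # Record the failure as data instead of letting it propagate.
        return None, ['%s:%s: syntax error: %s' % (fn, e.lineno, e.msg)]

tree, problems = parse_file('ok.py', 'x = 1\n')
bad_tree, bad_problems = parse_file('bad.py', 'x = = 1\n')
```

Later passes then see either a tree or a list of problems, never an
exception in flight.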

--- Good artists copy--great artists steal

My strength as a programmer is repackaging and revision.  The new
design is, in effect, a massive refactoring of existing code. There is
plenty of useful code in pylint.  I plan to steal as much of it as
possible :-)

--- Where do I go from here?

I started this design because I want to see how I would approach the
problem.  This design will help me study the existing pylint code.

The question is, do I really intend to do an alternate pylint?  I'm
not sure yet, but I'm not ruling it out.  I have been looking for a
project that showcases Leo's strengths.  A new pylint would do that.
More importantly, pylint can and should be improved, and at the very
least a new pylint might spur further work on the existing pylint.
The competition could be interesting :-)

Edward

P.S.  A crucial part of being active is picking projects that are
small enough to keep my energy high.  The last three months have been
a high-energy time for me because I've been focused on small projects:
fixing one bug at a time.  The design presented above breaks a very
large task into more manageable pieces.

Better, it might be possible to fold the new design, step by step,
into the *existing* pylint.  If possible, that would be a win for
everyone.  Unless I completely misunderstand pylint, reading ast trees
once for each file in a *list* of files would greatly speed up the
existing pylint.  So that's something to look into.

EKR
