As I said in the post "The new me: active vs passive," the following design is active preparation for studying the pylint code.
The problem I face is that the pylint code doesn't actually seem to make inferences about the code that it is analyzing. It all seems like preparation :-) I am writing this after briefly studying the astng code in site-packages\logilab_astng-0.19.3-py2.6.egg\logilab\astng and the pylint checker code in pylint-0.19.0\checkers.

--- Goals

The primary goal, indeed the only goal, for the "new pylint" is to make it the best possible platform for analysis of Python code. Pylint, pychecker and similar tools are arguably the most important tools in the Python world other than Python itself. By "best possible" I mean the following:

1. It should be as fast as possible.

2. It should be as simple and easy to understand as possible.

3. It should be as flexible and extensible as possible. It should be easy for other people to write their own checkers. That means that the code must be well documented and packaged for easy use.

--- The grand strategy

1. One of the biggest surprises of my programming life was the realization that multi-pass compiler algorithms can be significantly *faster* than single-pass algorithms. At the time, that was a huge shock. Now, it seems second nature to me. So the first part of the strategy is that the new pylint will consist of multiple passes, largely independent of each other. This will have salutary effects everywhere.

2. In contrast to pylint, the new pylint will treat ast trees as read-only. Early passes will convert the ast trees of the modules being analyzed into self-contained data structures. This seems like avoidable overhead, but it should greatly simplify later passes. Prepasses that simplify later passes pay huge dividends.

3. At present, pylint analyzes only one file at a time. The new pylint will handle lists of files, which will save a lot of work: the new pylint will preprocess each file only once.

--- The passes

The new pylint will process files in approximately this way:

Pass 1: read all file-level ast's.
For each file in the list of files passed to the (new) pylint, pylint will read the ast for that file. It takes one line of code:

    tree = ast.parse(s, filename=fn, mode='exec')

This pass creates file2astDict, a dictionary associating filenames with ast's.

Pass 2: import all files.

This pass traverses all the ast's in file2astDict, looking for import statements. If an imported file does not already exist in file2astDict.keys(), this pass will read the ast for that file and create an entry in file2astDict.

Pass 3: ast to internal data.

For every ast in file2astDict.keys(), this pass will create internal data structures that a) contain all the info in the ast but b) are optimized for the task of data analysis. This pass will create **pylint nodes** corresponding to ast nodes. I don't want to monkey-patch the ast tree: I want a new tree that does *exactly* what I want.

Pass 4: post-process pylint nodes.

Details are hazy at present, but this pass would perform preliminary analysis common to all inferences. It might massage the data in pylint nodes to make the data easier to use later.

Pass 5: dynamic analysis.

This is the fun/hard part. We want to infer the types that variables can have, and the types that functions (and classes with __call__ members) can have. Analyzing assignment statements is the big one.

Pass 6: apply checkers.

When all the preliminary work is complete, checking the code in various ways should be simply a matter of checking data in the pylint trees.

As a background requirement, I see no reason for any of these passes, including the checkers, ever to raise an exception. The present pylint code is almost impossible to understand because one is never sure what exceptions are being tossed around. Exceptions have global effect; they are not a great idea when trying to simplify code. In contrast, a multi-pass scheme does clearly defined tasks on clearly defined data. In such an environment, there should be no surprises.
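To make Passes 1 and 2 concrete, here is a minimal sketch. The helper names (parse_source, find_imports) are mine, not pylint's; only file2astDict and the ast.parse call come from the design above.

```python
import ast

def parse_source(s, fn):
    """Pass 1: one ast.parse call per file (source given as a string here).
    In the real pass, entries like this would populate file2astDict."""
    return ast.parse(s, filename=fn, mode='exec')

def find_imports(tree):
    """Pass 2: yield the module names imported anywhere in the tree."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # 'from . import x' has module == None.
                yield node.module

# Pass 2 would then parse each newly discovered module and add it to
# file2astDict, repeating until no new imports appear.
tree = parse_source("import os\nfrom sys import path\n", "demo.py")
print(sorted(find_imports(tree)))  # ['os', 'sys']
```

Note that resolving a module name like 'os' to a filename is the hard part of Pass 2 in practice; the sketch only shows the traversal.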
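A sketch of what the Pass 3 conversion might look like. The class name PylintNode and its attributes are hypothetical; the point is that the ast tree is only read, never patched, and the new tree carries its own parent links and whatever data later passes need.

```python
import ast

class PylintNode:
    """A self-contained node: copies what it needs from an ast node."""
    def __init__(self, ast_node, parent=None):
        self.kind = ast_node.__class__.__name__
        self.parent = parent  # Link pylint nodes to each other, not to ast nodes.
        self.lineno = getattr(ast_node, 'lineno', None)
        self.children = [
            PylintNode(child, parent=self)
            for child in ast.iter_child_nodes(ast_node)]

def ast_to_pylint_tree(tree):
    """Convert a (read-only) ast tree into a pylint tree."""
    return PylintNode(tree)

root = ast_to_pylint_tree(ast.parse("x = 1"))
print(root.kind, root.children[0].kind)  # Module Assign
```

Copying the tree costs memory up front, but it buys exactly the trade described in the grand strategy: later passes see one uniform, mutable data structure instead of the raw ast.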
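To show why assignments are "the big one" in Pass 5, here is a toy sketch that handles only the easiest case: names assigned literal constants. The function name infer_types is mine; real inference must also chase calls, attributes, control flow, and reassignment across files, which this deliberately ignores.

```python
import ast

def infer_types(source):
    """Map each assigned name to the set of literal types it can hold."""
    types = {}
    for node in ast.walk(ast.parse(source)):
        # Only plain 'name = <constant>' assignments are handled here.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    types.setdefault(target.id, set()).add(
                        type(node.value.value).__name__)
    return types

print(infer_types("x = 1\nx = 'a'\ny = 2.0\n"))
# {'x': {'int', 'str'}, 'y': {'float'}}
```

Even this trivial version shows the shape of the result a checker wants: a name can have a *set* of possible types, and a checker flags uses inconsistent with every member of that set.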
--- Good artists copy--great artists steal

My strength as a programmer is repackaging and revision. The new design is, in effect, a massive refactoring of existing code. There is plenty of useful code in pylint. I plan to steal as much of it as possible :-)

--- Where do I go from here?

I started this design because I wanted to see how I would approach the problem. This design will help me study the existing pylint code. The question is, do I really intend to do an alternate pylint? I'm not sure yet, but I'm not ruling it out.

I have been looking for a project that showcases Leo's strengths. A new pylint would do that. More importantly, pylint can and should be improved, and at the very least a new pylint might spur further work on the existing pylint. The competition could be interesting :-)

Edward

P.S. A crucial part of being active is picking projects that are small enough to keep my energy high. The last three months have been a high-energy time for me because I've been focused on small projects: fixing one bug at a time. The design presented above breaks a very large task into more manageable pieces. Better, it might be possible to fold the new design, step by step, into the *existing* pylint. If possible, that would be a win for everyone. Unless I completely misunderstand pylint, reading ast trees once for each file in a *list* of files would greatly speed up the existing pylint. So that's something to look into.

EKR

--
You received this message because you are subscribed to the Google Groups "leo-editor" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/leo-editor?hl=en.
