Re: ENB: What's next for check-conventions

Edward K. Ream Tue, 12 Dec 2017 09:58:25 -0800

On Tuesday, December 12, 2017 at 8:06:15 AM UTC-6, vitalije wrote:

> I was unpleasantly surprised with the complexity of code, and yet you 
wrote: "Otoh, the infrastructure is simple and good."


The infrastructure is similar to, but simpler than, tools such as pyflakes, 
pylint and my previous tools.

I know of no way to avoid traversing ast trees in this kind of problem.

> I believe that it looks simple in your mind while it is still fresh. But 
in mind of a programmer who reads it for the first time it is very hard to 
understand. I guess you would feel the same when you return to this code 
after several months from now.

I doubt it. Compiler-like tools have their own complexity metrics. And 
their own design/coding patterns.

> I was expecting to see something like a function that takes a filename or 
its content as a string and a list of conventions (possible some flags 
too), and returns a generator of all violations in a form (line number, 
line contents, ...).

That would be a most unusual pattern. I've never seen anything like it in 
all the tools I have studied.

There are two tasks facing any inference engine such as pyflakes, pylint or 
check-conventions:

1. Create data structures describing the program being analyzed.
2. Use those data to make inferences.

To understand the inherent complexities, please study the pyflakes and 
pylint code, as I have done. The contrast between the pylint and pyflakes 
is stark:

- Pyflakes is arguably the most elegant code I have ever studied.  I am 
happy to call it a work of genius.  In essence, it manages to get the 
benefits of a two-pass algorithm in a single traversal of the ast. It also 
catches way more practical bugs than pylint.

- In contrast, pylint is unimaginably complicated.  My one claim to fame re 
pylint is that I fixed a bug in pylint 
<https://bugs.launchpad.net/pylint/+bug/456870> that occurred several 
hundred levels down in the call stack.  I used a special purpose debugger 
(a subclass of pdb) to do this.  Despite this brag, I still don't 
understand pylint in detail.

In fact, the check-convention code simulates an ast traversal.  The do_* 
methods correspond directly to actual ast visitors.  So yes, I insist, the 
organization of check-convention is, by the standards of similar tools, 
ordinary and straightforward.

> Instead I have found 7+ classes. 

The ConventionChecker is the only class you need to understand. It contains 
two simple "internal" classes, Type and CCStats.  Oh yes, the code does use 
the simple ProjectUtils class.

Please ignore the other classes.  

For now, ConventionChecker needs to be in leoCheck.py so that other code 
can use imp.reload.  Eventually, I suppose, it should migrate into 
checkerCommands.py.

> Now, when someone new to this code reads it, it is not clear where to 
find what is in self.patterns. 

These patterns are defined just before the do_* methods, where they are 
actually used.

You can find them with cff: patterns.append

> Wouldn't be much simpler to have dispatch definition contains pairs of 
(regex pattern, method) or if kind is really necessary for something else 
then dispatch can have a form {pattern: (method, kind)}.

No.

The infrastructure is, really and truly, straightforward. The 
infrastructure merely ensures that the do_* visitors get called as 
expected. 

> Additionally, we could have a helper function.

I'm not going to comment on this.  The present code works well enough.

The only thing that matters are the resolve_* methods.  They aren't easy, 
and can never can be.  But they are a lot easier than the corresponding 
code in pylint.

> The content of file being checked is parsed into ast node, and then that 
ast node is formatted again to string to be analyzed further.

Yes.  This is a brilliant simplification, if I do say so myself.

> It involves call to leoAst.AstFormatter which uses visitor pattern. It is 
far from clear what is happening. 

Brilliance sometimes is a bit difficult to understand.  

> It took me quite a while to figure that all this code have just a simple 
purpose to clean a bit source code. 

It's way more than that:

1. Each statement now appears on it's own line, simplifying the regex's.
2. Comments have disappeared.
3. Most importantly, using strings makes it possible to apply regex's to 
each statement.  This is crucial for the prototype!

> Wouldn't be much easier to read and understand if those two phases were 
gathered inside one function named `clean_source`: 

No.

> inside start_class method there is also a guard `if self.pass_n==1` which 
means it should also be called only in the first pass.

I've cleaned up this code.  The old start_class method is gone.

> Is class Pass1 used at all?

check-conventions uses only the ConventionChecker and ProjectUtils classes.

> And [Pass1] is very complex class to understand.

It was a previous failed attempt. It never got around to making any type 
analysis at all.

I've left Pass1 in leoCheck.py because something like it would be needed if 
check-conventions ever needed to be faster or more accurate.  At present, 
that does not seem likely.

> Perhaps visitor pattern is easy to write, but to read and understand it 
could be very challenging. 

There is no substitute for learning this pattern. Python's ast module 
exists for a reason.

> Considering that code is usually written once and read many times, I 
would use it only when there is no other alternative.

The current prototype is the alternative.  But the prototype *can not* 
handle nested classes properly.  So, depending on my energy level and 
ambition, the prototype may have to morph into a tree traversal.

But *even then* there would be a strong incentive to use regex patterns to 
drive the code.  Indeed, I can't think of any way to translate @data 
setting into code.

> Almost always same effect can be achieved with a dictionary explicitly 
written that has names as keys and functions as values. [snip] But if there 
is only a few values that we are interested in, then I would argue that 
visitor pattern is overkill. YMMV

I'm confident that I understand the issues. What matters is the following:

1. simplifying/generalizing resolve and its helpers.
2. (maybe) calling resolve from more places.

> ProjectUtils class whose sole purpose is to find bunch of files? I really 
think it is overkill. 

Imo, this class is good encapsulation.

I have just rewritten pu.project_files so that it has a chance of working 
on machines other than mine ;-)

> I assume the whole purpose of this class is just to give data for testing 
purposes. Am I right?

Yes.

> Above all, I would suggest for any prototyping purposes to avoid classes 
as much as possible.

I could not possibly agree less with this statement.

In fact, I have had lots of "advance planning" due to my previous mostly 
failed experiments.

> when we start with some classes it is very hard to get rid of them 
without breaking other parts of our code. That causes us to keep bad design 
decisions and cover them with more and more special cases which is receipt 
for disaster or at least for unmaintainable and unreadable code.

My previous bad/fatal decisions all related to worrying about details 
instead of focusing on inference.

> If instead we are prototyping with just pure functions it is very easy to 
delete any of them without breaking other parts of code. 

I have never seen compiler-like code that used pure functions.  It's an 
interesting idea, but I have no idea how it would be done.

> If there was a bug discovered somewhere in leoCheck.py, lets say after 
six months from now, my guess is that it would be fixed by introducing new 
checks and guards against some special corner cases. 

The present urgent task is to simplify the resolve_* methods. They act on 
already available data (the simple symbol tables), so the fact that these 
are methods instead of functions looks irrelevant. 

> I would be greatly surprised if fixing such bug would result in reduced 
complexity.

I admit that resolve_ivar is a mess. Any suggestions for simplifying it 
would be most welcome.

> My intention is not to be offensive.

No offense taken. It is my strong opinion that packaging issues are not 
very important, and have been solved satisfactorily for now.

The present prototype is, in fact, a *radical *simplification of previous 
code. As a result, I have been able, for the first time ever, to make 
useful deductions.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/leo-editor.
For more options, visit https://groups.google.com/d/optout.

Re: ENB: What's next for check-conventions

Reply via email to