On Tuesday, December 12, 2017 at 8:06:15 AM UTC-6, vitalije wrote: > I was unpleasantly surprised with the complexity of code, and yet you wrote: "Otoh, the infrastructure is simple and good."
The infrastructure is similar to, but simpler than, tools such as pyflakes, pylint and my previous tools. I know of no way to avoid traversing ast trees in this kind of problem. > I believe that it looks simple in your mind while it is still fresh. But in mind of a programmer who reads it for the first time it is very hard to understand. I guess you would feel the same when you return to this code after several months from now. I doubt it. Compiler-like tools have their own complexity metrics. And their own design/coding patterns. > I was expecting to see something like a function that takes a filename or its content as a string and a list of conventions (possible some flags too), and returns a generator of all violations in a form (line number, line contents, ...). That would be a most unusual pattern. I've never seen anything like it in all the tools I have studied. There are two tasks facing any inference engine such as pyflakes, pylint or check-conventions: 1. Create data structures describing the program being analyzed. 2. Use those data to make inferences. To understand the inherent complexities, please study the pyflakes and pylint code, as I have done. The contrast between the pylint and pyflakes is stark: - Pyflakes is arguably the most elegant code I have ever studied. I am happy to call it a work of genius. In essence, it manages to get the benefits of a two-pass algorithm in a single traversal of the ast. It also catches way more practical bugs than pylint. - In contrast, pylint is unimaginably complicated. My one claim to fame re pylint is that I fixed a bug in pylint <https://bugs.launchpad.net/pylint/+bug/456870> that occurred several hundred levels down in the call stack. I used a special purpose debugger (a subclass of pdb) to do this. Despite this brag, I still don't understand pylint in detail. In fact, the check-convention code simulates an ast traversal. The do_* methods correspond directly to actual ast visitors. So yes, I insist, the organization of check-convention is, by the standards of similar tools, ordinary and straightforward. > Instead I have found 7+ classes. The ConventionChecker is the only class you need to understand. It contains two simple "internal" classes, Type and CCStats. Oh yes, the code does use the simple ProjectUtils class. Please ignore the other classes. For now, ConventionChecker needs to be in leoCheck.py so that other code can use imp.reload. Eventually, I suppose, it should migrate into checkerCommands.py. > Now, when someone new to this code reads it, it is not clear where to find what is in self.patterns. These patterns are defined just before the do_* methods, where they are actually used. You can find them with cff: patterns.append > Wouldn't be much simpler to have dispatch definition contains pairs of (regex pattern, method) or if kind is really necessary for something else then dispatch can have a form {pattern: (method, kind)}. No. The infrastructure is, really and truly, straightforward. The infrastructure merely ensures that the do_* visitors get called as expected. > Additionally, we could have a helper function. I'm not going to comment on this. The present code works well enough. The only thing that matters are the resolve_* methods. They aren't easy, and can never can be. But they are a lot easier than the corresponding code in pylint. > The content of file being checked is parsed into ast node, and then that ast node is formatted again to string to be analyzed further. Yes. This is a brilliant simplification, if I do say so myself. > It involves call to leoAst.AstFormatter which uses visitor pattern. It is far from clear what is happening. Brilliance sometimes is a bit difficult to understand. > It took me quite a while to figure that all this code have just a simple purpose to clean a bit source code. It's way more than that: 1. Each statement now appears on it's own line, simplifying the regex's. 2. Comments have disappeared. 3. Most importantly, using strings makes it possible to apply regex's to each statement. This is crucial for the prototype! > Wouldn't be much easier to read and understand if those two phases were gathered inside one function named `clean_source`: No. > inside start_class method there is also a guard `if self.pass_n==1` which means it should also be called only in the first pass. I've cleaned up this code. The old start_class method is gone. > Is class Pass1 used at all? check-conventions uses only the ConventionChecker and ProjectUtils classes. > And [Pass1] is very complex class to understand. It was a previous failed attempt. It never got around to making any type analysis at all. I've left Pass1 in leoCheck.py because something like it would be needed if check-conventions ever needed to be faster or more accurate. At present, that does not seem likely. > Perhaps visitor pattern is easy to write, but to read and understand it could be very challenging. There is no substitute for learning this pattern. Python's ast module exists for a reason. > Considering that code is usually written once and read many times, I would use it only when there is no other alternative. The current prototype is the alternative. But the prototype *can not* handle nested classes properly. So, depending on my energy level and ambition, the prototype may have to morph into a tree traversal. But *even then* there would be a strong incentive to use regex patterns to drive the code. Indeed, I can't think of any way to translate @data setting into code. > Almost always same effect can be achieved with a dictionary explicitly written that has names as keys and functions as values. [snip] But if there is only a few values that we are interested in, then I would argue that visitor pattern is overkill. YMMV I'm confident that I understand the issues. What matters is the following: 1. simplifying/generalizing resolve and its helpers. 2. (maybe) calling resolve from more places. > ProjectUtils class whose sole purpose is to find bunch of files? I really think it is overkill. Imo, this class is good encapsulation. I have just rewritten pu.project_files so that it has a chance of working on machines other than mine ;-) > I assume the whole purpose of this class is just to give data for testing purposes. Am I right? Yes. > Above all, I would suggest for any prototyping purposes to avoid classes as much as possible. I could not possibly agree less with this statement. In fact, I have had lots of "advance planning" due to my previous mostly failed experiments. > when we start with some classes it is very hard to get rid of them without breaking other parts of our code. That causes us to keep bad design decisions and cover them with more and more special cases which is receipt for disaster or at least for unmaintainable and unreadable code. My previous bad/fatal decisions all related to worrying about details instead of focusing on inference. > If instead we are prototyping with just pure functions it is very easy to delete any of them without breaking other parts of code. I have never seen compiler-like code that used pure functions. It's an interesting idea, but I have no idea how it would be done. > If there was a bug discovered somewhere in leoCheck.py, lets say after six months from now, my guess is that it would be fixed by introducing new checks and guards against some special corner cases. The present urgent task is to simplify the resolve_* methods. They act on already available data (the simple symbol tables), so the fact that these are methods instead of functions looks irrelevant. > I would be greatly surprised if fixing such bug would result in reduced complexity. I admit that resolve_ivar is a mess. Any suggestions for simplifying it would be most welcome. > My intention is not to be offensive. No offense taken. It is my strong opinion that packaging issues are not very important, and have been solved satisfactorily for now. The present prototype is, in fact, a *radical *simplification of previous code. As a result, I have been able, for the first time ever, to make useful deductions. Edward -- You received this message because you are subscribed to the Google Groups "leo-editor" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/leo-editor. For more options, visit https://groups.google.com/d/optout.
