Re: ENB: What's next for check-conventions

vitalije Tue, 12 Dec 2017 06:06:32 -0800

>
>
> *Maintain code simplicity!*
>
> Without any wish to degrade your work on this project, I feel strong need 
to suggest some ideas. Feel free to ignore them if you find them 
unappealing.

This morning I checked what you have done so far and I was unpleasantly 
surprised with the complexity of code, and yet you wrote:

> Otoh, the infrastructure is simple and good. Speed is good enough. The new 
> Stats class is much more clever than before.

I believe that it looks simple in your mind while it is still fresh. But in 
mind of a programmer who reads it for the first time it is very hard to 
understand. I guess you would feel the same when you return to this code 
after several months from now.

First of all, before I looked in the code, I was expecting to see something 
like a function that takes a filename or its content as a string and a list 
of conventions (possible some flags too), and returns a generator of all 
violations in a form (line number, line contents, ...). Instead I have 
found 7+ classes. 

Here are some points I collected while exploring leoCheck.py:

   1. ConventionChecker defines class field patterns as empty list and then 
   later throughout the definition of class patterns are appended to this 
   field. Value of this field is used only once in method check_helper 
   def check_helper(self, fn, node, s):

   trace = False and self.enable_trace

   trace_source = False

   self.file_name = fn

   dispatch = {

   'call': self.do_call,

   'class': self.do_class,

   'def': self.do_def,

   'self.x=': self.do_assn_to_self,

   'c.x=': self.do_assn_to_c,

   }

   s1 = leoAst.AstFormatter().format(node)

   for n in (1, 2):

   self.enable_trace = n == 2

   self.class_name = None

   self.pass_n = n

   if trace:

   print('===== pass: %s' % n)

   if trace_source and n == 2:

   print('===== source')

   print(s1)

   print('----- end source')

   for n, s in enumerate(g.splitLines(s1)):

   self.line_number = n

   for kind, pattern in self.patterns:

   m = pattern.match(s)

   if m:

   f = dispatch.get(kind)

   f(kind, m, s)

   self.start_class()

   self.end_program()

   Now, when someone new to this code reads it, it is not clear where to 
   find what is in self.patterns. It is obvious that self.patterns contains 
   pairs (kind, pattern), but it is far from obvious where those pairs come 
   from. Wouldn't be much clearer to explicitly spell all those pairs in `for` 
   statement. First part of each pair `kind` has sole purpose to connect 
   through `dispatch` pattern with the corresponding method. That is another 
   unnecessary indirection that just adds to the complexity. Wouldn't be much 
   simpler to have dispatch definition contains pairs of (regex pattern, 
   method) or if kind is really necessary for something else then dispatch can 
   have a form {pattern: (method, kind)}. Even further it is somewhat 
   misleading to iterate over patterns for each line of source code, when in 
   fact it is possible only one pattern to match a single line. Wouldn't be 
   much more readable to have a sequence of `if elif elif elif elif....else` 
   and get rid of patterns, dispatch and `for` statement. Additionally, we 
   could have a helper function: 
   def check_pattern(line, pat, func):
   m = pat.match(line)
   if m:
   func(m, line)

   return True

   for n, s in enumerate(g.splitLines(s1)): self.line_number = n 
   check_pattern(s, assn_to_c_pattern, self.do_assn_to_c) or \ 
   check_pattern(s, assn_to_self_pattern, self.do_assn_to_self) or \ 
   check_pattern(s, call_pattern, self.do_call) or \ check_pattern(s, 
   class_pattern, self.do_class) or \ check_pattern(s, def_pattern, 
   self.do_def)
   Imagine this snippet inside check_helper method. Isn't it simpler? More 
   readable? Going a little bit further, methods do_... can be also defined 
   inside check_helper method. Three of them have guard `if self.pass_n == 1` 
   and pass_n is set inside check_helper. Were this three methods functions 
   defined inside check_helper, they would have direct access to pass_n 
   variable. do_class is in fact equivalent with start_class and the only 
   thing start_class do is to set self.class_name and a default dict for 
   self.classes[self.class_name] (it really would be better to use 
   self.classes=collections.defaultdict({'ivars':{}, 'methods':{}})
   class_name ivar is used in other do_... methods which means if 
   start_class was replaced with the simple assignment to a local variable 
   inside check_helper do_ functions could have direct access to it or it 
   could be passed as argument where needed. Given the fact that some of this 
   methods are used only in first pass and do_call is used in both passes it 
   would make sens to get rid of the outer loop also and just have two 
   different functions `do_pass_one` and `do_pass_two` which would call those 
   functions that are meant to run in given pass. 
   2. content of file being checked is parsed into ast node, and then that 
   ast node is formatted again to string to be analyzed further. It involves 
   call to leoAst.AstFormater which uses visitor pattern. It is far from clear 
   what is happening. It took me quite a while to figure that all this code 
   have just a simple purpose to clean a bit source code. Wouldn't be much 
   easier to read and understand if those two phases were gathered inside one 
   function named `clean_source`: 
   def clean_source(code):
   node = ast.parse(code, '...', 'exec')
   return leoAst.AstFormater().format(node)
   I still don't know what is cleaned by this code. If only comments then 
   perhaps better name would be clean_comments. 
   3. Actually inside start_class method there is also a guard `if 
   self.pass_n==1` which means it should also be called only in the first 
   pass. Furthermore, this function is called from two places from 
   check_helper at the end of single pass without argument and from do_class 
   method with regex match as argument. Inside of this method there is check 
   if the function has been called without argument or not. It would be much 
   easier to read and understand if this function is divided in two separate 
   functions. The net effect of calling this method without argument is 
   setting self.class_name to None, which means self.start_class() can be 
   replaced with `self.class_name = None`. Isn't it much readable? If called 
   with the argument, it increases self.stats.classes by 1. However it does it 
   in both passes which would suggest that self.stats.classes would be equal 
   to twice the number of class definitions at the end of second pass. (I 
   haven't tested this but it seems to me likely to be true).
   4. Is class Pass1 used at all? I couldn't find where? And it is very 
   complex class to understand. Perhaps visitor pattern is easy to write, but 
   to read and understand it could be very challenging. Considering that code 
   is usually written once and read many times, I would use it only when there 
   is no other alternative. Almost always same effect can be achieved with a 
   dictionary explicitly written that has names as keys and functions as 
   values. If methods called from visit method change some ivars on self, and 
   whole traversal is meant just to collect few ivar values that way, it is 
   much more readable if tree is explicitly searched for interesting values 
   and results yielded or collected, aggregated or whatever needs to be done. 
   I know that any code dealing with syntax trees is most likely written using 
   visitor pattern, and it may look like a suitable pattern to use. But if 
   there is only a few values that we are interested in, then I would argue 
   that visitor pattern is overkill. YMMV
   5. ProjectUtils class whose sole purpose is to find bunch of files? I 
   really think it is overkill. Ok, there is your comment in there that says 
>    
>    # To do: get project info from @data nodes.

   I assume the whole purpose of this class is just to give data for 
   testing purposes. Am I right?

Above all, I would suggest for any prototyping purposes to avoid classes as 
much as possible. It is very hard (impossible?) to write clean and good 
class structure without much planning in advance which is rarely the case 
when we are prototyping. When we start prototyping we almost never know in 
advance what problems we would encounter during the prototype development. 
Without this knowledge it is very likely that our classes would go wrong. 
And when we start with some classes it is very hard to get rid of them 
without breaking other parts of our code. That causes us to keep bad design 
decisions and cover them with more and more special cases which is receipt 
for disaster or at least for unmaintainable and unreadable code.

If instead we are prototyping with just pure functions it is very easy to 
delete any of them without breaking other parts of code. When prototype 
mature we get more knowledge of given problem and that is the time to 
consider what (if any) class we need to define. My personal experiments 
with the reading and writing external files, for example, have proven that 
using classes for this purpose had worse performance than the solution with 
the pure functions. I was able to try and compare both solutions after I 
had already written all necessary functions. When we have solution in such 
standalone functions it is easy to make those functions methods of any 
class, but it is hard to do the opposite. 

If there was a bug discovered somewhere in leoCheck.py, lets say after six 
months from now, my guess is that it would be fixed by introducing new 
checks and guards against some special corner cases. I would be greatly 
surprised if fixing such bug would result in reduced complexity. More 
likely it would add some more complexity to the code. After several such 
hypothetical fixes nobody would be able to understand it again, or it would 
require too much time to understand that nobody would have enough time to 
try. If leoCheck.py were containing only bunch of standalone functions it 
would always easy to replace any of them with the new standalone function 
without fear that anything would be broken. It would be easy to add new 
function at any time.

At the end I apologize once again for these comments. My intention is not 
to be offensive. I had suffered from the very same habit of writing code 
that seems simple to me for as long as I can bare all of it in my mind, and 
after some time I can't figure how it was supposed to work at all. I am 
still not completely free of it. I have made too many rewrites from scratch 
many of my projects. I wish I knew this from the beginning. It is almost 
impossible to get class design in the first attempt right. It is hard to 
get functions designed right in the first attempt, too. But prototyping 
with functions gives me freedom to abandon every bad attempt without worry. 

Vitalije

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/leo-editor.
For more options, visit https://groups.google.com/d/optout.

Re: ENB: What's next for check-conventions

Reply via email to