>
>
> *Maintain code simplicity!*
>
> Without any wish to degrade your work on this project, I feel strong need
to suggest some ideas. Feel free to ignore them if you find them
unappealing.
This morning I checked what you have done so far and I was unpleasantly
surprised with the complexity of code, and yet you wrote:
> Otoh, the infrastructure is simple and good. Speed is good enough. The new
> Stats class is much more clever than before.
I believe that it looks simple in your mind while it is still fresh. But in
mind of a programmer who reads it for the first time it is very hard to
understand. I guess you would feel the same when you return to this code
after several months from now.
First of all, before I looked in the code, I was expecting to see something
like a function that takes a filename or its content as a string and a list
of conventions (possible some flags too), and returns a generator of all
violations in a form (line number, line contents, ...). Instead I have
found 7+ classes.
Here are some points I collected while exploring leoCheck.py:
1. ConventionChecker defines class field patterns as empty list and then
later throughout the definition of class patterns are appended to this
field. Value of this field is used only once in method check_helper
def check_helper(self, fn, node, s):
trace = False and self.enable_trace
trace_source = False
self.file_name = fn
dispatch = {
'call': self.do_call,
'class': self.do_class,
'def': self.do_def,
'self.x=': self.do_assn_to_self,
'c.x=': self.do_assn_to_c,
}
s1 = leoAst.AstFormatter().format(node)
for n in (1, 2):
self.enable_trace = n == 2
self.class_name = None
self.pass_n = n
if trace:
print('===== pass: %s' % n)
if trace_source and n == 2:
print('===== source')
print(s1)
print('----- end source')
for n, s in enumerate(g.splitLines(s1)):
self.line_number = n
for kind, pattern in self.patterns:
m = pattern.match(s)
if m:
f = dispatch.get(kind)
f(kind, m, s)
self.start_class()
self.end_program()
Now, when someone new to this code reads it, it is not clear where to
find what is in self.patterns. It is obvious that self.patterns contains
pairs (kind, pattern), but it is far from obvious where those pairs come
from. Wouldn't be much clearer to explicitly spell all those pairs in `for`
statement. First part of each pair `kind` has sole purpose to connect
through `dispatch` pattern with the corresponding method. That is another
unnecessary indirection that just adds to the complexity. Wouldn't be much
simpler to have dispatch definition contains pairs of (regex pattern,
method) or if kind is really necessary for something else then dispatch can
have a form {pattern: (method, kind)}. Even further it is somewhat
misleading to iterate over patterns for each line of source code, when in
fact it is possible only one pattern to match a single line. Wouldn't be
much more readable to have a sequence of `if elif elif elif elif....else`
and get rid of patterns, dispatch and `for` statement. Additionally, we
could have a helper function:
def check_pattern(line, pat, func):
m = pat.match(line)
if m:
func(m, line)
return True
for n, s in enumerate(g.splitLines(s1)): self.line_number = n
check_pattern(s, assn_to_c_pattern, self.do_assn_to_c) or \
check_pattern(s, assn_to_self_pattern, self.do_assn_to_self) or \
check_pattern(s, call_pattern, self.do_call) or \ check_pattern(s,
class_pattern, self.do_class) or \ check_pattern(s, def_pattern,
self.do_def)
Imagine this snippet inside check_helper method. Isn't it simpler? More
readable? Going a little bit further, methods do_... can be also defined
inside check_helper method. Three of them have guard `if self.pass_n == 1`
and pass_n is set inside check_helper. Were this three methods functions
defined inside check_helper, they would have direct access to pass_n
variable. do_class is in fact equivalent with start_class and the only
thing start_class do is to set self.class_name and a default dict for
self.classes[self.class_name] (it really would be better to use
self.classes=collections.defaultdict({'ivars':{}, 'methods':{}})
class_name ivar is used in other do_... methods which means if
start_class was replaced with the simple assignment to a local variable
inside check_helper do_ functions could have direct access to it or it
could be passed as argument where needed. Given the fact that some of this
methods are used only in first pass and do_call is used in both passes it
would make sens to get rid of the outer loop also and just have two
different functions `do_pass_one` and `do_pass_two` which would call those
functions that are meant to run in given pass.
2. content of file being checked is parsed into ast node, and then that
ast node is formatted again to string to be analyzed further. It involves
call to leoAst.AstFormater which uses visitor pattern. It is far from clear
what is happening. It took me quite a while to figure that all this code
have just a simple purpose to clean a bit source code. Wouldn't be much
easier to read and understand if those two phases were gathered inside one
function named `clean_source`:
def clean_source(code):
node = ast.parse(code, '...', 'exec')
return leoAst.AstFormater().format(node)
I still don't know what is cleaned by this code. If only comments then
perhaps better name would be clean_comments.
3. Actually inside start_class method there is also a guard `if
self.pass_n==1` which means it should also be called only in the first
pass. Furthermore, this function is called from two places from
check_helper at the end of single pass without argument and from do_class
method with regex match as argument. Inside of this method there is check
if the function has been called without argument or not. It would be much
easier to read and understand if this function is divided in two separate
functions. The net effect of calling this method without argument is
setting self.class_name to None, which means self.start_class() can be
replaced with `self.class_name = None`. Isn't it much readable? If called
with the argument, it increases self.stats.classes by 1. However it does it
in both passes which would suggest that self.stats.classes would be equal
to twice the number of class definitions at the end of second pass. (I
haven't tested this but it seems to me likely to be true).
4. Is class Pass1 used at all? I couldn't find where? And it is very
complex class to understand. Perhaps visitor pattern is easy to write, but
to read and understand it could be very challenging. Considering that code
is usually written once and read many times, I would use it only when there
is no other alternative. Almost always same effect can be achieved with a
dictionary explicitly written that has names as keys and functions as
values. If methods called from visit method change some ivars on self, and
whole traversal is meant just to collect few ivar values that way, it is
much more readable if tree is explicitly searched for interesting values
and results yielded or collected, aggregated or whatever needs to be done.
I know that any code dealing with syntax trees is most likely written using
visitor pattern, and it may look like a suitable pattern to use. But if
there is only a few values that we are interested in, then I would argue
that visitor pattern is overkill. YMMV
5. ProjectUtils class whose sole purpose is to find bunch of files? I
really think it is overkill. Ok, there is your comment in there that says
>
> # To do: get project info from @data nodes.
I assume the whole purpose of this class is just to give data for
testing purposes. Am I right?
Above all, I would suggest for any prototyping purposes to avoid classes as
much as possible. It is very hard (impossible?) to write clean and good
class structure without much planning in advance which is rarely the case
when we are prototyping. When we start prototyping we almost never know in
advance what problems we would encounter during the prototype development.
Without this knowledge it is very likely that our classes would go wrong.
And when we start with some classes it is very hard to get rid of them
without breaking other parts of our code. That causes us to keep bad design
decisions and cover them with more and more special cases which is receipt
for disaster or at least for unmaintainable and unreadable code.
If instead we are prototyping with just pure functions it is very easy to
delete any of them without breaking other parts of code. When prototype
mature we get more knowledge of given problem and that is the time to
consider what (if any) class we need to define. My personal experiments
with the reading and writing external files, for example, have proven that
using classes for this purpose had worse performance than the solution with
the pure functions. I was able to try and compare both solutions after I
had already written all necessary functions. When we have solution in such
standalone functions it is easy to make those functions methods of any
class, but it is hard to do the opposite.
If there was a bug discovered somewhere in leoCheck.py, lets say after six
months from now, my guess is that it would be fixed by introducing new
checks and guards against some special corner cases. I would be greatly
surprised if fixing such bug would result in reduced complexity. More
likely it would add some more complexity to the code. After several such
hypothetical fixes nobody would be able to understand it again, or it would
require too much time to understand that nobody would have enough time to
try. If leoCheck.py were containing only bunch of standalone functions it
would always easy to replace any of them with the new standalone function
without fear that anything would be broken. It would be easy to add new
function at any time.
At the end I apologize once again for these comments. My intention is not
to be offensive. I had suffered from the very same habit of writing code
that seems simple to me for as long as I can bare all of it in my mind, and
after some time I can't figure how it was supposed to work at all. I am
still not completely free of it. I have made too many rewrites from scratch
many of my projects. I wish I knew this from the beginning. It is almost
impossible to get class design in the first attempt right. It is hard to
get functions designed right in the first attempt, too. But prototyping
with functions gives me freedom to abandon every bad attempt without worry.
Vitalije
--
You received this message because you are subscribed to the Google Groups
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/leo-editor.
For more options, visit https://groups.google.com/d/optout.