On 07/15/10 09:52 AM, Darren Kenny wrote:
Hi,
I would like to get the code review for the Data Object Cache under way. The
webrev is at:
http://cr.opensolaris.org/~dkenny/install_doc/
Because it's been mainly written in the cud_dc gate, I've pulled out my specific
changes and created another workspace based off slim_source (thanks Karen for
the suggestion) to be able to provide it in isolation from the cud_dc work and
(hopefully) give a more readable webrev.
Any comments/suggestions are much appreciated.
If at all possible, I would like to get the first round of comments by Friday
next, July 23.
Thanks,
Darren.
Hi Darren,
I tried to avoid repeating Drew's comments, but please also keep in mind
his general PEP 8 notes and my addendum on use scenarios for "is None"
(to reiterate: use "if <variable>:" or "if <var> is None:" depending on
context, but never "if <variable> == None:"). Additionally, I started
this review before your first round of edits - I'll try to remove
anything that was fixed in the intermediate update, but if I've left
something in that you've already fixed, or the line numbers are slightly
off, that's why.
As with my review of the logging service, I'm trying to be extra
detailed here since this is a CUD component that will be around for,
ideally, a long while.
General: Naming - Looking at the layout of the vendor-packages files
added, the file/package names seem rather verbose. I suggest the following:
Rename data_object_cache to simply "cache" or "data_cache".
Move all of cache/data_object.py into cache/__init__.py. This allows one
to import the cache using simply:
"import osol_install.cache"
which would be preferable to "import
osol_install.data_object_cache.data_object_cache" which is not only
redundant, but causes python import problems in some cases -
specifically, statements along the lines of "import
data_object_cache.xyz" can confuse the import mechanism (does it refer
to osol_install.data_object_cache.xyz, or
osol_install.data_object_cache.data_object_cache.xyz?).
Additionally, regardless of if/how the names change, please use absolute
imports: "import osol_install.data_object_cache.data_object as
data_object" rather than "import data_object" for example.
usr/src/lib/Makefile:
Copyright needs updating.
52: Please alphabetize.
prototype_com:
No need to bother updating
data_object.py:
36, 41, 46: These comments don't add much value
37-47: It would be of value to define a common base exception class for
the DOC, and have these subclass that.
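For instance (ObjectNotFound is the name I see used later in the file;
the other names here are only placeholders for whatever 37-47 actually
defines):

class DataObjectError(Exception):
    """Base class for all Data Object Cache exceptions."""

class ObjectNotFound(DataObjectError):
    """No child matched the requested name and/or class_type."""

class ParentAlreadySet(DataObjectError):
    """Tried to insert a child that already has a parent."""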
194: Nit: I think this should simply be a method, has_children(), not a
property. It "feels" more like a computed value rather than an attribute.
207: Having the default for "class_type" be DataObject (or have the
function set class_type = DataObject if it is passed in as None) would
simplify some of the logic on line 250.
242: It'd be simpler to just assert that the length of the returned list
is > 0. However, I'd argue that it's not an error to look for
something by name or class and not find any - I think returning an empty
list in those scenarios would be appropriate.
243-251: Storing the children in a dictionary (where key = name, and
value = [list of DataObjects]) rather than a flat list would speed up
the look-ups by name. (Or, storing a set of references in a dictionary,
if global ordering is important - however, I wonder how important
ordering is for anything other than DataObjects grouped by name?)
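Very roughly, and ignoring the ordering question for the moment (this
sketch assumes self._children becomes a dict set up in __init__):

# within DataObject; in __init__:
#     self._children = dict()    # name -> [DataObject, ...]

def get_children(self, name=None, class_type=None):
    """Name lookups become a single dictionary access."""
    if name is not None:
        candidates = list(self._children.get(name, []))
    else:
        candidates = [c for kids in self._children.values() for c in kids]
    if class_type is not None:
        candidates = [c for c in candidates if isinstance(c, class_type)]
    return candidates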
259: If this is kept (and for future reference), the trailing '\'
characters are unnecessary (within parentheses), and the casts to
string on line 260 are also unnecessary (the %s action automatically
calls str() on the object to be formatted). I'd also re-break it more
like this:
raise ObjectNotFound("No matching objects found: name = '%s'"
                     " and class_type = '%s'" %
                     (name, class_type))
264: It might be worth it to wrap a call to get_children(name,
class_type) and then just return the 1st item from that list (if not,
simply return child - no need to have a "found_child" variable).
Additionally, it seems more likely that a call to get_first_child would
raise an ObjectNotFound exception than a call to get_children. Finally,
it could be argued that not providing a class or name parameter to this
function is not particularly valuable (depending on your thoughts on
whether ordering really matters for anything other than items with the
same name). If only name is pertinent for ordering, and the data layout
is reorganized to be a dictionary, then this code becomes nearly
trivial: 'return self._children[name][0]'
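In other words, something like (assuming get_children is changed to
return an empty list when nothing matches, per my comment on 242):

def get_first_child(self, name=None, class_type=None):
    children = self.get_children(name=name, class_type=class_type)
    if not children:
        raise ObjectNotFound("No matching objects found: name = '%s'"
                             " and class_type = '%s'" % (name, class_type))
    return children[0]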
306: Again, this is redundant code compared to get_children. In
particular, get_children could simply be a call to get_descendants(...,
max_depth=1)
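i.e. roughly (parameter names per whatever get_descendants actually
takes):

def get_children(self, name=None, class_type=None):
    return self.get_descendants(name=name, class_type=class_type,
                                max_depth=1)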
410: LOGGER.exception dumps a stack trace, which may or may not be
desired. I think LOGGER.error() would be better here (something higher
up can determine when/where to dump the stack trace - to the log, to the
console, etc.)
414: Function should check that caller doesn't pass in both before and
after.
439: Set it to "None" to indicate a "not yet determined value". Negative
values are valid parameters to insert, so setting to "-1" could cause
trouble later. (Alternatively, default to 0, and remove lines 467-468)
442-444: Unnecessary - let the __check_object_type call catch this
(NoneType is not an instance of DataObject, so __check_object_type will
fail).
447, 456: Not strictly needed - if they're not DataObject sub-classes,
they won't be children, so the calls at 449/458 will fail.
470-481: I'd recommend having separate functions for "insert one" and
"insert all items in an iterable". (See list.append vs. list.extend). It
provides a cleaner separation and allows one to create and insert
"list-like" DataObjects.
543, 569: These seem to be in conflict. Is it an error to call delete when
there's nothing to delete, or isn't it? I don't see a problem with
calling delete and having nothing to delete.
586: This only removes "self" from local scope - so it's not really needed.
591, 606: I don't believe we ever reached resolution on why this
couldn't be implemented via the __deepcopy__ and __copy__ special
methods. It seems rather unintuitive to explicitly avoid them - my first
thought when needing to copy/deepcopy a Python object is going to be to
"import copy; copy.deepcopy(my_object)"
See the attached tree_class.py for a tree-like object that takes
advantage of __setstate__ and __getstate__ to copy the right portions of
the tree and fix things as appropriate. (I didn't verify the shallow
copy specifically, but it shouldn't require much additional work, if any).
In looking at your response to Drew's question in this area, I think
this resolves 1 & 2. In regards to 3, any subclass implementors can skip
overriding deepcopy - and if they do override it, like any subclass
overriding a parent definition, it needs to be aware of whether or not
it should call super().__deepcopy__() as part of the implementation -
that's a common problem with any OO subclass. And I don't believe it
will interfere with the pickling, either.
701-702: In general, I'd discourage having heavyweight functions map to
properties. Attributes and properties should be lightweight and quick to
access. For example, on line 714 you call self.xml_tree, and assign it
to xml. Then, 2 lines later, you use self.xml_tree again, instead of
referencing the xml variable - causing the entire tree to be
regenerated. Changing this to a function would make it obvious that a
caller should try to avoid retrieving the value multiple times in a row.
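i.e. rename the property to something like get_xml_tree(), and then at
the call site (the exact calls on 714-716 aside):

xml = self.get_xml_tree()       # build the tree once
xml_str = etree.tostring(xml)   # ...and reuse the local variable, rather
                                # than hitting self.xml_tree (and
                                # rebuilding the tree) a second time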
720: Rather than using string concatenation, which is slow, add all
items to a list, and then use "\n".join(the_list). Although since this
recurses, the performance gain may be minimal.
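e.g. (the per-child call here is a placeholder for whatever 720 actually
does):

lines = []
for child in self._children:
    lines.append(child.get_xml_tree_str())
return "\n".join(lines)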
729, 738: This use of repr() breaks convention. See:
http://docs.python.org/reference/datamodel.html#object.__repr__
http://docs.python.org/reference/datamodel.html#object.__str__
I don't particularly see the value of defining __repr__ for the DOC.
741: The Python collections define class hierarchies such that mutable
classes inherit from immutable classes. I think we might want the same.
In this case, that means having "DataObjectNoChildManipulation" be the
base class, and DataObject be a subclass of that. The bonus there is
that then it's simply a matter of *not* defining
insert_children/delete_children for the base class, and defining it for
the subclass. That would make it more transparent for users of the
classes as well (it's far easier to determine that a method simply
doesn't exist for an object than it is to determine that the method
exists, but will always raise an exception).
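Something like (keeping your NoChildManipulation name, though it could
probably be shortened):

class DataObjectNoChildManipulation(object):
    """Read-only interface: traversal and lookup only.

    insert_children/delete_children simply don't exist here, so there is
    nothing to raise.
    """

class DataObject(DataObjectNoChildManipulation):
    """Mutable variant: adds child manipulation."""

    def insert_children(self, children, before=None, after=None):
        """Only defined on the mutable subclass."""

    def delete_children(self, children=None):
        """Ditto."""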
data_object_cache.py:
128: Seems like it would be simpler to just check "if __instance is None"
144, 146: Should probably be "single-underscored" private variables. The
name-mangling is more to "hide" from sub-classes, and, should there ever
be a subclass of DataObjectCache, I imagine it'd want to access the
persistent and volatile trees. Ditto for __instance -> We wouldn't want
a subclass creating two separate name-mangled instances.
148: This makes it obvious as to why the class hierarchy is the way it
is. By the same token, it also makes it obvious that it's not really
possible to enforce "NoChildManipulation" as such. This block of code
could just as easily be "self._children = [self.__persistent_tree,
self.__volatile_tree]", if following the above suggestion for adjusting
the class hierarchy.
177: I'd have expected this to be the "delete()" method.
190: This is a good candidate for a property ("self.empty")
229: The isinstance check is not needed - all Python objects are
instances of "object".
238: I think we'd want to close regardless. I'm not sure of the advantage
of leaving this file object open for writing. Closing it prevents the
caller from trying to write garbage to the end of our pickled DOC.
241: Nit: I find this method name a touch verbose. "load" seems fine to
me, but it's a mere suggestion and completely up to you.
293: In opposition to Drew's complaints about "cls", PEP8 suggests using
"cls" as the name for variables representing classes:
- If your public attribute name collides with a reserved keyword, append
a single trailing underscore to your attribute name. This is
preferable to an abbreviation or corrupted spelling. (However,
notwithstanding this rule, 'cls' is the preferred spelling for any
variable or argument which is known to be a class, especially the
first argument to a class method.)
And "requires" it for classmethods:
Always use 'cls' for the first argument to class methods.
(Sorry Drew!)
295: "del" doesn't actually "free" anything here. It simply removes that
one reference to that object from the scope. As an example, consider:
>>> a = object()
>>> b = a
>>> a
<object object at 0x8080480>
>>> b
<object object at 0x8080480>
>>> del a
>>> a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> b
<object object at 0x8080480>
To truly free an object, *all* references to it need to be "lost," which
in this case will happen automatically once the function returns.
338: Recommend appending to a list and then using "\n".join(the_list)
data_object_dict.py:
69-74: The ValueError thrown by etree.Element provides an almost
identical error string; as such, this function seems unnecessary -
especially since it seems to be used only to protect against sub-classes.
104: There's no need for generate_xml to be a property; a simple
attribute will suffice. No need to assert that it's a bool - any object
in Python can be evaluated in a boolean context.
174: This "del" call isn't needed.
Tests:
I'm going to be a touch more lenient in terms of style/implementation
here. For reference, can you generate a code coverage report and include
it in your reply to this email? (Your tests seem to run fine via the
slim_test runner in the cud_dc gate, so you should be able to just run
that with a --with-cover option to get the results. Optionally, check
"slim_test -h" for info on how to generate an HTML coverage report, and
include *that* instead of the text version).
test_data_object_cache_children.py:
39 1/2, elsewhere in this file and others: To eliminate odd issues
between tests, you should reset the DOC instance as part of the tearDown
method, by setting DataObjectCache._DataObjectCache__instance = None
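e.g.:

def tearDown(self):
    # discard the singleton so the next test starts with a fresh cache
    DataObjectCache._DataObjectCache__instance = None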
53-57, elsewhere: Use "self.assertRaises(<Error>, <function>, [arg1 [,
arg2, ...]])" instead of a try/except block
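For example (the target function and argument are placeholders for
whatever the try/except at 53-57 currently exercises):

self.assertRaises(ObjectNotFound, self.doc.get_first_child,
                  "no_such_name")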
35: Could simply use the "object" class.
80: Typo: Priority should be 100 here, based on test name / failure message.
EOF: Missing test-cases:
register_class: Assigning multiple classes the same priority level
find_class_to_handle: Need a case to ensure that if two classes can both
handle something, the one with the higher priority is chosen. (Create 2
classes whose "can_handle" method always returns True to test this in the
simplest fashion possible; see the sketch after this list.)
find_class_to_handle: Case to ensure that if no classes are found, None
is returned.
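For the priority case, the simplest version is probably something along
these lines (the register_class/find_class_to_handle calls and the
can_handle signature are from memory, so adjust them to the real ones;
some_xml_node stands for any element, and swap the priority numbers if a
lower number means a higher priority):

class LowPriorityHandler(DataObject):
    @classmethod
    def can_handle(cls, xml_node):
        return True

class HighPriorityHandler(DataObject):
    @classmethod
    def can_handle(cls, xml_node):
        return True

def test_higher_priority_class_wins(self):
    DataObjectCache.register_class(LowPriorityHandler, priority=50)
    DataObjectCache.register_class(HighPriorityHandler, priority=100)
    found = DataObjectCache.find_class_to_handle(some_xml_node)
    self.assertEqual(found, HighPriorityHandler)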
test_data_object_cache_snapshots.py:
I'd imagine that only one test will truly require a self.temp_dir/file -
there should only be a need to explicitly test against the filesystem
once. For the rest, comparison of StringIO objects should suffice.
I'd suggest moving the DOC generation to a separate function that can be
called explicitly by tests that need a complex DOC (it shouldn't be
strictly required for more than one or two tests). No need to store the
self.child_* variables either (and thus, no need to clean them up) since
none of the tests reference them. This function could be importable by
any test files that need to generate a reasonably complex cache.
Most or all of the tests here should both take_snapshot() and
load_from_snapshot(), and compare the DOC (persistent) strings from
before and after.
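i.e. the general shape would be something like this (assuming
take_snapshot and load_from_snapshot accept an open file object as well
as a path, and using str() as a stand-in for however you get the
persistent-tree string):

import StringIO

buf = StringIO.StringIO()
before = str(self.doc)            # persistent-tree string before

self.doc.take_snapshot(buf)
# ...modify or clear the persistent tree here...
buf.seek(0)
self.doc.load_from_snapshot(buf)

self.assertEqual(before, str(self.doc))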
229: There's no guarantee that I haven't created such a directory on my
machine. This test can probably just be removed - it's not really
testing DOC as much as it's testing the Python 'open()' built-in.
241: In essence, this tests the same thing as 234.
test_data_object_cache_xml.py:
235: Typo: geneerates -> generates
test_data_object_deletion.py:
95: Typo in fail message (should be child_2)
114-115: I'd check for assertNotEqual(child, self.child_3) to be
slightly more precise here.
I've run out of energy to read through test code, and the tests so far
look really good, so I'll wait for the coverage report to finish looking.
- Keith
class TreeClass(object):

    def __init__(self):
        self.parent = None
        self.child = None

    def __getstate__(self):
        # Copy the dict so we don't wipe the parent link on the live object.
        state = self.__dict__.copy()
        state['parent'] = None
        return state

    def __setstate__(self, state):
        self.__dict__ = state
        # Re-attach the (copied) child to its new parent.
        if self.child:
            self.child.parent = self


if __name__ == '__main__':
    a = TreeClass()
    b = TreeClass()
    c = TreeClass()
    a.child = b
    b.child = c
    c.parent = b
    b.parent = a

    import copy
    copy_a = copy.deepcopy(a)
    copy_b = copy.deepcopy(b)
    copy_c = copy.deepcopy(c)