On 07/15/10 09:52 AM, Darren Kenny wrote:
Hi,

I would like to get the code review for the Data Object Cache under way. The
webrev is at:

        http://cr.opensolaris.org/~dkenny/install_doc/

Because it's been mainly written in the cud_dc gate, I've pulled out my specific
changes and created another workspace based off slim_source (thanks Karen for
the suggestion) to be able to provide it in isolation from the cud_dc work and
(hopefully) give a more readable webrev.

Any comments/suggestions are much appreciated.

If at all possible, I would like to get the first round of comments by Friday
next, July 23.

Thanks,

Darren.


Hi Darren,

I've tried to avoid repeating Drew's comments, including his general PEP 8 notes and my addendum on use scenarios for "is None" (to reiterate: use "if <variable>:" or "if <variable> is None:" depending on context, but never "if <variable> == None:"). Note that I started this review before your first round of edits; I'll try to remove anything that was fixed in the intermediate update, but if I've left something in that you've already fixed, or if the line numbers are slightly off, that's why.
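To make the "is None" guidance concrete, a small illustration (the Sneaky class is a hypothetical example, not anything in the DOC):

```python
value = None

# Good: identity test; immune to classes that override __eq__.
print(value is None)        # True

# Also fine when any falsy value (None, 0, "", []) should be treated alike:
print(not value)            # True

# Avoid "== None": equality dispatches to __eq__, which a class can override.
class Sneaky(object):
    def __eq__(self, other):
        return True

print(Sneaky() == None)     # True -- claims to equal None, but is not None
print(Sneaky() is None)     # False -- identity cannot be faked
```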

As with my review of the logging service, I'm trying to be extra detailed here since this is a CUD component that will be around for, ideally, a long while.

General: Naming - Looking at the layout of the vendor-packages files added, the file/package names seem rather verbose. I suggest the following:

Rename data_object_cache to simply "cache" or "data_cache".
Move all of cache/data_object.py into cache/__init__.py. This allows one to import the cache using simply:
"import osol_install.cache"
which would be preferable to "import osol_install.data_object_cache.data_object_cache", which is not only redundant but causes Python import problems in some cases - specifically, statements along the lines of "import data_object_cache.xyz" can confuse the import mechanism (does it refer to osol_install.doc.xyz, or osol_install.doc.doc.xyz?).

Additionally, regardless of whether or how the names change, please use absolute imports: "import osol_install.data_object_cache.data_object as data_object" rather than "import data_object", for example.

usr/src/lib/Makefile:
Copyright needs updating.
52: Please alphabetize.

prototype_com:
No need to bother updating

data_object.py:
36, 41, 46: These comments don't add much value

37-47: It would be of value to define a common base exception class for the DOC, and have these subclass that.

194: Nit: I think this should simply be a method, has_children(), not a property. It "feels" more like a computed value rather than an attribute.

207: Having the default for "class_type" be DataObject (or have the function set class_type = DataObject if it is passed in as None) would simplify some of the logic on line 250.

242: It'd be simpler to just assert that the length of the returned list is > 0. However, I'd argue that it's not an error to look for something by name or class and not find any - I think returning an empty list in those scenarios would be appropriate.

243-251: Storing the children in a dictionary (where key = name, and value = [list of DataObjects]) rather than a flat list would speed up the look-ups by name. (Or, storing a set of references in a dictionary, if global ordering is important - however, I wonder how important ordering is for anything other than DataObjects grouped by name?)
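A sketch of that dictionary-backed layout (class and attribute names hypothetical, not the actual DOC API):

```python
class Node(object):
    """Children indexed by name: O(1) lookup, per-name insertion order kept."""

    def __init__(self, name):
        self.name = name
        self._children = {}   # name -> [Node, ...]

    def insert_child(self, child):
        # setdefault creates the per-name list on first insertion.
        self._children.setdefault(child.name, []).append(child)

    def get_children(self, name):
        # A miss is not an error: return an empty list.
        return list(self._children.get(name, []))

root = Node("root")
root.insert_child(Node("disk"))
root.insert_child(Node("disk"))
root.insert_child(Node("net"))
print(len(root.get_children("disk")))   # 2
print(root.get_children("missing"))     # []
```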

259: If this is kept (and for future reference), the trailing '\' characters are unnecessary (within parentheses), and the casts to string on line 260 are also unnecessary (the %s conversion automatically calls str() on the object being formatted). I'd also re-break it more like this:
raise ObjectNotFound("No matching objects found: name = '%s'"
                     " and class_type = '%s'" %
                     (name, class_type))

264: It might be worth it to wrap a call to get_children(name, class_type) and then just return the 1st item from that list (if not, simply return child - no need to have a "found_child" variable). Additionally, it seems more likely that a call to get_first_child would raise an ObjectNotFound exception than a call to get_children. Finally, it could be argued that not providing a class or name parameter to this function is not particularly valuable (depending on your thoughts on whether ordering really matters for anything other than items with the same name). If only name is pertinent for ordering, and the data layout is reorganized to be a dictionary, then this code becomes nearly trivial: 'return self._children[name][0]'
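Pulling those suggestions together, a self-contained sketch with toy data (names hypothetical): get_children treats a miss as an empty list, while get_first_child, which promises exactly one item, is the one that raises.

```python
class ObjectNotFound(Exception):
    pass

class Node(object):
    def __init__(self):
        self._children = {"disk": ["sda", "sdb"]}   # toy data

    def get_children(self, name):
        # Finding nothing is not an error here: return an empty list.
        return list(self._children.get(name, []))

    def get_first_child(self, name):
        children = self.get_children(name)
        if not children:
            # The caller asked for exactly one item, so a miss *is* an error.
            raise ObjectNotFound("no child named %r" % name)
        return children[0]

node = Node()
print(node.get_first_child("disk"))   # sda
print(node.get_children("missing"))   # []
```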

306: Again, this is redundant code compared to get_children. In particular, get_children could simply be a call to get_descendants(..., max_depth=1)

410: LOGGER.exception dumps a stack trace, which may or may not be desired. I think LOGGER.error() would be better here (something higher up can determine when/where to dump the stack trace - to the log, to the console, etc.)

414: Function should check that caller doesn't pass in both before and after.

439: Set it to "None" to indicate a "not yet determined value". Negative values are valid parameters to insert, so setting to "-1" could cause trouble later. (Alternatively, default to 0, and remove lines 467-468)

442-444: Unnecessary - let the __check_object_type call catch this (NoneType is not an instance of DataObject, so __check_object_type will fail).

447, 456: Not strictly needed - if they're not DataObject sub-classes, they won't be children, so the calls at 449/458 will fail.

470-481: I'd recommend having separate functions for "insert one" and "insert all items in an iterable". (See list.append vs. list.extend). It provides a cleaner disjunction and allows for one to create and insert "list like" DataObjects.
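The list.append / list.extend behavior illustrates why the disjunction matters:

```python
children = ["a"]

# "Insert one": the argument is a single child, even if it is list-like.
children.append(["b", "c"])
print(children)     # ['a', ['b', 'c']]

children = ["a"]

# "Insert all items in an iterable": each element becomes a child.
children.extend(["b", "c"])
print(children)     # ['a', 'b', 'c']
```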

543, 569: These seem to be in conflict. Is it an error to call delete when there's nothing to delete, or isn't it? I don't see a problem with calling delete and having nothing to delete.

586: This only removes "self" from local scope - so it's not really needed.

591, 606: I don't believe we ever reached resolution on why this couldn't be implemented via the __deepcopy__ and __copy__ special methods. It seems rather unintuitive to explicitly avoid them - my first thought when needing to copy/deepcopy a Python object is going to be to "import copy; copy.deepcopy(my_object)"

See attached tree_class.py to see a tree-like object that takes advantage of __setstate__ and __getstate__ to copy the right portions of the tree and fix things as appropriate. (I didn't verify the shallow copy specifically, but it shouldn't require much additional work, if any).

In looking at your response to Drew's question in this area, I think this resolves 1 & 2. In regards to 3, any subclass implementors can skip overriding deepcopy - and if they do override it, like any subclass overriding a parent definition, it needs to be aware of whether or not it should call super().__deepcopy__() as part of the implementation - that's a common problem with any OO subclass. And I don't believe it will interfere with the pickling, either.

701-702: In general, I'd discourage having heavyweight functions map to properties. Attributes and properties should be lightweight and quick to access. For example, on line 714 you call self.xml_tree, and assign it to xml. Then, 2 lines later, you use self.xml_tree again, instead of referencing the xml variable - causing the entire tree to be regenerated. Changing this to a function would make it obvious that a caller should try to avoid retrieving the value multiple times in a row.

720: Rather than using string concatenation, which is slow, add all items to a list, and then use "\n".join(the_list). Although since this recurses, the performance gain may be minimal.
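For reference, the join idiom looks like:

```python
lines = []
for index in range(3):
    lines.append("line %d" % index)

# One allocation at the end, instead of a new string per "+=":
text = "\n".join(lines)
print(text)
```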

729, 738: This use of repr() breaks convention. See:
http://docs.python.org/reference/datamodel.html#object.__repr__
http://docs.python.org/reference/datamodel.html#object.__str__
I don't particularly see the value of defining __repr__ for the DOC.

741: The Python collections define class hierarchies such that mutable classes inherit from immutable ones. I think we might want the same. In this case, that means having "DataObjectNoChildManipulation" be the base class, and DataObject be a subclass of that. The bonus there is that it's then simply a matter of *not* defining insert_children/delete_children for the base class, and defining them for the subclass. That would make it more transparent for users of the classes as well (it's far easier to determine that a method simply doesn't exist for an object than it is to determine that the method exists, but will always raise an exception).
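A sketch of that hierarchy (class names hypothetical): the base class simply never defines the manipulation methods, so misuse surfaces as a plain AttributeError rather than a custom exception.

```python
class ReadOnlyNode(object):
    """Base class: child-manipulation methods do not exist at all."""

    def __init__(self):
        self._children = []

    def get_children(self):
        return list(self._children)

class Node(ReadOnlyNode):
    """Mutable subclass adds insertion."""

    def insert_children(self, children):
        self._children.extend(children)

print(hasattr(ReadOnlyNode(), "insert_children"))   # False
node = Node()
node.insert_children(["a", "b"])
print(node.get_children())                          # ['a', 'b']
```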

data_object_cache.py:
128: Seems like it would be simpler to just check "if __instance is None"

144, 146: Should probably be "single-underscored" private variables. The name-mangling is more to "hide" from sub-classes, and, should there ever be a subclass of DataObjectCache, I imagine it'd want to access the persistent and volatile trees. Ditto for __instance -> We wouldn't want a subclass creating two separate name-mangled instances.
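Putting both points together (the "is None" check and the single-underscore convention), a minimal singleton sketch with a hypothetical class name:

```python
class Cache(object):
    _instance = None   # single underscore: visible to subclasses

    @classmethod
    def get_instance(cls):
        # Identity test against None, per the earlier "is None" guidance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

a = Cache.get_instance()
b = Cache.get_instance()
print(a is b)   # True -- one shared instance
```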

148: This makes it obvious as to why the class hierarchy is the way it is. By the same token, it also makes it obvious that it's not really possible to enforce "NoChildManipulation" as such. This block of code could just as easily be "self._children = [self.__persistent_tree, self.__volatile_tree]", if following the above suggestion for adjusting the class hierarchy.

177: I'd have expected this to be the "delete()" method.

190: This is a good candidate for a property ("self.empty")

229: The isinstance check is not needed - all python objects are instances of "object".

238: I think we'd want to close regardless. I'm not sure the advantage of leaving this file object open for writing. Closing it prevents the caller from trying to write garbage to the end of our pickled DOC.

241: Nit: I find this method name a touch verbose. "load" seems fine to me, but it's a mere suggestion and completely up to you.

293: In opposition to Drew's complaints about "cls", PEP8 suggests using "cls" as the name for variables representing classes:

  - If your public attribute name collides with a reserved keyword, append
         a single trailing underscore to your attribute name.  This is
         preferable to an abbreviation or corrupted spelling.  (However,
         notwithstanding this rule, 'cls' is the preferred spelling for any
         variable or argument which is known to be a class, especially the
         first argument to a class method.)

And "requires" it for classmethods:

       Always use 'cls' for the first argument to class methods.
(Sorry Drew!)


295: "del" doesn't actually "free" anything here. It simply removes that one reference to that object from the scope. As an example, consider:
>>> a = object()
>>> b = a
>>> a
<object object at 0x8080480>
>>> b
<object object at 0x8080480>
>>> del a
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> b
<object object at 0x8080480>

To truly free an object, *all* references to it need to be "lost," which in this case will happen automatically once the function returns.

338: Recommend appending to a list and then using "\n".join(the_list)

data_object_dict.py:

69-74: The ValueError thrown by etree.Element provides an almost identical error string; as such, this function seems unnecessary - especially since it seems to be used only to protect against sub-classes.

104: There's no need for generate_xml to be a property; a simple attribute will suffice. No need to assert that it's a bool - any object in Python can be evaluated in a boolean context.

174: This "del" call isn't needed.

Tests:
I'm going to be a touch more lenient in terms of style/implementation here. For reference, can you generate a code coverage report and include it in your reply to this email? (Your tests seem to run fine via the slim_test runner in the cud_dc gate, so you should be able to just run that with a --with-cover option to get the results. Optionally, check "slim_test -h" for info on how to generate an HTML coverage report, and include *that* instead of the text version).

test_data_object_cache_children.py:
39 1/2, elsewhere in this file and others: To eliminate odd issues between tests, you should reset the DOC instance as part of the tearDown method, by setting DataObjectCache._DataObjectCache__instance = None
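The reset looks like this, using a stand-in class with the same name-mangled slot (the real class is in data_object_cache.py):

```python
class DataObjectCache(object):
    """Stand-in: __instance is stored as _DataObjectCache__instance."""

    __instance = None

    @classmethod
    def get_instance(cls):
        if cls.__instance is None:
            cls.__instance = cls()
        return cls.__instance

DataObjectCache.get_instance()

# In tearDown(): reach through the mangled name to reset the singleton,
# so no state leaks from one test into the next.
DataObjectCache._DataObjectCache__instance = None
print(DataObjectCache._DataObjectCache__instance is None)   # True
```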

53-57, elsewhere: Use "self.assertRaises(<Error>, <function>, [arg1 [, arg2, ...]])" instead of a try/except block
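For example (ValueError from int() used here as a stand-in for the DOC's own exceptions):

```python
import unittest

class ExampleTests(unittest.TestCase):
    def test_bad_value_raises(self):
        # One line replaces the whole try/except/else-self.fail() block:
        self.assertRaises(ValueError, int, "not a number")

suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())   # True
```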

35: Could simply use the "object" class.

80: Typo: Priority should be 100 here, based on test name / failure message.

EOF: Missing test-cases:
register_class: Assigning multiple classes the same priority level
find_class_to_handle: Need a case to ensure that if two classes can both handle something, the one with the higher priority is chosen. (Create 2 classes whose "can_handle" methods always return True to test this in the simplest fashion possible.)
find_class_to_handle: Need a case to ensure that if no classes are found, None is returned.

test_data_object_cache_snapshots.py:

I'd imagine that only one test will truly require a self.temp_dir/file - there should only be a need to explicitly test against the filesystem once. For the rest, comparison of StringIO objects should suffice.

I'd suggest moving the DOC generation to a separate function that can be called explicitly by tests that need a complex DOC (it shouldn't be strictly required for more than one or two tests). No need to store the self.child_* variables either (and thus, no need to clean them up) since none of the tests reference them. This function could be importable by any test files that need to generate a reasonably complex cache.

Most or all of the tests here should both take_snapshot() and load_from_snapshot(), and compare the DOC (persistent) strings from before and after.
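The shape of such a round-trip test, with pickle and an in-memory buffer standing in for take_snapshot()/load_from_snapshot() (the real methods live in data_object_cache.py):

```python
import pickle
from io import BytesIO   # in-memory file object; no temp dir/file needed

before = {"persistent": ["child_1", "child_2"]}   # toy stand-in for the DOC

buf = BytesIO()
pickle.dump(before, buf)     # "take_snapshot"
buf.seek(0)
after = pickle.load(buf)     # "load_from_snapshot"

print(after == before)       # True -- round trip preserved the contents
```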

229: There's no guarantee that I haven't created such a directory on my machine. This test can probably just be removed - it's not really testing DOC as much as it's testing the Python 'open()' built-in.

241: In essence, this tests the same thing as 234.

test_data_object_cache_xml.py:
235: Typo: geneerates -> generates

test_data_object_deletion.py:
95: Typo in fail message (should be child_2)
114-115: I'd check for assertNotEqual(child, self.child_3) to be slightly more precise here.

I've run out of energy to read through test code, and the tests so far look really good, so I'll wait for the coverage report to finish looking.

- Keith

class TreeClass(object):
    def __init__(self):
        self.parent = None
        self.child = None
    def __getstate__(self):
        # Copy first: mutating self.__dict__ in place would also clear the
        # parent pointer on the original object, not just on the copy.
        state = self.__dict__.copy()
        state['parent'] = None
        return state
    def __setstate__(self, state):
        self.__dict__ = state
        if self.child:
            self.child.parent = self

if __name__ == '__main__':
    a = TreeClass()
    b = TreeClass()
    c = TreeClass()

    a.child = b
    b.child = c
    c.parent = b
    b.parent = a

    import copy
    copy_a = copy.deepcopy(a)
    copy_b = copy.deepcopy(b)
    copy_c = copy.deepcopy(c)


_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
