On 07/15/10 09:52 AM, Darren Kenny wrote:
Hi,

I would like to get the code review for the Data Object Cache under way. The
webrev is at:

        http://cr.opensolaris.org/~dkenny/install_doc/

Because it's been mainly written in the cud_dc gate, I've pulled out my specific
changes and created another workspace based off slim_source (thanks Karen for
the suggestion) to be able to provide it in isolation from the cud_dc work and
(hopefully) give a more readable webrev.

Any comments/suggestions are much appreciated.

If at all possible, I would like to get the first round of comments by Friday
next, July 23.

Thanks,

Darren.


Hi Darren,

I've tried to avoid repeating Drew's comments, including his general PEP 8 notes and my addendum on use scenarios for "is None" (to reiterate: use "if <variable>:" or "if <variable> is None:" depending on context, but never "if <variable> == None:"). Note that I started this review before your first round of edits; I'll try to remove anything that was fixed in the intermediate update, but if I've left something in that you've already fixed, or if the line numbers are slightly off, that's why.
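To make the "is None" guidance concrete, a small illustration (the Sneaky class is a hypothetical example, not anything in the DOC):

```python
value = None

# Good: identity test; immune to classes that override __eq__.
print(value is None)        # True

# Also fine when any falsy value (None, 0, "", []) should be treated alike:
print(not value)            # True

# Avoid "== None": equality dispatches to __eq__, which a class can override.
class Sneaky(object):
    def __eq__(self, other):
        return True

print(Sneaky() == None)     # True -- claims to equal None, but is not None
print(Sneaky() is None)     # False -- identity cannot be faked
```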

As with my review of the logging service, I'm trying to be extra detailed here since this is a CUD component that will be around for, ideally, a long while.

General: Naming - Looking at the layout of the vendor-packages files added, the file/package names seem rather verbose. I suggest the following:

Rename data_object_cache to simply "cache" or "data_cache".
Move all of cache/data_object.py into cache/__init__.py. This allows one to import the cache using simply:
"import osol_install.cache"
which would be preferable to "import osol_install.data_object_cache.data_object_cache", which is not only redundant but causes Python import problems in some cases - specifically, statements along the lines of "import data_object_cache.xyz" can confuse the import mechanism (does it refer to osol_install.doc.xyz, or osol_install.doc.doc.xyz?).

Additionally, regardless of whether or how the names change, please use absolute imports: "import osol_install.data_object_cache.data_object as data_object" rather than "import data_object", for example.

usr/src/lib/Makefile:
Copyright needs updating.
52: Please alphabetize.

prototype_com:
No need to bother updating

data_object.py:
36, 41, 46: These comments don't add much value

37-47: It would be of value to define a common base exception class for the DOC, and have these subclass that.

194: Nit: I think this should simply be a method, has_children(), not a property. It "feels" more like a computed value rather than an attribute.

207: Having the default for "class_type" be DataObject (or have the function set class_type = DataObject if it is passed in as None) would simplify some of the logic on line 250.

242: It'd be simpler to just assert that the length of the returned list is > 0. However, I'd argue that it's not an error to look for something by name or class and not find any - I think returning an empty list in those scenarios would be appropriate.

243-251: Storing the children in a dictionary (where key = name, and value = [list of DataObjects]) rather than a flat list would speed up the look-ups by name. (Or, storing a set of references in a dictionary, if global ordering is important - however, I wonder how important ordering is for anything other than DataObjects grouped by name?)
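A sketch of that dictionary-backed layout (class and attribute names hypothetical, not the actual DOC API):

```python
class Node(object):
    """Children indexed by name: O(1) lookup, per-name insertion order kept."""

    def __init__(self, name):
        self.name = name
        self._children = {}   # name -> [Node, ...]

    def insert_child(self, child):
        # setdefault creates the per-name list on first insertion.
        self._children.setdefault(child.name, []).append(child)

    def get_children(self, name):
        # A miss is not an error: return an empty list.
        return list(self._children.get(name, []))

root = Node("root")
root.insert_child(Node("disk"))
root.insert_child(Node("disk"))
root.insert_child(Node("net"))
print(len(root.get_children("disk")))   # 2
print(root.get_children("missing"))     # []
```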

259: If this is kept (and for future reference), the trailing '\' characters are unnecessary (within parentheses), and the casts to string on line 260 are also unnecessary (the %s conversion automatically calls str() on the object being formatted). I'd also re-break it more like this:
raise ObjectNotFound("No matching objects found: name = '%s'"
                     " and class_type = '%s'" %
                     (name, class_type))

264: It might be worth it to wrap a call to get_children(name, class_type) and then just return the 1st item from that list (if not, simply return child - no need to have a "found_child" variable). Additionally, it seems more likely that a call to get_first_child would raise an ObjectNotFound exception than a call to get_children. Finally, it could be argued that not providing a class or name parameter to this function is not particularly valuable (depending on your thoughts on whether ordering really matters for anything other than items with the same name). If only name is pertinent for ordering, and the data layout is reorganized to be a dictionary, then this code becomes nearly trivial: 'return self._children[name][0]'
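Pulling those suggestions together, a self-contained sketch with toy data (names hypothetical): get_children treats a miss as an empty list, while get_first_child, which promises exactly one item, is the one that raises.

```python
class ObjectNotFound(Exception):
    pass

class Node(object):
    def __init__(self):
        self._children = {"disk": ["sda", "sdb"]}   # toy data

    def get_children(self, name):
        # Finding nothing is not an error here: return an empty list.
        return list(self._children.get(name, []))

    def get_first_child(self, name):
        children = self.get_children(name)
        if not children:
            # The caller asked for exactly one item, so a miss *is* an error.
            raise ObjectNotFound("no child named %r" % name)
        return children[0]

node = Node()
print(node.get_first_child("disk"))   # sda
print(node.get_children("missing"))   # []
```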

306: Again, this is redundant code compared to get_children. In particular, get_children could simply be a call to get_descendants(..., max_depth=1)

410: LOGGER.exception dumps a stack trace, which may or may not be desired. I think LOGGER.error() would be better here (something higher up can determine when/where to dump the stack trace - to the log, to the console, etc.)

414: Function should check that caller doesn't pass in both before and after.

439: Set it to "None" to indicate a "not yet determined value". Negative values are valid parameters to insert, so setting to "-1" could cause trouble later. (Alternatively, default to 0, and remove lines 467-468)

442-444: Unnecessary - let the __check_object_type call catch this (NoneType is not an instance of DataObject, so __check_object_type will fail).

447, 456: Not strictly needed - if they're not DataObject sub-classes, they won't be children, so the calls at 449/458 will fail.

470-481: I'd recommend having separate functions for "insert one" and "insert all items in an iterable". (See list.append vs. list.extend). It provides a cleaner disjunction and allows for one to create and insert "list like" DataObjects.
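The list.append / list.extend behavior illustrates why the disjunction matters:

```python
children = ["a"]

# "Insert one": the argument is a single child, even if it is list-like.
children.append(["b", "c"])
print(children)     # ['a', ['b', 'c']]

children = ["a"]

# "Insert all items in an iterable": each element becomes a child.
children.extend(["b", "c"])
print(children)     # ['a', 'b', 'c']
```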

543, 569: These seem to be in conflict. Is it an error to call delete when there's nothing to delete, or isn't it? I don't see a problem with calling delete and having nothing to delete.

586: This only removes "self" from local scope - so it's not really needed.

591, 606: I don't believe we ever reached resolution on why this couldn't be implemented via the __deepcopy__ and __copy__ special methods. It seems rather unintuitive to explicitly avoid them - my first thought when needing to copy/deepcopy a Python object is going to be to "import copy; copy.deepcopy(my_object)"

See attached tree_class.py to see a tree-like object that takes advantage of __setstate__ and __getstate__ to copy the right portions of the tree and fix things as appropriate. (I didn't verify the shallow copy specifically, but it shouldn't require much additional work, if any).

In looking at your response to Drew's question in this area, I think this resolves 1 & 2. In regards to 3, any subclass implementors can skip overriding deepcopy - and if they do override it, like any subclass overriding a parent definition, it needs to be aware of whether or not it should call super().__deepcopy__() as part of the implementation - that's a common problem with any OO subclass. And I don't believe it will interfere with the pickling, either.

701-702: In general, I'd discourage having heavyweight functions map to properties. Attributes and properties should be lightweight and quick to access. For example, on line 714 you call self.xml_tree, and assign it to xml. Then, 2 lines later, you use self.xml_tree again, instead of referencing the xml variable - causing the entire tree to be regenerated. Changing this to a function would make it obvious that a caller should try to avoid retrieving the value multiple times in a row.

720: Rather than using string concatenation, which is slow, add all items to a list, and then use "\n".join(the_list). Although since this recurses, the performance gain may be minimal.
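For reference, the join idiom looks like:

```python
lines = []
for index in range(3):
    lines.append("line %d" % index)

# One allocation at the end, instead of a new string per "+=":
text = "\n".join(lines)
print(text)
```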

729, 738: This use of repr() breaks convention. See:
http://docs.python.org/reference/datamodel.html#object.__repr__
http://docs.python.org/reference/datamodel.html#object.__str__
I don't particularly see the value of defining __repr__ for the DOC.

741: The Python collections define class hierarchies such that mutable classes inherit from immutable ones. I think we might want the same. In this case, that means having "DataObjectNoChildManipulation" be the base class, and DataObject be a subclass of that. The bonus there is that it's then simply a matter of *not* defining insert_children/delete_children for the base class, and defining them for the subclass. That would make it more transparent for users of the classes as well (it's far easier to determine that a method simply doesn't exist for an object than it is to determine that the method exists, but will always raise an exception).
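A sketch of that hierarchy (class names hypothetical): the base class simply never defines the manipulation methods, so misuse surfaces as a plain AttributeError rather than a custom exception.

```python
class ReadOnlyNode(object):
    """Base class: child-manipulation methods do not exist at all."""

    def __init__(self):
        self._children = []

    def get_children(self):
        return list(self._children)

class Node(ReadOnlyNode):
    """Mutable subclass adds insertion."""

    def insert_children(self, children):
        self._children.extend(children)

print(hasattr(ReadOnlyNode(), "insert_children"))   # False
node = Node()
node.insert_children(["a", "b"])
print(node.get_children())                          # ['a', 'b']
```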

data_object_cache.py:
128: Seems like it would be simpler to just check "if __instance is None"

144, 146: Should probably be "single-underscored" private variables. The name-mangling is more to "hide" from sub-classes, and, should there ever be a subclass of DataObjectCache, I imagine it'd want to access the persistent and volatile trees. Ditto for __instance -> We wouldn't want a subclass creating two separate name-mangled instances.
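Putting both points together (the "is None" check and the single-underscore convention), a minimal singleton sketch with a hypothetical class name:

```python
class Cache(object):
    _instance = None   # single underscore: visible to subclasses

    @classmethod
    def get_instance(cls):
        # Identity test against None, per the earlier "is None" guidance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

a = Cache.get_instance()
b = Cache.get_instance()
print(a is b)   # True -- one shared instance
```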

148: This makes it obvious as to why the class hierarchy is the way it is. By the same token, it also makes it obvious that it's not really possible to enforce "NoChildManipulation" as such. This block of code could just as easily be "self._children = [self.__persistent_tree, self.__volatile_tree]", if following the above suggestion for adjusting the class hierarchy.

177: I'd have expected this to be the "delete()" method.

190: This is a good candidate for a property ("self.empty")

229: The isinstance check is not needed - all python objects are instances of "object".

238: I think we'd want to close regardless. I'm not sure the advantage of leaving this file object open for writing. Closing it prevents the caller from trying to write garbage to the end of our pickled DOC.

241: Nit: I find this method name a touch verbose. "load" seems fine to me, but it's a mere suggestion and completely up to you.

293: In opposition to Drew's complaints about "cls", PEP8 suggests using "cls" as the name for variables representing classes:

  - If your public attribute name collides with a reserved keyword, append
         a single trailing underscore to your attribute name.  This is
         preferable to an abbreviation or corrupted spelling.  (However,
         notwithstanding this rule, 'cls' is the preferred spelling for any
         variable or argument which is known to be a class, especially the
         first argument to a class method.)

And "requires" it for classmethods:

       Always use 'cls' for the first argument to class methods.
(Sorry Drew!)


295: "del" doesn't actually "free" anything here. It simply removes that one reference to that object from the scope. As an example, consider:
>>> a = object()
>>> b = a
>>> a
<object object at 0x8080480>
>>> b
<object object at 0x8080480>
>>> del a
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> b
<object object at 0x8080480>

To truly free an object, *all* references to it need to be "lost," which in this case will happen automatically once the function returns.

338: Recommend appending to a list and then using "\n".join(the_list)

data_object_dict.py:

69-74: The ValueError thrown by etree.Element provides an almost identical error string; as such, this function seems unnecessary - especially since it seems to be used only to protect against sub-classes.

104: There's no need for generate_xml to be a property; a simple attribute will suffice. No need to assert that it's a bool - any object in Python can be evaluated in a boolean context.

174: This "del" call isn't needed.

Tests:
I'm going to be a touch more lenient in terms of style/implementation here. For reference, can you generate a code coverage report and include it in your reply to this email? (Your tests seem to run fine via the slim_test runner in the cud_dc gate, so you should be able to just run that with a --with-cover option to get the results. Optionally, check "slim_test -h" for info on how to generate an HTML coverage report, and include *that* instead of the text version).

test_data_object_cache_children.py:
39 1/2, elsewhere in this file and others: To eliminate odd issues between tests, you should reset the DOC instance as part of the tearDown method, by setting DataObjectCache._DataObjectCache__instance = None
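The reset looks like this, using a stand-in class with the same name-mangled slot (the real class is in data_object_cache.py):

```python
class DataObjectCache(object):
    """Stand-in: __instance is stored as _DataObjectCache__instance."""

    __instance = None

    @classmethod
    def get_instance(cls):
        if cls.__instance is None:
            cls.__instance = cls()
        return cls.__instance

DataObjectCache.get_instance()

# In tearDown(): reach through the mangled name to reset the singleton,
# so no state leaks from one test into the next.
DataObjectCache._DataObjectCache__instance = None
print(DataObjectCache._DataObjectCache__instance is None)   # True
```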

53-57, elsewhere: Use "self.assertRaises(<Error>, <function>, [arg1 [, arg2, ...]])" instead of a try/except block
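For example (ValueError from int() used here as a stand-in for the DOC's own exceptions):

```python
import unittest

class ExampleTests(unittest.TestCase):
    def test_bad_value_raises(self):
        # One line replaces the whole try/except/else-self.fail() block:
        self.assertRaises(ValueError, int, "not a number")

suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())   # True
```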

35: Could simply use the "object" class.

80: Typo: Priority should be 100 here, based on test name / failure message.

EOF: Missing test-cases:
register_class: Assigning multiple classes the same priority level
find_class_to_handle: Need a case to ensure that if two classes can both handle something, the one with the higher priority is chosen. (Create 2 classes whose "can_handle" methods always return True to test this in the simplest fashion possible.)
find_class_to_handle: Need a case to ensure that if no classes are found, None is returned.

test_data_object_cache_snapshots.py:

I'd imagine that only one test will truly require a self.temp_dir/file - there should only be a need to explicitly test against the filesystem once. For the rest, comparison of StringIO objects should suffice.

I'd suggest moving the DOC generation to a separate function that can be called explicitly by tests that need a complex DOC (it shouldn't be strictly required for more than one or two tests). No need to store the self.child_* variables either (and thus, no need to clean them up) since none of the tests reference them. This function could be importable by any test files that need to generate a reasonably complex cache.

Most or all of the tests here should both take_snapshot() and load_from_snapshot(), and compare the DOC (persistent) strings from before and after.
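The shape of such a round-trip test, with pickle and an in-memory buffer standing in for take_snapshot()/load_from_snapshot() (the real methods live in data_object_cache.py):

```python
import pickle
from io import BytesIO   # in-memory file object; no temp dir/file needed

before = {"persistent": ["child_1", "child_2"]}   # toy stand-in for the DOC

buf = BytesIO()
pickle.dump(before, buf)     # "take_snapshot"
buf.seek(0)
after = pickle.load(buf)     # "load_from_snapshot"

print(after == before)       # True -- round trip preserved the contents
```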

229: There's no guarantee that I haven't created such a directory on my machine. This test can probably just be removed - it's not really testing DOC as much as it's testing the Python 'open()' built-in.

241: In essence, this tests the same thing as 234.

test_data_object_cache_xml.py:
235: Typo: geneerates -> generates

test_data_object_deletion.py:
95: Typo in fail message (should be child_2)
114-115: I'd check for assertNotEqual(child, self.child_3) to be slightly more precise here.

I've run out of energy to read through test code, and the tests so far look really good, so I'll wait for the coverage report to finish looking.

- Keith

class TreeClass(object):
    def __init__(self):
        self.parent = None
        self.child = None
    def __getstate__(self):
        # Copy first: mutating self.__dict__ in place would also clear the
        # parent pointer on the original object, not just on the copy.
        state = self.__dict__.copy()
        state['parent'] = None
        return state
    def __setstate__(self, state):
        self.__dict__ = state
        if self.child:
            self.child.parent = self

if __name__ == '__main__':
    a = TreeClass()
    b = TreeClass()
    c = TreeClass()

    a.child = b
    b.child = c
    c.parent = b
    b.parent = a

    import copy
    copy_a = copy.deepcopy(a)
    copy_b = copy.deepcopy(b)
    copy_c = copy.deepcopy(c)


_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
