Hi Keith,

On 07/23/10 11:47 PM, Keith Mitchell wrote:
> Hi Darren,
>
> Responses inline. Will there be a differential and/or v2 webrev available?

Posted an updated webrev this morning - didn't get the time on Friday to do
it.

I've cut out the comments/history to try to make this easier to read ;)

> On 07/23/10 09:02 AM, Darren Kenny wrote:
>> On 07/22/10 01:04 AM, Keith Mitchell wrote:
>>> Hi Darren,
>>>
...
>> I'm also not sure about putting everything in __init__.py - I've not seen
>> any precedent for that in other modules, though that doesn't rule it out;
>> I'm wondering then if I should just put all three files in there, not just
>> DataObject.py.
>>
>
> There's precedent in several modules in the Python standard library. For
> example, see /usr/lib/python2.6/logging/__init__.py (also the json,
> ctypes, curses modules, to name a few more).

Ah, didn't see these.
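For anyone following the thread, the two stdlib patterns you mention are easy to check at runtime - logging defines its core class directly in the package's __init__.py, while json defines JSONDecoder in a submodule and re-exports it at package level:

```python
import json
import logging

# logging defines Logger directly in logging/__init__.py:
print(logging.Logger.__module__)     # "logging"

# json defines JSONDecoder in json/decoder.py and re-exports it
# from json/__init__.py, so both import paths name the same object:
print(json.JSONDecoder.__module__)   # "json.decoder"
print(json.JSONDecoder is json.decoder.JSONDecoder)   # True
```

The second pattern is closer to what's being proposed here: the class lives in its own file, but consumers import it from the package name.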

>> So I suppose to achieve this I would have to put everything into __init__.py,
>> I just am not a fan of the "one big file" for all sources - it's harder to
>> maintain, and gets messy over time.
>>
>> How would people feel about doing this in general?
>>
>
> Yeah, I agree that "one big file" is not a terribly clean approach.
>
> Some options to consider:
>
> solaris_install.data_object # For DataObject
> solaris_install.data_object.cache
> solaris_install.data_object.dict
>
> Or,
>
> solaris_install.cache # DataObject AND DataObjectCache in __init__
> solaris_install.cache.dict
> solaris_install.cache.some_other_subclass # Other subclasses of
> DataObject get separate files
>
> I like the first one better, because the names read really well when
> compared to the names of the classes that will be getting imported. Of
> course, open to other options as well.
>

I think I like the first one best too - it's the most consistent with the
class names - will give this a go...

...

>
> Ah you're right, didn't quite think that through all the way. Uniqueness
> doesn't rule it out (again, the value of the dictionary key-value pair
> could be a list of all objects stored with that key), but checkpoints
> are keyed by unique names rather than listed under the same key.
>
> Makes me wish for the OrderedDict data structure in Python2.7.

Didn't know about that one - must keep an eye on it :)

...
>>> 264: It might be worth it to wrap a call to get_children(name,
>>> class_type) and then just return the 1st item from that list (if not,
>>> simply return child - no need to have a "found_child" variable).
>>> Additionally, it seems more likely that a call to get_first_child would
>>> raise an ObjectNotFound exception than a call to get_children. Finally,
>>> it could be argued that not providing a class or name parameter to this
>>> function is not particularly valuable (depending on your thoughts on
>>> whether ordering really matters for anything other than items with the
>>> same name). If only name is pertinent for ordering, and the data layout
>>> is reorganized to be a dictionary, then this code becomes nearly
>>> trivial: 'return self._children[name][0]'
>>>
>> A call to get_children is a potentially more expensive way to locate an
>> object matching the criteria - which is why I didn't go that route.
>>
>> As mentioned before ordering is important, and not just by name.
>>
>
> You're correct, the get_children call is far more expensive. I still
> wonder if there's a way to combine the logic - would it be worth the
> effort to have an optional "count" keyword arg to get_children, that
> returns the first "count" children?

That could work - maybe if I added something like:

    get_children(name=None, class_type=None, max_count=None)

So if max_count is specified, matching stops once that number of children has
been found.

I prefer the 'max' prefix since it better describes what the argument does.

As for get_first_child, I will keep it, since its logic is slightly different:
it's expected to return a single object (not a list).

...
>>> 470-481: I'd recommend having separate functions for "insert one" and
>>> "insert all items in an iterable". (See list.append vs. list.extend). It
>>> provides a cleaner disjunction and allows for one to create and insert
>>> "list like" DataObjects.
>>>
>>>
>> Do you mean that I should have two DataObject.insert_children methods - maybe
>> insert_child and insert_children?
>>
>> Do you really think it's likely for there to be a DataObject sub-class that
>> would also be a list (when I say list here, I mean would match
>> isinstance(.., list) - not just allow [] type access).
>>
>
> Consider the reverse - what happens when I pass in a tuple of
> DataObjects? Or any iterable that isn't strictly a subclass of list?

In this case it will throw an exception, since only a DataObject or a list()
may be passed - a tuple isn't supported here.

Do you think it would be better to support any iterable object?

But I still don't see a requirement to have separate methods...
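If we did want a single insert_children to accept any iterable, one possibility (just a sketch - the helper name and the stand-in class are hypothetical) is to normalise the argument up front rather than type-checking against list:

```python
class DataObject(object):
    """Stand-in for the real base class."""
    pass

def normalize_children(children):
    """Accept a single DataObject, or any iterable of them."""
    if isinstance(children, DataObject):
        return [children]
    try:
        return list(children)   # list, tuple, generator, etc.
    except TypeError:
        raise TypeError("expected a DataObject or an iterable of them")
```

With this, one method covers both the "insert one" and "insert many" cases, so separate insert_child/insert_children methods wouldn't be needed.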

>>
>>> 543, 569: This seem in conflict. Is it an error to call delete when
>>> there's nothing to delete, or isn't it? I don't see a problem with
>>> calling delete and having nothing to delete.
>>>
>> They aren't really in conflict. The first one deletes all children, while
>> the second deletes children matching specific criteria - for the latter I
>> felt it was useful for the consumer to know whether anything matched. This
>> is especially useful for a developer using the API, who may have a typo or
>> logic error.
>>
>> It's the Python equivalent of a boolean_t return in C for many methods...
>>
>
> The flip side is that now it's a programming error to forget to check to
> make sure there are matching items before calling delete.
>
> For example, say you're looking at the code for setting up the partition
> targets when the installer wants to use the whole disk (i.e., blow away
> everything that's already there; either for UI/display purposes or
> during actual installation). With this check, I now have to know the
> current state of the disk - are there already partitions on it before I
> call disk.delete_children(type=Partition)?

The whole point of a try/except is that you can make the call without checking
first - the caller of the API can then decide whether this is actually an
issue for them or not.

It's also harder to test whether something changed afterwards, should you
really want to know.

>
> Essentially, every call to delete_children will require a prior call to
> get_children to first check that it's safe to perform the action, or be
> encased in a try/except block.

I would expect try/except to be the default, not a call to get_children...
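To illustrate the calling pattern I mean - a minimal sketch, where the class and method names echo your partition example but are otherwise hypothetical:

```python
class ObjectNotFound(Exception):
    """Raised when delete criteria match nothing."""

class Disk(object):
    """Tiny stand-in to show the pattern, not the real target code."""

    def __init__(self):
        self._children = []

    def delete_children(self, name=None):
        if name is None:          # "delete all" - never an error
            self._children = []
            return
        matched = [c for c in self._children if c == name]
        if not matched:
            raise ObjectNotFound(name)
        for child in matched:
            self._children.remove(child)

disk = Disk()
try:
    disk.delete_children(name="partition0")   # nothing on the disk yet
except ObjectNotFound:
    pass   # caller decides this is fine - no prior get_children needed
```

No up-front state check is required; the except clause is where the caller expresses "I don't care if it wasn't there".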

>
> I'd also expect programmers to be passing in classes and names via
> variables/constants (for which a typo is unlikely). Hard-coding the
> string key of an object stored in the DOC at scattered points throughout
> the code is a maintenance issue.

I totally agree - I really detest strings being hard-coded rather than at
least defined as a constant that everyone uses. But it is still possible for
such a named object not to be there...

>
> With all that said, the more I think about it, the more I feel it could
> go either way. I will add the caveat that the logic will need to be
> re-worked if it's left as an error, as an ObjectNotFound error won't be
> raised if there are no children (lines 531-533), which is inconsistent.

I still don't see this as an error, since no criteria are specified - to me it
would be inconsistent to raise ObjectNotFound when you never specified
anything to match.

In the case of a "delete all" you really don't care whether there was ever
anything there to delete in the first place...

...
>> Now I still have the utility methods copy and deepcopy - do you want me to
>> remove these? I feel they are still useful in the API, but maybe others
>> feel they are not?
>>
>
> I'd prefer they get removed.

OK, will do.

...
>>> data_object_dict.py:
>>>
>>> 69-74: The ValueError thrown by etree.Element provides an almost
>>> identical error string; as such, this function seems unnecessary -
>>> especially since it seems to be used only to protect against sub-classes.
>>>
>> It is very necessary to protect against sub-classes here - since that's the
>> only way that you can change the tag and sub-tag correctly. Hence I still
>> feel this is necessary, even if all it does is call the etree.Element()
>> method.
>>
>> If you look in the unit tests for this class you will see varied sub-classes,
>> some with invalid tags :)
>>
>
> I'm of the opinion that if a subclass wants to shoot itself in the foot,
> it should be allowed to.
>
> This method for checking has problems, in that the check is done for
> each instance of the class created. That's rather heavyweight for
> something that really doesn't need to happen more than once. Since it
> needs to be validated at runtime, and an object needs to be created for
> it to run the check, I don't see anything wrong with letting the
> developer discover that when they try to run the to_xml method.
>
> If this does stay in, I'd like to see the validate_tag method be
> single_underscored, so that subclasses, if needed, can properly override
> if they need to.

OK, I've removed the __validate_tag call from __init__, leaving it to to_xml
to raise the exception if the tag isn't valid.

>
>>
>>> 104: There's no need for generate_xml to be a property; a simple
>>> attribute will suffice. No need to assert that it's a bool - any object
>>> in Python can be evaluated in a boolean context.
>>>
>> While what you say about any object being allowed to be evaluated in a
>> boolean context is true, I don't agree that we should just make it an
>> attribute - I do think that it's worth ensuring that it is a boolean value
>> for a cleaner API.
>>
>> If this is indeed the preference - to allow anything to be stored - then I'm
>> happy to change it to that, but I feel this should be part of a 'style' guide
>> to be consistent throughout CUD.
>>
>
> It is my preference - if only because I'd rather not have to check to
> make sure that a variable I'm passing in is in fact a strict bool (and
> not "None", for example, which could easily slip in as a "False" value).
>
> Python is not a strictly typed language; adding overly strict type
> enforcement, particularly in the case of bool, will probably cause more
> problems than it solves. Most likely what would end up happening is all
> consumers start casting generate_xml to a bool prior to calling:
> DataObjectDict(the_name, a_dictionary,
> generate_xml=bool(some_val_that_might_not_be_a_bool))
>
> In short, there's no real gain from forcing it to be a strict bool.

Removed this, and the associated tests - generate_xml can now be anything...
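So generate_xml is now just a plain attribute evaluated in a boolean context - a quick sketch of the effect (the constructor shape here is illustrative, not the exact signature):

```python
class DataObjectDict(object):
    """Sketch: generate_xml as a plain attribute, no isinstance check."""

    def __init__(self, name, data_dict, generate_xml=False):
        self.name = name
        self.data_dict = data_dict
        self.generate_xml = generate_xml   # any object; truthiness decides

dod = DataObjectDict("x", {}, generate_xml="yes")   # non-bool is fine now
if dod.generate_xml:            # evaluated in a boolean context
    print("would generate XML")
```

This also means callers no longer need to cast with bool(...) before passing a value in.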

>
>>
>>> 174: This "del" call isn't needed.
>>>
>> Done
>>
>>
>>> Tests:
>>> I'm going to be a touch more lenient in terms of style/implementation
>>> here. For reference, can you generate a code coverage report and include
>>> it in your reply to this email? (Your tests seem to run fine via the
>>> slim_test runner in the cud_dc gate, so you should be able to just run
>>> that with a --with-cover option to get the results. Optionally, check
>>> "slim_test -h" for info on how to generate an HTML coverage report, and
>>> include *that* instead of the text version).
>>>
>> Summary coverage HTML attached :)
>>
>
> The coverage scores look great - though in the future, perhaps they
> could just be added to the webrev before uploading. Especially since a
> few of the files are missing.

Hmm, OK - would that just be a matter of copying the report into the webrev
directory?

Are there 'hooks' that could be plugged into webrev to do this type of thing?

...
> I may not get around to reviewing the remaining test_* files in detail;
> but from everything I've seen so far (including the coverage reports), I
> expect they'll be great. Thanks for setting a great example with the
> unit tests!
>

No problem, I think you've provided some great feedback :)

Thanks,

Darren.
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
