Re: [caiman-discuss] Request for review of Data Object Cache design document (0.5)

Darren Kenny Tue, 25 May 2010 05:33:42 -0700

Hi Keith,

Comments below...

On 05/24/10 10:14 PM, Keith Mitchell wrote:
> Hi Darren,
>
> Overall, the design looks really good. I have a few minor questions
> about function names and "fitting in" with Python, and less minor
> comments about can_handle and from_xml.
>
> 3.3: Can you clarify the reasons for the use of copy() and deepcopy()
> instead of the Python standard __copy__() and __deepcopy__()?

The main reason for this is that I wanted to have more control over the copy
from DataObject, and simply provide it as part of the object itself rather
than expecting people to do other things like using the "copy" module
themselves.

But one other point is that if you simply call copy.deepcopy() to copy the
tree, then you end up copying the WHOLE tree, since deepcopy recurses back up
the tree using the parent reference...

This is messy to avoid since I do want __deepcopy__() to maintain the process
of copying the parent reference, but if I exclude it from the copy mechanism,
then it won't. As such, the DataObject.deepcopy() method does a copy of itself
and then a copy.deepcopy() of the children to avoid this recursion back up the
tree.

>
> Side-note on properties: I meant to bring this up during prototyping,
> but I didn't quite get around to it. A simple attribute does not need to
> be defined as a property initially, but can simply be an attribute until
> a later point in time when the property concept is needed. As such, does
> "name" need to be a property?

The main reason that it's a property, is that I want it to be read-only, if I
have people accessing it as an attribute then I cannot enforce it being
read-only.

Aside - I'm really against people directly accessing attributes of an object
directly (maybe it's the Java/C++ developer in me - or maybe it's just me
being a control-freak... ;) ).

>
> 3.4.1.2: I'm not sure there's necessarily a need for both an "insert
> before" and an "insert after" method. One should be sufficient?
> Comparing this to 'simple' lists in Python, for example - most simply
> have an "append()" and "insert()" method.

Yes, but in that case you would have to be directly manipulating the children
property - knowing that it's a list.

But, in my experience it's not good to expose the implementation of an object
to the consumers, as such I don't want people relying on the children being a
list - hence I like to hide that fact that it is - for support purposes this
leaves it open for future changes to the internal implementation without
effecting consumers of the API.

>
> copying - Can you clarify what you mean by the "parent will be cleared"?
> It sounds as if the copy()/deepcopy() methods are more akin to a "move"
> operation.

In a way they are - if you take a copy of a node, it should be an independent
copy - meaning that it's not tied to the parent, or it's children either (the
latter being a deepcopy()).

If you don't do this then you would be incorrectly assuming that the parent of
the node is correct, when there is no reciprocal relationship between the
parent and the child.

I also don't believe that the term move is correct, since the original node is
still in place.

>
> 3.4.1.3
>
> to_xml: What's the reasoning behind not having the object itself make
> the recursive call into to_xml()? In particular, the restriction it
> creates - "it is possible... for the to_xml() method to generate the
> complete tree... In this case the child objects' to_xml() methods must
> return None" - seems very unintuitive.

The main reason is to remove the burden of it for the implementors.

But as in my response to Dave Miner, yesterday, it does make sense to have a
mechanism to prevent further recursion - so that the parent node can make the
decision to prevent it if desired - but by default it would descend the tree.

>
> can_handle: I can imagine that the evaluation process this would require
> could get rather lengthy. There's also concern about
> complexity/confusion down the line given that can_handle() could be
> evaluating more than simply a high level tag. Would restraining it to
> simply the top level tag and its attributes could ensure uniqueness? If
> so, that could at least simplify things slightly. At minimum, I think
> this setup might require some concept of ordering amongst classes, such
> that if two classes happen to non-uniquely decide upon ownership of a
> certain node, we can still get a consistent result back.

I don't believe that the evaluation *should* be all that expensive - in the
majority of cases it's going to be a 1-1 mapping between a tag and a class
than can handle it, without a need to look further down the tree.

As a possible way of making it simpler for the consumer, we could look at a
"matching" mechanism similar to XPath - but I really think that's over-kill
here.

Certainly ordering may be worth considering if people think it's required, but
again, I think it's just as simple to be more specific about what you match -
or make the other one less-specific.

This could be as simple as a number assigned when doing the registering of a
class with the DOC.

>
> from_xml:
> Nits on the pseudo-code: 4th line in should be "from_xml"
>
> static vs dynamic: I believe the best solution would be a static
> situation, and then simply require that any caller into this function be
> sure to register all needed classes prior to calling from_xml(). For
> example, ManifestParser could read in an XML, recognize that the schema
> is for an AI schema, import all necessary AI related classes, then load
> the DOC from the XML DOM. The import of the classes could easily be
> set-up to trigger registration with the cache, as was hinted at in this
> section.

Heh, you suggest static, Dave Miner suggests dynamic ;)

This is part of the reason that I've not yet decided since I think it would
warrant it's own discussion...

>
> 3.4.2.4:
>
> dump: Would the __str__ or __repr__ functions provide sufficient
> functionality here? Or does the indented parameter imply otherwise?

We're using str for each DataObject, so you're suggesting that simply
overriding __str__() to recurse the tree - I think that would be do-able,
we had originally avoided that because it again placed the burden on a class
that overrides __str__() to ensure it recurses down the tree - which wouldn't
be obvious to people to have to do.

>
> 3.4.3.1
> 1/2: Since NameValue is simply a single dictionary entry, what is the
> value-added of the NameValue class?

Mainly that it's a single object with a name and value - so the XML
representation might be different to that desired for something with multiple
values. But certainly it could be merged and simply behave differently based
on the presence of 1 or more values... I think that would work.

Thanks,

Darren.

_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

Re: [caiman-discuss] Request for review of Data Object Cache design document (0.5)

Reply via email to