Re: [caiman-discuss] Request for review of Data Object Cache design document (0.5)

Keith Mitchell Tue, 25 May 2010 12:01:19 -0700

Hi Darren,

Responses below.


On 05/25/10 05:32 AM, Darren Kenny wrote:

Hi Keith,

Comments below...

On 05/24/10 10:14 PM, Keith Mitchell wrote:

Hi Darren,

Overall, the design looks really good. I have a few minor questions
about function names and "fitting in" with Python, and less minor
comments about can_handle and from_xml.

3.3: Can you clarify the reasons for the use of copy() and deepcopy()
instead of the Python standard __copy__() and __deepcopy__()?

The main reason for this is that I wanted to have more control over the copy
from DataObject, and simply provide it as part of the object itself rather
than expecting people to do other things like using the "copy" module
themselves.

But one other point is that if you simply call copy.deepcopy() to copy the
tree, then you end up copying the WHOLE tree, since deepcopy recurses back up
the tree using the parent reference...

copy.copy() uses an object's __copy__() method, and similarly, copy.deepcopy() uses the __deepcopy__() method. I'd anticipate that one could define those methods in such a way to get the desired behavior, unless I misunderstand the problem or the copy module.

This is messy to avoid since I do want __deepcopy__() to maintain the process
of copying the parent reference, but if I exclude it from the copy mechanism,
then it won't. As such, the DataObject.deepcopy() method does a copy of itself
and then a copy.deepcopy() of the children to avoid this recursion back up the
tree.

Why would a deepcopy have the same parent? By the definition of deepcopy, I'd expect the resultant object to have no relation to the original object. I'd actually expect it to have no parent at all, and need to be explicitly added back to the DOC (or added as a child to an object in the DOC) in order to re-establish a parent.
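
A rough sketch of how __deepcopy__() could be defined to get that behavior. The DataObject class, its attribute names and add_child() are stand-ins assumed for illustration, not the actual DOC design:

```python
import copy


class DataObject(object):
    """Hypothetical stand-in for the DOC's DataObject (names are assumptions)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def __deepcopy__(self, memo):
        # Build the copy by hand so the copy module never follows the
        # parent reference back up the tree.
        new_obj = self.__class__.__new__(self.__class__)
        memo[id(self)] = new_obj
        for key, value in self.__dict__.items():
            if key in ("parent", "children"):
                continue
            setattr(new_obj, key, copy.deepcopy(value, memo))
        # The copy starts out detached; the caller must re-insert it.
        new_obj.parent = None
        new_obj.children = []
        for child in self.children:
            new_obj.add_child(copy.deepcopy(child, memo))
        return new_obj


root = DataObject("root")
node = DataObject("node")
leaf = DataObject("leaf")
root.add_child(node)
node.add_child(leaf)

node_copy = copy.deepcopy(node)  # copies node and leaf, but not root
```

With this, the copied subtree is internally consistent (each copied child points at its copied parent), while the top of the copy has no parent until it is explicitly added somewhere.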

Side-note on properties: I meant to bring this up during prototyping,
but I didn't quite get around to it. A simple attribute does not need to
be defined as a property initially, but can simply be an attribute until
a later point in time when the property concept is needed. As such, does
"name" need to be a property?

The main reason that it's a property is that I want it to be read-only; if I
have people accessing it as an attribute then I cannot enforce it being
read-only.

Aside - I'm really against people directly accessing attributes of an object
(maybe it's the Java/C++ developer in me - or maybe it's just me
being a control-freak... ;) ).


It's a Java/C++ developer thing - break that habit ;)

Seriously, in Python, if an attribute is read/write, then it should be a straight attribute. However, since name is read-only, having it as a property makes sense.
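
To illustrate the distinction, a minimal sketch (the class body and the "value" attribute are assumed for the example, not taken from the design document):

```python
class DataObject(object):
    """Hypothetical example class; attribute names are assumptions."""

    def __init__(self, name):
        self._name = name
        self.value = None  # read/write, so a plain attribute is enough

    @property
    def name(self):
        # No setter is defined, so assignment to .name raises AttributeError.
        return self._name


obj = DataObject("disk")
obj.value = 42  # fine: ordinary attribute

try:
    obj.name = "other"
    rename_failed = False
except AttributeError:
    rename_failed = True
```

If "value" later needs validation, it can be turned into a property without changing any caller.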

3.4.1.2: I'm not sure there's necessarily a need for both an "insert
before" and an "insert after" method. One should be sufficient?
Comparing this to 'simple' lists in Python, for example - most simply
have an "append()" and "insert()" method.

Yes, but in that case you would have to be directly manipulating the children
property - knowing that it's a list.

But, in my experience it's not good to expose the implementation of an object
to the consumers, as such I don't want people relying on the children being a
list - hence I like to hide the fact that it is - for support purposes this
leaves it open for future changes to the internal implementation without
affecting consumers of the API.

Actually, I wasn't implying that literally, simply as a point of comparison. I'm simply wondering about a need for *both* an "insert before" and an "insert after" method.

copying - Can you clarify what you mean by the "parent will be cleared"?
It sounds as if the copy()/deepcopy() methods are more akin to a "move"
operation.

In a way they are - if you take a copy of a node, it should be an independent
copy - meaning that it's not tied to the parent, or its children either (the
latter being a deepcopy()).

If you don't do this then you would be incorrectly assuming that the parent of
the node is correct, when there is no reciprocal relationship between the
parent and the child.

I also don't believe that the term move is correct, since the original node is
still in place.

That last sentence is what I was confused about. The wording at first made it sound like the original node would be deleted.

This also seems the opposite of what was stated above about a deepcopy retaining a reference to the parent.

3.4.1.3

to_xml: What's the reasoning behind not having the object itself make
the recursive call into to_xml()? In particular, the restriction it
creates - "it is possible... for the to_xml() method to generate the
complete tree... In this case the child objects' to_xml() methods must
return None" - seems very unintuitive.

The main reason is to remove the burden of it for the implementors.

But as in my response to Dave Miner, yesterday, it does make sense to have a
mechanism to prevent further recursion - so that the parent node can make the
decision to prevent it if desired - but by default it would descend the tree.

A 'for' loop around its children doesn't seem terribly burdensome; on the other hand the additional complexity of supporting two options here seems like a hassle to me.

If there's a need to support two paths, splitting up the functions would be smoother:


def to_xml(self):
    '''Never overridden'''
    my_xml = self.as_xml()
    my_xml.append(self.children_as_xml())
    return my_xml

def as_xml(self):
    '''Abstract method; subclasses return an XML object representing
    itself'''
    ...

def children_as_xml(self):
    '''Only overridden by a subclass if it wants to take control of
    adding its children to the XML'''
    children_xml = SomeXMLObject()
    for child in self.children:
        # Note this is a call into 'to_xml', not 'as_xml'
        children_xml.append(child.to_xml())
    return children_xml

Splitting it up means that the subclasses have the option for controlling, but don't have to worry about anything if they don't choose to go that route. This is just one pseudo-codish example, of course - it could be designed so that the opposite is the default as well. (I realize this pseudo-code is probably not crystal clear; I'd be happy to clarify or add detail individually)

can_handle: I can imagine that the evaluation process this would require
could get rather lengthy. There's also concern about
complexity/confusion down the line given that can_handle() could be
evaluating more than simply a high level tag. Would restricting it to
simply the top level tag and its attributes ensure uniqueness? If
so, that could at least simplify things slightly. At minimum, I think
this setup might require some concept of ordering amongst classes, such
that if two classes happen to non-uniquely decide upon ownership of a
certain node, we can still get a consistent result back.

I don't believe that the evaluation *should* be all that expensive - in the
majority of cases it's going to be a 1-1 mapping between a tag and a class
that can handle it, without a need to look further down the tree.

As a possible way of making it simpler for the consumer, we could look at a
"matching" mechanism similar to XPath - but I really think that's over-kill
here.


Agreed.

Certainly ordering may be worth considering if people think it's required, but
again, I think it's just as simple to be more specific about what you match -
or make the other one less-specific.

For the simple case of the high level tag, maybe a couple attributes, it's simple. But if the intent is to ever support anything more complex, I think we'll want a way to be able to explicitly determine the evaluation order, so that we can ensure consistent results.

This could be as simple as a number assigned when doing the registering of a
class with the DOC.


Yup, exactly.

from_xml:
Nits on the pseudo-code: 4th line in should be "from_xml"

static vs dynamic: I believe the best solution would be a static
situation, and then simply require that any caller into this function be
sure to register all needed classes prior to calling from_xml(). For
example, ManifestParser could read in an XML, recognize that the schema
is for an AI schema, import all necessary AI related classes, then load
the DOC from the XML DOM. The import of the classes could easily be
set-up to trigger registration with the cache, as was hinted at in this
section.

Heh, you suggest static, Dave Miner suggests dynamic ;)

This is part of the reason that I've not yet decided since I think it would
warrant its own discussion...

That was a typo - I meant 'dynamic,' as implied by the context. The important piece is that it's up to the caller to ensure that all required classes get imported or registered in that dynamic fashion.
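
One possible shape for import-triggered registration, sketched with a class decorator. The decorator name doc_class, the tag-keyed registry and this from_xml() signature are assumptions for the example only:

```python
import xml.etree.ElementTree as ET

# Hypothetical tag -> class registry, filled in as class definitions execute,
# which is what happens when the module containing them is imported.
_REGISTERED = {}


def doc_class(tag):
    """Class decorator that registers the class for the given XML tag."""
    def decorator(cls):
        _REGISTERED[tag] = cls
        return cls
    return decorator


@doc_class("target")
class Target(object):
    def __init__(self, element):
        self.name = element.get("name")


def from_xml(element):
    """Build an object for element, assuming its class was already registered."""
    try:
        cls = _REGISTERED[element.tag]
    except KeyError:
        # The caller did not import/register a class for this tag.
        raise LookupError("no class registered for <%s>" % element.tag)
    return cls(element)


obj = from_xml(ET.fromstring('<target name="rpool"/>'))
```

The failure case makes the caller's responsibility explicit: an unregistered tag is an error rather than something silently skipped.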

3.4.2.4:

dump: Would the __str__ or __repr__ functions provide sufficient
functionality here? Or does the indented parameter imply otherwise?

We're using str for each DataObject, so you're suggesting simply
overriding __str__() to recurse the tree - I think that would be do-able;
we had originally avoided that because it again placed the burden on a class
that overrides __str__() to ensure it recurses down the tree - which wouldn't
be obvious to people to have to do.

I see your point. This places the opposite burden however - a subclass of DataObject can't recurse in its __str__ method, or the dump() output will have duplicate data. I'd rather see __str__ recurse, as then I can be sure that str(<DataObject>), print <DataObject>, and string formatting of a data object behave in a reasonable manner. However, I wouldn't be opposed to having __str__ print the object, and __repr__ print the object + children.
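
A small sketch of the second option (__str__ for the object alone, __repr__ for the object plus children). The class, the _lines() helper and the four-space indent are assumed for illustration:

```python
class DataObject(object):
    """Hypothetical example class; names and output format are assumptions."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def __str__(self):
        # Just this object; subclasses override this without recursing.
        return "%s(%s)" % (self.__class__.__name__, self.name)

    def __repr__(self):
        # This object plus all descendants, indented by depth.
        return "\n".join(self._lines(0))

    def _lines(self, depth):
        lines = ["    " * depth + str(self)]
        for child in self.children:
            lines.extend(child._lines(depth + 1))
        return lines


root = DataObject("root")
a = DataObject("a")
b = DataObject("b")
root.children.append(a)
a.children.append(b)

tree_text = repr(root)
```

Because the recursion lives in one base-class helper, a subclass that overrides __str__ cannot accidentally duplicate its children in the output.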

3.4.3.1
1/2: Since NameValue is simply a single dictionary entry, what is the
value-added of the NameValue class?

Mainly that it's a single object with a name and value - so the XML
representation might be different to that desired for something with multiple
values. But certainly it could be merged and simply behave differently based
on the presence of 1 or more values... I think that would work.

Precisely my thoughts - particularly bearing in mind that the XML tree can deviate to some extent from the DOC tree, correct?


Thanks,
Keith

Thanks,

Darren.


_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
