Re: [caiman-discuss] Request for review of Data Object Cache design document (0.5)

Darren Kenny Wed, 26 May 2010 05:56:16 -0700

Hi Sarah,

(I've cut some of the text to make things more readable)


On 05/25/10 03:50 PM, Sarah Jelinek wrote:
> On 05/25/10 07:36 AM, Darren Kenny wrote:
>>>>> 2.2, last bullet: Would it be more correct to say that it's the job of the
>>>>> application to define the object hierarchy, and hence the data tree, and
>>>>> that DOC is really just providing the infrastructure for the application
>>>>> to do that?  As written the DOC seems to be perhaps more omniscient than
>>>>> you mean for it.
>>>>>
>>>> Well, not really, mainly since it was thought that the DOC could make some
>>>> use of the DTD / Schema (at a high level at least) to correctly structure
>>>> the XML generated from the DOC. At least this is something we discussed
>>>> with Sarah.
>>>>
>>>> The Application doesn't really know as much about the XML any more, but
>>>> instead this is being deferred to each object in the DOC to know about
>>>> their own area - e.g. Targets know how to map Targets from/to XML, etc.
>>>>
>>>> By the DOC making use of the Schema knowledge it is able to put the XML
>>>> generated by various elements of the DOC into the correct places in the
>>>> Manifest... At least that's the theory...
>>>>
>>>>
>>> So how is the DOC notified of the specific schema it is supposed to use?
>>>    I didn't see a mechanism for that.
>>>
>> Well, it was just going to be implemented to use the latest schema/DTD at the
>> time - do you think we would need to consider supporting older schemas? I
>> think that this would be better serviced by using XSLT outside of it.
>>
> Somewhere we have to understand the schema(s) we are operating under so
> we can successfully dump an AI manifest. The DOC has to have this data
> provided to it. My assumption was that for any run of AI we would have
> validated the AI manifest against a schema and this schema would be used
> to drive the manifest output in the DOC.

That makes sense - to use the same DTD for output, as was used for input.

I'll add something to the DOC to have this specified, most likely as a
parameter to the "generate_manifest_xml()" method.

>>
>>>
>>>> This is largely being facilitated by the move towards DC and AI sharing the
>>>> same Schema, but if there are differences we are considering passing flags
>>>> to the to_xml(), and maybe from_xml() (but I think this should be less
>>>> necessary) to allow for differences e.g.:
>>>>
>>>>       DataObject.to_xml( manifest_type = MANIFEST_TYPE_AI )
>>>>
>>>>
>>> I think that sort of approach has potential of pushing detailed
>>> application-specific knowledge into leaf objects.  That seems
>>> problematic to me.
>>>
>> Hmm, maybe it could have some effect, but it still seems to me to be the
>> correct place to have the decision made since each object would best know
>> where it fits in the overall scheme of things.
>>
>
> I had thought that the object wouldn't have to know it is
> translating itself to a specific manifest. My thought was that any
> object could translate itself to xml, and that the manifest comes from
> the fact that we have the schema and we know the order in which elements
> and attributes must appear. Is this not possible?

I think this is the direction I'm going now - to have the object simply
generate the XML that represents it, and have the DOC decide what to use based
on the DTD.

My main concern on this is to what level the DTD is used by the DOC - I had
hoped to only require it for high-level structure, not each and every node...
For example, it would work at the first 2 or 3 levels of XML, anything below
that is not controlled by the DOC.

If it has to control each and every node, then it's becoming a lot more
complex to manage multiple DTDs - and there would need to be significant
knowledge about the mapping between each DTD element and a class instance.
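
As a rough sketch of what I mean by the DOC only controlling the high-level
structure (the class layout, the "manifest" root element and the
section_order parameter are all assumptions for illustration, not anything
from the design document; section_order stands in for whatever would be read
from the DTD):

```python
import xml.etree.ElementTree as ET


class DataObject:
    """Hypothetical object that only knows how to generate its own XML."""

    def __init__(self, section, name):
        self.section = section  # top-level manifest section it belongs to
        self.name = name

    def to_xml(self):
        element = ET.Element(self.section)
        ET.SubElement(element, "name").text = self.name
        return element


class DataObjectCache:
    """Hypothetical DOC that only controls the top level of the manifest."""

    def __init__(self):
        self.children = []

    def generate_manifest_xml(self, section_order):
        # section_order is an assumed stand-in for the ordering of
        # top-level elements that would be derived from the DTD.
        root = ET.Element("manifest")
        for section in section_order:
            for child in self.children:
                if child.section == section:
                    # Anything below this level is up to the object itself.
                    root.append(child.to_xml())
        return root


doc = DataObjectCache()
doc.children.append(DataObject("transfer", "ips"))
doc.children.append(DataObject("target", "c1t0d0"))

# The objects were added in one order, the "DTD" asks for another.
manifest = doc.generate_manifest_xml(["target", "transfer"])
print(ET.tostring(manifest, encoding="unicode"))
```

The point is only that the objects never need to know where in the manifest
they end up; the DOC needs one piece of schema knowledge, the top-level order.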

>> The only other alternative I can think of is to allow everything to generate
>> XML as if it's going into the overall-schema, and then have the DOC later run
>> an XSLT or similar over the generated XML tree, to produce an AI or DC
>> variant, removing nodes or modifying them as appropriate...
>>
>>
> If the above isn't possible then I think that we have to consider
> something like this. That is every object translates itself and we use
> xslt to transform this data into a valid AI manifest.
>

I think that this might actually be the most flexible approach - allowing for
different XSLTs to be used depending on the desired output.
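
To illustrate the kind of post-processing pass I have in mind, here is a
plain-Python stand-in. It is not XSLT, it just uses ElementTree to show the
"remove nodes for a given variant" idea; the function name and the
<distro_only> element are made up for the example:

```python
import xml.etree.ElementTree as ET


def prune_for_variant(root, drop_tags):
    """Remove every element below root whose tag is in drop_tags."""
    # Take a snapshot of the tree first so removal during the walk is safe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in drop_tags:
                parent.remove(child)
    return root


# XML generated "as if going into the overall schema".
generic = ET.fromstring(
    "<manifest>"
    "<target><name>c1t0d0</name></target>"
    "<distro_only><name>example</name></distro_only>"
    "</manifest>"
)

# Hypothetical rule: the AI variant has no <distro_only> section.
ai_variant = prune_for_variant(generic, {"distro_only"})
print(ET.tostring(ai_variant, encoding="unicode"))
```

With real XSLT the rule set would live in a stylesheet per output type rather
than in a Python set, which is where the flexibility would come from.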

>>>>> 3.2 bullet 5: Are there things we can do to help object implementers
>>>>> meet the schema consistency and re-creation constraints?
>>>>>
>>>> The hope is that the person implementing a given object will be the most
>>>> familiar with what the schema dictates for its own XML equivalent, so it
>>>> shouldn't be that much of an issue if they focus only on their specific
>>>> area.
>>>>
>>>> Of course that doesn't mean there isn't something we could do to help, but
>>>> I'm honestly not sure what we could provide other than a possible breakdown
>>>> of the Schema for each area / checkpoint.
>>>>
>>>>
>>> I'm concerned that we're going to have developers stumbling around here
>>> trying to figure out how to get it right, just hoping we had a set of
>>> basic practices that would limit that.
>>>
>> Do you have any suggestions on how we might provide such things? I need to
>> think a little more about it before I can come up with something.
>>
>> Certainly we could document some examples, taking a snippet of the schema,
>> and showing how to generate it in XML from the DOC, would that suffice?
>>
>
> I would think that a developer should be able to take the specific
> object and map it to the portion of the schema that it is part of. The
> way the schemas are defined, "transfer", "target", "execution",
> "configuration" mean that the objects that represent these are contained
> in these schemas and from my perspective it should be relatively
> straightforward to map the object to xml. The developer can also
> validate the xml against the schema, for the specific sub-schema they
> are dumping xml. They would have to of course validate it against
> multiple objects output, in the correct order, but it seems reasonable
> to have them do this since the schemas are going to be modular.

I would think that this would certainly be the general case - where there is
an obvious link between a section of the XML manifest and the implementation
in the DOC - if this isn't the case it could be very confusing for people
maintaining the code.

>
> Providing examples would be helpful, and developers can use the xml
> instance document examples I have in the soon to be released schema
> design document to guide them as well.

We'll work on examples then.

>
> If an object doesn't provide its own xml translation, the only other
> alternative I can think of is that the DOC knows how to do this, for
> each object that is part of a valid manifest. My concern about having
> the DOC do this is that when things change in the objects, the DOC has
> to track this separately, as opposed to having the developer making the
> changes directly in the object itself.
>

I certainly don't want the DOC having to know this kind of information -
that's the whole point of having each object handle its own section of XML -
the changes would all be localized, and as such much less likely to get out of
sync.
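
For example, a minimal sketch of one object owning its own piece of XML,
modelled on the <slice> fragment from the schema design quoted further down.
The method names to_xml(), from_xml() and can_handle() are the ones from the
design document, but the signatures and the Slice class shown here are only
my assumptions:

```python
import xml.etree.ElementTree as ET


class Slice:
    """Hypothetical DOC object that owns the <slice> section of the XML."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def to_xml(self):
        # Build <slice><name/><action><create><size/></create></action></slice>
        element = ET.Element("slice")
        ET.SubElement(element, "name").text = self.name
        action = ET.SubElement(element, "action")
        create = ET.SubElement(action, "create")
        ET.SubElement(create, "size").text = str(self.size)
        return element

    @classmethod
    def can_handle(cls, element):
        # This class only claims <slice> elements.
        return element.tag == "slice"

    @classmethod
    def from_xml(cls, element):
        name = element.findtext("name")
        size = int(element.findtext("action/create/size"))
        return cls(name, size)


original = Slice("0", 1000)
xml_element = original.to_xml()

# Round trip: the same class that wrote the XML can read it back.
restored = None
if Slice.can_handle(xml_element):
    restored = Slice.from_xml(xml_element)
    print(restored.name, restored.size)
```

If the <slice> part of the schema changes, only this one class needs to be
touched, and the DOC itself stays as it is.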

>>
>>> I'm somewhat doubtful of that suggested taxonomy.  A slice (or
>>> partition) seems dependent on its parent device/partition, so I would
>>> expect the names to be fully-qualified.
>>>
>> I don't believe that would be the case at the moment in the schema design:
>>
>>      ...
>>          <target_device>
>>              <type>
>>                 <ctd>
>>                   <name>c1t0d0</name>
>>                   <slice>
>>                     <name>0</name>
>>                      <action>
>>                      <create>
>>                        <size>1000</size>
>>                      </create>
>>                      </action>
>>                   </slice>
>>                 </ctd>
>>              </type>
>>           </target_device>
>>      ...
>>
>> I think that the fully qualified name is certainly fetch-able (e.g. calling
>> slice.get_device_name() ) but I don't think it should be necessary for a
>> child node to qualify its name in itself, as in:
>>
>>       Targets
>>           TargetDisk  [c0d0]
>>               Partition [c0d0p0]
>>               Partition [c0d0p1]
>>                   Slice [c0d0s0]
>>                   Slice [c0d0s1]
>>                   ...
>>                   Slice [c0d0s7]
>>           TargetDisk  [c2d0]
>>               Partition [c2d0p0]
>>                   Slice [c2d0s0]
>>                   Slice [c2d0s1]
>>                   ...
>>                   Slice [c2d0s7]
>>
>> this seems like redundant information being repeated unnecessarily, when it's
>> possible to get it using full_name = parent.name + name ...
>>
>
> The current schemas do not have children that have fully qualified
> names. The plan moving forward was to keep this the same, and associate
> a child with the parent, and get the full name that way. I agree with
> Darren that storing the info in the child seems redundant.
>
> I do have a question though.. the delete_child() method indicates that
> this will remove a specific parent class. Is this method on the parent
> class so that you know which parent to use?

I'm not 100% sure what the question is here, but I'm thinking you're referring
to the text on page 9, for delete_child(), if not please correct me...

When I say that "delete_child() will remove a specific child from the parent
class where the instances match" what I'm saying is that it will delete the
child from the list of children contained in the object that it's called on,
e.g.:

    def delete_child(self, child_to_delete):
        ...
        # Iterate over a copy so that removing from self.children is safe.
        for child in list(self.children):
            if child == child_to_delete:
                # Remove child from self.children
                self.children.remove(child)

    ...

    my_object.delete_child(old_object)


So in this case my_object is the parent of old_object, and the child of
my_object that matches the instance of "child_to_delete" will be removed -
the instance itself is what's compared by default when you use "==", unless a
class defines its own equality.
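
A small self-contained sketch of that behaviour (the DataObject class here
is a cut-down stand-in I made up for the example, not the real base class):

```python
class DataObject:
    """Hypothetical minimal object with a name and a list of children."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def delete_child(self, child_to_delete):
        # No __eq__ is defined on this class, so "==" compares instances.
        self.children = [
            child for child in self.children
            if not (child == child_to_delete)
        ]


my_object = DataObject("c1t0d0")
old_object = DataObject("0")
same_name = DataObject("0")  # same name, but a different instance
my_object.children.extend([old_object, same_name])

my_object.delete_child(old_object)

# Only old_object is gone; the other child with the same name survives.
print([child is same_name for child in my_object.children])
```

So two children that happen to share a name are not confused with each
other, which fits with names only being meaningful relative to the parent.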


>>
>>>
>>>> I suppose we could ask that the name would be unique in any given child
>>>> list - but I don't think we could ask for it to be the case in the complete
>>>> tree of objects. This could also open up the ability to refer to children
>>>> using a dictionary, which might be useful...
>>>>
>>>>
>>> That seems like a theoretical use, but it seems to compromise an
>>> immediate, practical use of the Name to allow it, so I guess I'm
>>> skeptical still.
>>>
>> And entitled to be ;)
>>
>>
>>>>> 3.4.1.3 to_xml().  I like the potential choice to have a parent generate
>>>>> a tree for its children, but I'm not sure how a general child class
>>>>> would know to return None if it were implemented to normally provide its
>>>>> own representation; especially if the parent would like to use the
>>>>> child's to_xml() to assist in its aggregation.  Should it perhaps be the
>>>>> case that to_xml() also returns a boolean that indicates whether descent
>>>>> within this object's subtree should continue?  Should this also apply to
>>>>> can_handle()/from_xml() so that the behavior can be fully symmetric?
>>>>>
>>>> This is certainly possible to do. I'm honestly still delving into this
>>>> area in more depth to see what the best solution would be.
>>>>
>>>> But my thinking on it is that if it's likely that the parent object would
>>>> do the XML generation to include its children, then it's most probably the
>>>> case that the child wouldn't ever generate XML in itself.
>>>>
>
> I would think that if a parent is generating the xml for itself and its
> children it would still rely on the child to provide its xml
> representation. To aggregate the data into a tree. Is this not possible
> or desirable for some reason? Why would we want the parent to generate
> the xml for itself and its children without traversing the children?

Certainly the preference is that the children would generate their own XML,
but there is always the possibility that the object hierarchy in memory is
different to the XML representation, as such you *may* want it - we're just
trying to be flexible here, and I don't think it's unreasonable to provide.

Or maybe, as Dave mentioned, if the parent class would like to "refine" the
XML generated by the children in some way to suit its use...

I admit I don't have a concrete example that reflects the manifest schema at the
moment, but certainly it could happen...
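
Purely as an illustration of the idea, here is a sketch that uses Dave's
suggestion of to_xml() also returning a boolean saying whether the DOC
should continue descending into the children. The class names, the
"refined" attribute and the generate_xml() walker are all invented for the
example; nothing here reflects the actual manifest schema:

```python
import xml.etree.ElementTree as ET


class DataObject:
    """Hypothetical default: generate own XML, let the DOC do the children."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def to_xml(self):
        # Returns (element, descend); descend=True asks the DOC to recurse.
        return ET.Element(self.name), True


class RefiningParent(DataObject):
    """Hypothetical parent that aggregates and refines its children's XML."""

    def to_xml(self):
        element = ET.Element(self.name)
        for child in self.children:
            child_element, _ = child.to_xml()
            # Made-up refinement: mark each child element with an attribute.
            child_element.set("refined", "true")
            element.append(child_element)
        # The children are already handled, so the DOC must not descend.
        return element, False


def generate_xml(obj):
    """Walk the object tree, honouring each object's descend flag."""
    element, descend = obj.to_xml()
    if descend:
        for child in obj.children:
            element.append(generate_xml(child))
    return element


root = DataObject("targets")
disk = RefiningParent("disk")
disk.children.append(DataObject("slice"))
root.children.append(disk)

tree = generate_xml(root)
print(ET.tostring(tree, encoding="unicode"))
```

Here the parent still relies on each child's to_xml(), as Sarah suggests,
and the flag just stops the child's XML from being added a second time.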

Darren.


_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
