On 05/26/10 06:54 AM, Darren Kenny wrote:
Hi Sarah,
(I've cut some of the text to make things more readable)
On 05/25/10 03:50 PM, Sarah Jelinek wrote:
On 05/25/10 07:36 AM, Darren Kenny wrote:
2.2, last bullet: Would it be more correct to say that it's the job of the
application to define the object hierarchy, and hence the data tree, and
that DOC is really just providing the infrastructure for the application
to do that? As written the DOC seems to be perhaps more omniscient than
you mean for it.
Well, not really, mainly since it was thought that the DOC could make some use
of the DTD / Schema (at a high level at least) to correctly structure the XML
generated from the DOC. At least this is something we discussed with Sarah.
The Application doesn't really know as much about the XML any more, but
instead this is being deferred to each object in the DOC, which knows about its
own area - e.g. Targets know how to map Targets from/to XML, etc.
By the DOC making use of the Schema knowledge it is able to put the XML
generated by various elements of the DOC into the correct places in the
Manifest... At least that's the theory...
So how is the DOC notified of the specific schema it is supposed to use?
I didn't see a mechanism for that.
Well, it was just going to be implemented to use the latest schema/DTD at the
time - do you think we would need to consider supporting older schemas? I
think that this would be better serviced by using XSLT outside of it.
Somewhere we have to understand the schema(s) we are operating under so
we can successfully dump an AI manifest. The DOC has to have this data
provided to it. My assumption was that for any run of AI we would have
validated the AI manifest against a schema and this schema would be used
to drive the manifest output in the DOC.
That makes sense - to use the same DTD for output, as was used for input.
I'll add something to the DOC to have this specified, most likely as a
parameter to the "generate_manifest_xml()" method.
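Roughly what I have in mind, as a sketch only - the parameter name, the use of
lxml, and the root handling below are my own placeholders rather than a settled
API:

# Sketch only: how a DTD parameter to generate_manifest_xml() might look.
from lxml import etree

def generate_manifest_xml( self, dtd_path ):
    dtd = etree.DTD( dtd_path )              # the same DTD that validated the input
    root = self.to_xml()                     # DOC root emits its own element
    for child in self.children:
        root.append( child.to_xml() )        # children emit their own XML
    if not dtd.validate( root ):
        raise ValueError( str( dtd.error_log ) )
    return etree.tostring( root, pretty_print=True )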
Ok, seems reasonable.
This is largely being facilitated by the move towards DC and AI sharing the
same Schema, but where there are differences we are considering passing flags to
to_xml(), and maybe from_xml() (though I think this should be less necessary),
to allow for them, e.g.:
DataObject.to_xml( manifest_type = MANIFEST_TYPE_AI )
I think that sort of approach has potential of pushing detailed
application-specific knowledge into leaf objects. That seems
problematic to me.
Hmm, maybe it could have some effect, but it still seems to me to be the
correct place to have the decision made since each object would best know
where it fits in the overall scheme of things.
I had thought that the object wouldn't have to know it is
translating itself to a specific manifest. My thought was that any
object could translate itself to xml, and that the manifest comes from
the fact that we have the schema and we know the order in which elements
and attributes must appear. Is this not possible?
I think this is the direction I'm going now - to have the object simply
generate the XML that represents it, and have the DOC decide what to use based
on the DTD.
I like this approach better.
My main concern on this is to what level the DTD is used by the DOC - I had
hoped to only require it for high-level structure, not each and every node...
For example, it would apply to the first 2 or 3 levels of XML; anything below
that is not controlled by the DOC.
If it has to control each and every node, then it's becoming a lot more
complex to manage multiple DTDs - and there would need to be significant
knowledge about the mapping between each DTD element and a class instance.
Well... I would think we could balance this out: have it use the DTD for
the upper levels, and have the parents assemble the xml for themselves and
their children as well.
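To make that concrete, a quick illustrative sketch - the class names and fields
below are made up for the example, not the real DOC classes; the point is just
that a child only builds its own element and a parent appends its children's
elements:

# Illustrative only: made-up classes showing a child emitting its own
# element and a parent assembling itself plus its children.
from lxml import etree

class Slice( object ):
    def __init__( self, name, size ):
        self.name = name
        self.size = size

    def to_xml( self ):
        # Only knows its own XML, not where it lands in the manifest.
        elem = etree.Element( "slice" )
        etree.SubElement( elem, "name" ).text = self.name
        etree.SubElement( elem, "size" ).text = str( self.size )
        return elem

class Disk( object ):
    def __init__( self, name ):
        self.name = name
        self.children = []

    def to_xml( self ):
        # Assembles its own element, then appends each child's XML.
        elem = etree.Element( "ctd" )
        etree.SubElement( elem, "name" ).text = self.name
        for child in self.children:
            elem.append( child.to_xml() )
        return elem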
The only other alternative I can think of is to allow everything to generate
XML as if it's going into the overall schema, and then have the DOC later run an
XSLT or similar over the generated XML tree to produce an AI or DC variant,
removing nodes or modifying them as appropriate...
If the above isn't possible then I think that we have to consider
something like this. That is, every object translates itself and we use
xslt to transform this data into a valid AI manifest.
I think that this might actually be the most flexible approach - allowing for
different XSLTs to be used depending on the desired output.
Sure, that is also an option. XSLT is powerful and can be used to
transform the output into whatever we need. The only issue I see with
this is that we have to keep the XSLT up to date with the schema as
things change.
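For what it's worth, the mechanics of the XSLT route are small - a rough sketch
with lxml, where "ai.xsl" is just an invented name for whatever stylesheet we
would keep alongside the schema:

# Sketch of the XSLT approach - "ai.xsl" is an invented file name.
from lxml import etree

def transform_manifest( doc_tree, xslt_path="ai.xsl" ):
    # Apply the stylesheet to the XML tree generated from the DOC,
    # producing the AI (or DC) variant of the manifest.
    transform = etree.XSLT( etree.parse( xslt_path ) )
    return transform( doc_tree )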
3.2 bullet 5: Are there things we can do to help object implementers
meet the schema consistency and re-creation constraints?
The hope is that the person implementing a given object, will be the most
familiar with what the schema dictates for its own XML equivalent, so it
shouldn't be that much of an issue if they focus only on their specific area.
Of course that doesn't mean there isn't something we could do to help, but I'm
honestly not sure what we could provide other than a possible breakdown of the
Schema for each area / checkpoint.
I'm concerned that we're going to have developers stumbling around here
trying to figure out how to get it right; I was just hoping we'd have a set of
basic practices that would limit that.
Do you have any suggestions on how we might provide such things? I need to
think a little more about it before I can come up with something.
Certainly we could document some examples, taking a snippet of the schema and
showing how to generate it in XML from the DOC - would that suffice?
I would think that a developer should be able to take the specific
object and map it to the portion of the schema that it is part of. The
way the schemas are defined - "transfer", "target", "execution",
"configuration" - means that the objects representing these areas are contained
in those schemas, and from my perspective it should be relatively
straightforward to map the object to xml. The developer can also
validate the xml against the specific sub-schema they are dumping
xml for. They would of course have to validate the output of multiple
objects together, in the correct order, but it seems reasonable
to have them do this since the schemas are going to be modular.
I would think that this would certainly be the general case - where there is
an obvious link between a section of the XML manifest and the implementation
in the DOC - if this isn't the case it could be very confusing for the people
maintaining the code.
Yep, I agree.
Providing examples would be helpful, and developers can use the xml
instance document examples I have in the soon to be released schema
design document to guide them as well.
We'll work on examples then.
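One thing those examples could show is how a developer checks an object's output
against the relevant piece of the schema - something along these lines, where
"target.dtd" and my_target are only for illustration (a RelaxNG fragment would
work much the same way):

# Example of validating one object's XML against its sub-schema.
# "target.dtd" and my_target are illustrative names only.
from lxml import etree

target_xml = my_target.to_xml()          # element produced by the object
sub_dtd = etree.DTD( "target.dtd" )      # load just the relevant schema fragment
if not sub_dtd.validate( target_xml ):
    print( sub_dtd.error_log )           # shows where the XML diverges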
If an object doesn't provide its own xml translation, the only other
alternative I can think of is that the DOC knows how to do this, for
each object that is part of a valid manifest. My concern about having
the DOC do this is that when things change in the objects, the DOC has
to track this separately, as opposed to having the developer making the
changes directly in the object itself.
I certainly don't want the DOC having to know this kind of information -
that's the whole point of having each object handle its own section of XML -
the changes would all be localized, and as such much less likely to get out of
sync.
I'm somewhat doubtful of that suggested taxonomy. A slice (or
partition) seems dependent on its parent device/partition, so I would
expect the names to be fully-qualified.
I don't believe that would be the case at the moment in the schema design:
...
<target_device>
  <type>
    <ctd>
      <name>c1t0d0</name>
      <slice>
        <name>0</name>
        <action>
          <create>
            <size>1000</size>
          </create>
        </action>
      </slice>
    </ctd>
  </type>
</target_device>
...
I think that the fully qualified name is certainly fetch-able (e.g. calling
slice.get_device_name() ) but I don't think it should be necessary for a child
node to qualify its own name, as in:
Targets
  TargetDisk [c0d0]
    Partition [c0d0p0]
    Partition [c0d0p1]
    Slice [c0d0s0]
    Slice [c0d0s1]
    ...
    Slice [c0d0s7]
  TargetDisk [c2d0]
    Partition [c2d0p0]
    Slice [c2d0s0]
    Slice [c2d0s1]
    ...
    Slice [c2d0s7]
This seems like redundant information being repeated unnecessarily, when it's
possible to get it using something like full_name = parent.name + name ...
The current schemas do not have children that have fully qualified
names. The plan moving forward was to keep this the same, and associate
a child with the parent, and get the full name that way. I agree with
Darren that storing the info in the child seems redundant.
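i.e. something along the lines of the get_device_name() Darren mentioned - a
sketch only, and the simple concatenation is illustrative (real ctd/slice naming
may need its own formatting rules):

# Sketch of deriving the fully qualified name from the parent chain.
def get_device_name( self ):
    if self.parent is None:
        return self.name
    return self.parent.get_device_name() + self.name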
I do have a question though... the delete_child() method description says that
it will remove a specific child from the parent class. Is this method on the
parent class so that you know which parent to use?
I'm not 100% sure what the question is here, but I'm thinking you're referring
to the text on page 9, for delete_child(), if not please correct me...
When I say that "delete_child() will remove a specific child from the parent
class where the instances match" what I'm saying is that it will delete the
child from the list of children contained in the object that it's called on,
e.g.:
def delete_child( self, child_to_delete ):
    for child in self.children:
        if child == child_to_delete:
            # Remove the matching child from self.children
            self.children.remove( child )
            break

my_object.delete_child( old_object )
So in this case my_object is the parent of old_object, and the child of
my_object that matches the instance of "child_to_delete" will be removed -
this is what's compared by default if you compare using "==".
Ah, ok, that makes sense.
I suppose we could ask that the name would be unique in any given child
list - but I don't think we could ask for it to be the case in the complete
tree of objects. This could also open up the ability to refer to children
using a dictionary, which might be useful...
That seems like a theoretical use, but it seems to compromise an
immediate, practical use of the Name to allow it, so I guess I'm
still skeptical.
And entitled to be ;)
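(Purely for illustration, the dictionary idea amounts to no more than the
following - a hypothetical sketch, not something the design depends on:)

# Hypothetical sketch: children keyed by name, unique within one parent only.
class ExampleObject( object ):
    def __init__( self, name ):
        self.name = name
        self._children = {}                  # name -> child object

    def add_child( self, child ):
        self._children[ child.name ] = child

    def get_child( self, name ):
        return self._children.get( name )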
3.4.1.3 to_xml(). I like the potential choice to have a parent generate
a tree for its children, but I'm not sure how a general child class
would know to return None if it were implemented to normally provide its
own representation; especially if the parent would like to use the
child's to_xml() to assist in its aggregation. Should it perhaps be the
case that to_xml() also returns a boolean that indicates whether descent
within this object's subtree should continue? Should this also apply to
can_handle()/from_xml() so that the behavior can be fully symmetric?
This is certainly possible to do. I'm honestly still delving into this area in
more depth to see what the best solution would be.
But my thinking on it is that if it's likely that the parent object would do
the XML generation to include its children, then it's most probably the case
that the child wouldn't ever generate XML itself.
I would think that if a parent is generating the xml for itself and its
children it would still rely on the child to provide its xml
representation, to aggregate the data into a tree. Is this not possible
or desirable for some reason? Why would we want the parent to generate
the xml for itself and its children without traversing the children?
Certainly the preference is that the children would generate their own XML,
but there is always the possibility that the object hierarchy in memory is
different from the XML representation, and as such you *may* want it - we're just
trying to be flexible here, and I don't think it's unreasonable to provide.
Or maybe, as Dave mentioned, if the parent class would like to "refine" the
XML generated by the children in some way to suit its use...
I admit I don't have a concrete example that reflects the manifest schema at the
moment, but certainly it could happen...
Ok, fair enough.
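To make the boolean-return idea from 3.4.1.3 concrete, something like the
following is what I'd picture - the attribute names and return shape are only a
sketch, not a decided interface:

# Sketch only: to_xml() also reports whether the caller should keep descending.
# self.tag_name and self.emits_subtree are made-up attributes for illustration.
from lxml import etree

def to_xml( self ):
    elem = etree.Element( self.tag_name )
    if self.emits_subtree:
        # This object generates the XML for its whole subtree itself,
        # so the caller should not descend any further.
        for child in self.children:
            child_elem, _ = child.to_xml()
            elem.append( child_elem )
        return elem, False
    return elem, True                        # caller continues walking the children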
thanks,
sarah
****
Darren.
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss