On 05/25/10 03:53 PM, Dave Miner wrote:
> On 05/25/10 09:36 AM, Darren Kenny wrote:
>> On 05/24/10 09:40 PM, Dave Miner wrote:
>>> On 05/24/10 10:40 AM, Darren Kenny wrote:
>>>> On 05/21/10 07:54 PM, Dave Miner wrote:
>>>>> On 05/19/10 10:34 AM, Darren Kenny wrote:
>>>
>>> So how is the DOC notified of the specific schema it is supposed to use?
>>> I didn't see a mechanism for that.
>>
>> Well, it was just going to be implemented to use the latest schema/DTD at
>> the time - do you think we would need to consider supporting older schemas?
>> I think that this would be better served by using XSLT outside of it.
>>
>
> I would think that some of the compatibility scenarios might require
> using an older schema. This seems simple to allow for, at any rate.
>
I agree; I'll add a mechanism to allow the schema to use to be specified.
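For example, it might end up looking something like this (a rough sketch
only - the class, method and path here are purely illustrative):

    # Hypothetical sketch; the real mechanism is still to be designed.
    doc = DataObjectCache()
    doc.set_schema("/usr/share/install/ai.dtd")  # defaults to the latest DTD
    xml_tree = doc.to_xml()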
>>>
>>>> This is largely being facilitated by the move towards DC and AI sharing the
>>>> same Schema, but if there are differences we are considering passing flags
>>>> to
>>>> the to_xml(), and maybe from_xml() (but I think this should be less
>>>> necessary)
>>>> to allow for differences e.g.:
>>>>
>>>> DataObject.to_xml(manifest_type=MANIFEST_TYPE_AI)
>>>>
>>>
>>> I think that sort of approach has the potential to push detailed
>>> application-specific knowledge into leaf objects. That seems
>>> problematic to me.
>>
>> Hmm, maybe it could have some effect, but it still seems to me to be the
>> correct place to have the decision made, since each object would best know
>> where it fits in the overall scheme of things.
>>
>> The only other alternative I can think of is to allow everything to generate
>> XML as if it's going into the overall schema, and then have the DOC later
>> run an XSLT or similar over the generated XML tree, to produce an AI or DC
>> variant, removing nodes or modifying them as appropriate...
>
> I would think that it would be more correct for the objects to be
> application-agnostic in general, with specific applications implementing
> subclasses if needed. Why wouldn't that be the preferred solution?
>
Hmm, I suppose that it's not impossible to do things that way - the main
concern that I would have is that you would have to provide a factory
mechanism which would make the decision on which object is created when
creating the tree, and this would vary depending on the application.
In other words, people would have to do something like:

    if is_DC:
        return MyObjectDcVariant(...)
    elif is_AI:
        return MyObjectAiVariant(...)

any time they want to vary the object created depending on the application
being run.
In the end I think that this would be more complex than an XSLT solution or
passing flags to the to_xml() implementation.
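For comparison, a minimal sketch of what the subclassing approach might look
like (the class names and attributes here are hypothetical, and I'm assuming
lxml.etree for the XML generation):

    from lxml import etree

    class DiskObject(DataObject):
        # Application-agnostic: generates only the common XML
        def to_xml(self):
            return etree.Element("disk", name=self.name)

    class AiDiskObject(DiskObject):
        # AI-specific variant, tweaking the common XML for the AI schema
        def to_xml(self):
            element = DiskObject.to_xml(self)
            element.set("in_zpool", self.zpool)  # AI-only detail
            return element

Every place the tree gets built would then have to know to pick DiskObject
or AiDiskObject, which is exactly the factory decision I describe above.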
>
>>>
>>>>>
>>>>> 3.2 (sub-bullet of bullet 4) regarding the pickling requirements, is
>>>>> there a specific reference or more concrete example that you could
>>>>> provide to help ensure our object implementors get this right?
>>>>
>>>> The document at:
>>>>
>>>>
>>>> http://docs.python.org/library/pickle.html#what-can-be-pickled-and-unpickled
>>>>
>>>> explains what can and can't be pickled out of the box, but it essentially
>>>> boils down to:
>>>>
>>>> The following types can be pickled:
>>>>
>>>> - None, True, and False
>>>> - integers, long integers, floating point numbers, complex numbers
>>>> - normal and Unicode strings
>>>> - tuples, lists, sets, and dictionaries containing only picklable objects
>>>> - functions defined at the top level of a module
>>>> - built-in functions defined at the top level of a module
>>>> - classes that are defined at the top level of a module
>>>> - instances of such classes whose __dict__ or the result of calling
>>>> __getstate__() is picklable.
>>>>
>>>> So essentially if you stick to normal Python types you should be fairly
>>>> safe.
>>>>
>>>
>>> Please put the reference and clarification in your next revision.
>>>
>>
>> Will do...
>>
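For example, the kind of snippet that could go into the next revision (a
minimal sketch; the class name here is hypothetical):

    import pickle

    class CheckpointData(object):
        """Safe to pickle: its state is only plain Python types."""
        def __init__(self):
            self.name = "target_discovery"     # string
            self.sizes = [1000, 2000]          # list of ints
            self.options = {"verbose": True}   # dict of picklable values

    blob = pickle.dumps(CheckpointData())  # works out of the box
    restored = pickle.loads(blob)

    # By contrast, an instance holding an open file object or a lambda
    # would raise an error when dumped.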
>>>>>
>>>>> 3.2 bullet 5: Are there things we can do to help object implementers
>>>>> meet the schema consistency and re-creation constraints?
>>>>
>>>> The hope is that the person implementing a given object will be the most
>>>> familiar with what the schema dictates for its own XML equivalent, so it
>>>> shouldn't be that much of an issue if they focus only on their specific
>>>> area.
>>>>
>>>> Of course that doesn't mean there isn't something we could do to help,
>>>> but I'm honestly not sure what we could provide other than a possible
>>>> breakdown of the Schema for each area / checkpoint.
>>>>
>>>
>>> I'm concerned that we're going to have developers stumbling around here
>>> trying to figure out how to get it right; I'm just hoping we can have a
>>> set of basic practices that would limit that.
>>
>> Do you have any suggestions on how we might provide such things? I need to
>> think a little more about it before I can come up with something.
>>
>> Certainly we could document some examples, taking a snippet of the schema
>> and showing how to generate it in XML from the DOC - would that suffice?
>
> It would certainly be a start, at least.
>
OK.
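As a rough illustration of the sort of example I have in mind (assuming
lxml.etree; the schema fragment and class are hypothetical, though modeled
on the target_device snippet further down):

    from lxml import etree

    # Given a schema fragment along the lines of:
    #   <!ELEMENT slice (name, action)>
    #   <!ELEMENT action (create)>
    # the object's to_xml() would mirror that structure:
    class Slice(object):  # would derive from DataObject in the DOC
        def __init__(self, name, size):
            self.name = name
            self.size = size

        def to_xml(self):
            slice_elem = etree.Element("slice")
            etree.SubElement(slice_elem, "name").text = self.name
            action = etree.SubElement(slice_elem, "action")
            create = etree.SubElement(action, "create")
            etree.SubElement(create, "size").text = str(self.size)
            return slice_elem

    print etree.tostring(Slice("0", 1000).to_xml(), pretty_print=True)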
>>>> One reason that we didn't restrict this is in the case of Targets, where
>>>> you may have something like:
>>>>
>>>> Targets
>>>>   TargetDisk [c0d0]
>>>>     Partition [p0]
>>>>     Partition [p1]
>>>>       Slice [s0]
>>>>       Slice [s1]
>>>>       ...
>>>>       Slice [s7]
>>>>   TargetDisk [c2d0]
>>>>     Partition [p0]
>>>>       Slice [s0]
>>>>       Slice [s1]
>>>>       ...
>>>>       Slice [s7]
>>>>
>>>> As you can see, the names in this case (partitions/slices) wouldn't be
>>>> unique in themselves, but would only be considered unique if you include
>>>> the context of the parents, i.e. c2d0/p0/s1.
>>>>
>>>
>>> I'm somewhat doubtful of that suggested taxonomy. A slice (or
>>> partition) seems dependent on its parent device/partition, so I would
>>> expect the names to be fully-qualified.
>>
>> I don't believe that would be the case at the moment in the schema design:
>>
>> ...
>> <target_device>
>>   <type>
>>     <ctd>
>>       <name>c1t0d0</name>
>>       <slice>
>>         <name>0</name>
>>         <action>
>>           <create>
>>             <size>1000</size>
>>           </create>
>>         </action>
>>       </slice>
>>     </ctd>
>>   </type>
>> </target_device>
>> ...
>>
>
> Well, I don't think we really have a final schema, but I would certainly
> be looking for opportunities to make the notation more concise; using a
> fully-specified slice device directly might do that. Remember that one
> of the objections to usability of XML is its perceived verbosity; I'd
> like to not exacerbate that unnecessarily.
I won't comment here, since I think there is enough of a thread on it...
>>
>> Understood, but until there is a specific case I don't know that I can
>> really plan for it.
>>
>> Do you really think that the hierarchy isn't going to be that static?
>> During the development cycle I can see it being very dynamic, but in the
>> end I would expect most of the change to be minimal, and if not, quite
>> localized.
>>
>> We originally made to_xml() work as it does here (and I've said this in
>> another e-mail to Keith too) as a convenience, to avoid requiring every
>> implementor of to_xml() to always include a for-each-child loop by
>> default - but maybe it's too convenient?
>>
>> I think that it's simple enough to override this using the
>> "generates_xml_for_children()" mechanism below, if you really want to have
>> more control over the descent of the tree - if you returned True here, then
>> you are certainly free to do the descent yourself in a more managed way,
>> thus allowing for the reprocessing of the XML from children, if desired.
>>
>
> I think that's close enough.
>
OK, will do it that way.
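i.e. something along these lines (a sketch - I'm assuming a children list on
DataObject and lxml.etree here):

    class Targets(DataObject):
        def generates_xml_for_children(self):
            # Tell the DOC not to recurse into the children automatically
            return True

        def to_xml(self):
            targets = etree.Element("targets")
            for child in self.children:
                child_xml = child.to_xml()
                # Reprocess/filter the child's XML here if desired
                targets.append(child_xml)
            return targets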
>>
>> We tried using the Engine to access the Logger at the time, and I think
>> that was where we first encountered issues: the DOC also used the Logger,
>> so to get it we had to access the Engine, but the Engine in turn used the
>> DOC. As a result, the imports created circular dependencies which didn't
>> appear to be simple to remove.
>
> OK, that's starting to make sense :-)
>
Phew :-)
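For the record, the circularity looked roughly like this (module names
illustrative):

    # engine.py
    from data_object_cache import DataObjectCache  # the Engine uses the DOC

    # data_object_cache.py
    from engine import InstallEngine  # ...to reach the Engine's Logger

    # Importing either module triggers the import of the other before the
    # first has finished initializing, so one of the imports fails.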
>>> I'm doubtful of this model yet. Say we're going to have AI dump out a
>>> processed manifest for potential use as input to another AI; why would
>>> any checkpoints be included in that manifest? Many, if not most, of its
>>> checkpoints are common with another application such as DC where the
>>> checkpoint may well need to be dumped. So I'm again wondering how the
>>> checkpoint knows this itself without baking in knowledge of the
>>> containing app, which I find objectionable.
>>
>> I can see where you're coming from, and maybe the correct approach is to
>> always generate the XML for nodes that have an XML equivalent, and let the
>> application decide if it wants to pluck elements out of the generated XML -
>> so DC might leave it in, and AI would remove it - the use of XSLTs makes
>> sense here...
>>
>
> I don't see how the application would decide what to pluck out after the
> fact, so perhaps you can elaborate on how you think that might work?
>
For example, the XSLT could pass through all checkpoints except "internal"
ones if writing to a DC manifest, while for AI a different XSLT could remove
all references to checkpoints:
<checkpoints>
  <td name="target_discovery"/>
  <ti name="target_instantiation"/>
  <transfer_ips name="internal_transfer_ips_1" .../>
  <cpio name="internal_transfer_1" .../>
  <cpio name="internal_transfer_2" .../>
  <cpio name="user_transfer of files" .../>
</checkpoints>
and a piece of XSLT applied to this could easily remove the elements based on
the name being "internal_*" or the tag name being td or ti, etc.
It all depends on what is desired - but in reality I would expect most of
these checkpoints not to generate any XML at all, except where they know
themselves that they were user-provided (in a manifest to start with).
It is just an example of what could be "plucked out"...
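As a concrete sketch of that rule, applied with lxml (the stylesheet contents
are illustrative only):

    from lxml import etree

    AI_FILTER = etree.XML('''
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Identity transform: copy everything through by default -->
      <xsl:template match="@*|node()">
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
      </xsl:template>
      <!-- Drop td/ti elements and any "internal_*" checkpoints -->
      <xsl:template match="td|ti"/>
      <xsl:template match="*[starts-with(@name, 'internal_')]"/>
    </xsl:stylesheet>
    ''')

    strip_checkpoints = etree.XSLT(AI_FILTER)
    ai_tree = strip_checkpoints(manifest_tree)  # manifest_tree from the DOC

The DC variant would simply use a different stylesheet, e.g. keeping td/ti
but still dropping the "internal_*" entries.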
But essentially the idea is that all objects would generate XML to describe
themselves, and XSLTs - one for each application type we support - could
post-process that XML to tweak it as appropriate for their needs.
Actually, thinking about this more, this could be a better way for the DOC to
manage the generation of the XML - through the use of XSLTs, rather than
itself parsing the DTD and using information extracted from it - since XSLTs
can be quite powerful for manipulating and restructuring XML documents. It
would also be more future-proof, since it's likely to require fewer code
changes.
Thanks,
Darren.