On 05/25/10 08:53 AM, Dave Miner wrote:
On 05/25/10 09:36 AM, Darren Kenny wrote:
Hi Dave, more below ...
And yet more in-line from me...
On 05/24/10 09:40 PM, Dave Miner wrote:
On 05/24/10 10:40 AM, Darren Kenny wrote:
On 05/21/10 07:54 PM, Dave Miner wrote:
On 05/19/10 10:34 AM, Darren Kenny wrote:
Hi,
We would like to ask people to please review the Data Object
Cache design
document, which can be found at:
http://hub.opensolaris.org/bin/download/Project+caiman/DataObjectCache/DataObjectCache%2DDesign%2D0.5.pdf
Overall, a very well-written document. But, of course, I have
comments :-)
Thanks, and I'm not surprised there are comments...
2.1, third bullet: s/system/application/
Sure, I just used the term system as referring to the complete mechanism, but application probably makes more sense.
2.2, last bullet: Would it more correct to say that it's the job
of the
application to define the object hierarchy, and hence the data
tree, and
that DOC is really just providing the infrastructure for the
application
to do that? As written the DOC seems to be perhaps more
omniscient than
you mean for it.
Well, not really, mainly since it was thought that the DOC could
make some use
of the DTD / Schema (at a high level at least) to correctly
structure the XML
generated from the DOC. At least this is something we discussed
with Sarah.
The Application doesn't really know as much about the XML any more,
but
instead this is being deferred to each object in the DOC to know
about their
own area - e.g. Targets know how to map Targets from/to XML, etc.
By the DOC making use of the Schema knowledge it is able to put the
XML
generated by various elements of the DOC into the correct places in
the
Manifest... At least that's the theory...
So how is the DOC notified of the specific schema it is supposed to
use?
I didn't see a mechanism for that.
Well, it was just going to be implemented to use the latest
schema/DTD at the
time - do you think we would need to consider supporting older
schemas? I
think that this would be better serviced by using XSLT outside of it.
I would think that some of the compatibility scenarios might require
using an older schema. This seems simple to allow for, at any rate.
This is largely being facilitated by the move towards DC and AI sharing the same Schema, but if there are differences we are considering passing flags to to_xml(), and maybe from_xml() (though I think this should be less necessary), to allow for differences, e.g.:
DataObject.to_xml( manifest_type = MANIFEST_TYPE_AI )
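A minimal sketch of what such a flag might look like (the MANIFEST_TYPE_* values and the element names here are hypothetical, not settled design):

```python
import xml.etree.ElementTree as ET

# Hypothetical flag values; the real names are not defined yet.
MANIFEST_TYPE_AI = "ai"
MANIFEST_TYPE_DC = "dc"

class DataObject:
    def __init__(self, name):
        self.name = name

    def to_xml(self, manifest_type=MANIFEST_TYPE_AI):
        elem = ET.Element("data_object", name=self.name)
        # Each object varies its own output per application here,
        # which is exactly the coupling being debated.
        if manifest_type == MANIFEST_TYPE_DC:
            elem.set("dc_only", "true")
        return elem

obj = DataObject("example")
print(ET.tostring(obj.to_xml(MANIFEST_TYPE_DC)).decode())
```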
I think that sort of approach has potential of pushing detailed
application-specific knowledge into leaf objects. That seems
problematic to me.
Hmm, maybe it could have some effect, but it still seems to me to be the correct place to have the decision made, since each object would best know where it fits in the overall scheme of things.
The only other alternative I can think of is to allow everything to generate XML as if it's going into the overall schema, and then have the DOC later run an XSLT or similar over the generated XML tree to produce an AI or DC variant, removing nodes or modifying them as appropriate...
I would think that it would be more correct for the objects to be
application-agnostic in general, with specific applications
implementing subclasses if needed. Why wouldn't that be the preferred
solution?
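A sketch of that alternative, with hypothetical class names: the base object emits one canonical form, and an application-specific subclass overrides only what differs:

```python
import xml.etree.ElementTree as ET

class Slice:
    """Application-agnostic base class: one canonical XML form."""
    def __init__(self, name):
        self.name = name

    def to_xml(self):
        return ET.Element("slice", name=self.name)

class AISlice(Slice):
    """AI-specific subclass; overrides only the AI differences."""
    def to_xml(self):
        elem = super().to_xml()
        elem.set("action", "create")  # hypothetical AI-only attribute
        return elem

print(ET.tostring(Slice("0").to_xml()).decode())
print(ET.tostring(AISlice("0").to_xml()).decode())
```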
3.2 (sub-bullet of bullet 4) regarding the pickling requirements, is
there a specific reference or more concrete example that you could
provide to help ensure our object implementors get this right?
The document at:
http://docs.python.org/library/pickle.html#what-can-be-pickled-and-unpickled
explains what can't be pickled out of the box, but it essentially boils down to this - the following types can be pickled:
- None, True, and False
- integers, long integers, floating point numbers, complex numbers
- normal and Unicode strings
- tuples, lists, sets, and dictionaries containing only picklable objects
- functions defined at the top level of a module
- built-in functions defined at the top level of a module
- classes that are defined at the top level of a module
- instances of such classes whose __dict__ or __setstate__() is picklable
So essentially if you stick to normal Python types you should be
fairly safe.
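To make that concrete, a quick illustration of the boundary - plain types round-trip through pickle, while something like a lambda does not:

```python
import pickle

# Plain Python types from the list above round-trip cleanly.
data = {"name": "c0d0", "slices": ["s0", "s1"], "size": 1000, "root": True}
restored = pickle.loads(pickle.dumps(data))
print(restored == data)  # → True

# A lambda is not a function defined at the top level of a
# module, so pickling it fails:
try:
    pickle.dumps(lambda s: s.upper())
    print("pickled")
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("not picklable:", type(exc).__name__)
```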
Please put the reference and clarification in your next revision.
Will do...
3.2 bullet 5: Are there things we can do to help object implementers
meet the schema consistency and re-creation constraints?
The hope is that the person implementing a given object will be the most familiar with what the schema dictates for its own XML equivalent, so it shouldn't be that much of an issue if they focus only on their specific area.
Of course that doesn't mean there isn't something we could do to
help, but I'm
honestly not sure what we could provide other than a possible
breakdown of the
Schema for each area / checkpoint.
I'm concerned that we're going to have developers stumbling around here trying to figure out how to get it right; I was just hoping we had a set of basic practices that would limit that.
Do you have any suggestions on how we might provide such things? I need to think a little more about it before I can come up with something.
Certainly we could document some examples - taking a snippet of the schema and showing how to generate it in XML from the DOC. Would that suffice?
It would certainly be a start, at least.
3.3 Is there a particular reason we need to constrain these to
Consolidation Private? (In reality, since install is not really
intended to be a separate consolidation, I'd prefer we avoided
consolidation and went with either Project Private or one of the
public
levels). Are you intending a later update (once you've gone further
into implementation) with an imported interface table?
There isn't really any good reason to constrain it to Consolidation Private; I wasn't really sure what was best for something like this.
Project Private would seem TOO constrained, in that it's been mentioned that it might be something people outside of the install team would want to utilize for adding another checkpoint (maybe?).
That seems to be an application-specific issue for DC, and the interface there is a manifest, not the APIs here. Overall, I believe that at the moment these are best regarded as private. We can open them further as we get experience and understand what should be stable.
OK.
Public would seem TOO open, possibly restricting things going forward.
Maybe Uncommitted would be a better middle ground, requiring contracts for anyone wishing to use the interface outside of the project.
I would be fearful that having it Committed would be promising too much at this point in time, but I would hope it would get to that point eventually after a couple of iterations.
I'm totally open to people's preferences here - but PSARC-approved i/fs would seem to prefer the Committed option to Uncommitted.
As for imported interfaces, it wasn't in the original document template we had, so it really didn't cross our minds - but we should be able to add it for what we know so far, though until implementation nothing really is set in stone.
That seems no different than anything else here :-)
True...
3.4.1.1 Is there a reason not to require that names within a
class be
unique? Not being able to depend on this seems to make some of the
other interfaces where retrieval/deletion can use names less useful.
One reason that we didn't restrict this is in the case of Targets,
where you
may have something like:
Targets
    TargetDisk [c0d0]
        Partition [p0]
        Partition [p1]
            Slice [s0]
            Slice [s1]
            ...
            Slice [s7]
    TargetDisk [c2d0]
        Partition [p0]
            Slice [s0]
            Slice [s1]
            ...
            Slice [s7]
As you can see, the names in this case (partitions/slices) wouldn't be unique in themselves, but would only be considered unique if you include the context of their parents, i.e. c2d0/p0/s1.
I'm somewhat doubtful of that suggested taxonomy. A slice (or
partition) seems dependent on its parent device/partition, so I would
expect the names to be fully-qualified.
I don't believe that would be the case at the moment in the schema
design:
...
<target_device>
    <type>
        <ctd>
            <name>c1t0d0</name>
            <slice>
                <name>0</name>
                <action>
                    <create>
                        <size>1000</size>
                    </create>
                </action>
            </slice>
        </ctd>
    </type>
</target_device>
...
Well, I don't think we really have a final schema, but I would
certainly be looking for opportunities to make the notation more
concise; using a fully-specified slice device directly might do that.
Remember that one of the objections to usability of XML is its
perceived verbosity; I'd like to not exacerbate that unnecessarily.
I have modified the schema from the original proposal, which I agree was
too verbose. Here is a snippet of data from this new target schema:
<target>
    <target_device is_root="true">
        <type>
            <zpool name="sarahs_pool" action="create">
                <vdev>
                    <mirror>
                        <disk>
                            <ctd name="c1t0d0"></ctd>
                        </disk>
                        <disk>
                            <ctd name="c1t1d0"></ctd>
                        </disk>
                    </mirror>
                </vdev>
                <vdev>
                    <raidz>
                        <slice name="c1t2d0s0"></slice>
                        <slice name="c1t3d0s0"></slice>
As you can see we do have the fully qualified names for disks and
slices. The groupings you see, such as vdev->mirror or vdev->raidz are
there to provide the correct encapsulation of the definition for, in
this case, a zpool, that can have multiple vdevs defined and possibly
different types.
You can also specify a slice name without its parent in this schema. The flat form is an attempt to provide a flatter manifest; however, if you don't want to provide the fully qualified name, you can nest the elements instead, for example:
<disk>
    <ctd name="c1t1d0">
        <slice name="0">
This is allowed if a user wants to do it, but it isn't required.
So, I think that we have to allow for both of these naming schemes and provide the ability to get the parent of a child to derive the child's full name.
thanks,
sarah
I think that the fully qualified name is certainly fetch-able (e.g. by calling slice.get_device_name()), but I don't think it should be necessary for a child node to qualify its name in itself, as in:
Targets
    TargetDisk [c0d0]
        Partition [c0d0p0]
        Partition [c0d0p1]
            Slice [c0d0s0]
            Slice [c0d0s1]
            ...
            Slice [c0d0s7]
    TargetDisk [c2d0]
        Partition [c2d0p0]
            Slice [c2d0s0]
            Slice [c2d0s1]
            ...
            Slice [c2d0s7]
This seems like redundant information being repeated unnecessarily, when it's possible to derive it using full_name = parent.name + name ...
I suppose we could ask that the name be unique within any given child list - but I don't think we could ask for that to be the case across the complete tree of objects. This could also open up the ability to refer to children using a dictionary, which might be useful...
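A sketch of how both points might combine - names unique only among siblings (which enables a dictionary of children), with the full name derived by walking the parent chain; all class and member names here are illustrative:

```python
class DataObject:
    """Sketch: names need only be unique among siblings; the full
    name is derived by walking up the parent chain."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # per-parent uniqueness enables dict lookup
        if parent is not None:
            if name in parent.children:
                raise ValueError("duplicate name among siblings: " + name)
            parent.children[name] = self

    def full_name(self):
        if self.parent is None:
            return self.name
        return self.parent.full_name() + "/" + self.name

root = DataObject("Targets")
disk = DataObject("c2d0", root)
part = DataObject("p0", disk)
slc = DataObject("s1", part)
print(slc.full_name())  # → Targets/c2d0/p0/s1
```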
That seems like a theoretical use, but allowing it seems to compromise an immediate, practical use of the Name, so I guess I'm still skeptical.
And entitled to be ;)
3.4.1.3 to_xml(). I like the potential choice to have a parent
generate
a tree for its children, but I'm not sure how a general child class
would know to return None if it were implemented to normally
provide its
own representation; especially if the parent would like to use the
child's to_xml() to assist in its aggregation. Should it perhaps
be the
case that to_xml() also returns a boolean that indicates whether
descent
within this object's subtree should continue? Should this also
apply to
can_handle()/from_xml() so that the behavior can be fully symmetric?
This is certainly possible to do. I'm honestly still delving into this area in more depth to see what the best solution would be.
But my thinking on it is that if it's likely that the parent object would do the XML generation to include its children, then it's most probably the case that the child wouldn't ever generate XML by itself.
I think that assumes a very static object hierarchy, and that's not an
assumption I'm all that comfortable with at this point. I'm also
imagining parent objects that might wish to reprocess the children's
xml
for readability or something, but admittedly I don't have a very good
case there to suggest right now.
Understood, but I think that until there is a specific case I don't know if I can really plan for it.
Do you really think that the hierarchy isn't going to be that static? Fine, during the development cycle I can see it being very dynamic, but in the end I would think a lot of it will be minimal, and if not, quite localized.
We originally made to_xml() work as it does here (and I've said this in another e-mail to Keith too) to avoid the requirement that every implementor of to_xml() always include a for-each-child statement by default, as a convenience - but maybe it's too convenient?
I think that it's simple enough to override this using the "generates_xml_for_children()" mechanism below, if you really want to have more control over the descent of the tree - so if you returned True here, then you are certainly free to do the descent yourself in a more managed way, thus allowing for the reprocessing of the XML from children, if desired.
I think that's close enough.
Of course, there's always some exception - I've just not thought of
one yet...
If we're to allow for such a case, it may be better to have a method like "generates_xml_for_children()" which returns a boolean - I just don't like methods that return tuples of values as an interface. So this would make it more like:
if not obj.generates_xml_for_children():
    for child in obj.children():
        ...
The default implementation would return False - and this always traverses the children.
The multi-return seems pretty Pythonic, which is why I suggested it,
but
either way would work, I guess.
Sure it is, but I just feel it's too generic, and could cause more
programming
errors in the end.
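For illustration, a sketch of how the cache's descent might honour such a method; the class and helper names are assumptions, not the settled interface:

```python
import xml.etree.ElementTree as ET

class DataObject:
    def __init__(self, tag, children=None):
        self.tag = tag
        self.children = children or []

    def generates_xml_for_children(self):
        # Default: the cache descends into the children itself.
        return False

    def to_xml(self):
        return ET.Element(self.tag)

def build_tree(obj):
    """Cache-side descent: only recurse where the object doesn't
    claim its children's XML for itself."""
    elem = obj.to_xml()
    if not obj.generates_xml_for_children():
        for child in obj.children:
            elem.append(build_tree(child))
    return elem

class Aggregator(DataObject):
    """A parent that takes over XML generation for its subtree."""
    def generates_xml_for_children(self):
        return True

    def to_xml(self):
        elem = ET.Element(self.tag)
        for child in self.children:
            # Free to reprocess the children's XML here if desired.
            elem.append(child.to_xml())
        return elem

tree = build_tree(DataObject("root", [Aggregator("agg", [DataObject("leaf")])]))
print(ET.tostring(tree).decode())
```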
Finally, can you expand on the factors that are important to
consider in
the decision between static and dynamic registration? My assumption
would be to lean strongly towards dynamic for flexibility of this
infrastructure, but I'm guessing there are factors that I'm not
considering.
I'm still thinking about this, but I think the main issue with static registration is that it means you need access to the source code to update the static registration file, which may not always be possible.
Yup.
I would certainly prefer dynamic myself, the question then is just
how dynamic
we should be.
One case I've been looking at is where, on import of a module, its __init__.py would call something like:
DataObjectCache.register_class( MyClass )
for each class that can be used to import XML in a module.
This works quite well (I have source code that tests it), but the main issue is that something needs to import the module... I'm thinking that the Application, in most cases, will already be doing this, but maybe there are cases where it doesn't...
In this case, we would need to consider something like a
"signature" for a
module which says that it's an install module with objects that can
import
XML. This would then require us to traverse the PYTHONPATH to find
such a
signature.
This latter option introduces a time penalty at start-up, but this may be offset by the flexibility it provides.
A signature that I would be thinking of is a special file like:
__install__init__.py
which we could search for; if it's found we would load that file and execute it - it would then contain the register_class() calls...
I've still to look into this in more depth, and was intending on
doing it as
part of the implementation, but maybe I should pick one now...
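A rough sketch of both halves under discussion - a register_class() hook plus a scan for the proposed __install__init__.py signature file. The scanning mechanics here are an assumption, not a settled design:

```python
import os
import sys

class DataObjectCache:
    _registered = []

    @classmethod
    def register_class(cls, klass):
        # Called from a module's __init__.py (or its signature file)
        # for each class that can import XML.
        cls._registered.append(klass)

def find_signature_files(paths=None):
    """Scan the given paths (default: sys.path) for the proposed
    '__install__init__.py' signature files; each found file would
    then be executed so it can call register_class()."""
    found = []
    for entry in paths if paths is not None else sys.path:
        if not os.path.isdir(entry):
            continue
        for root, _dirs, files in os.walk(entry):
            if "__install__init__.py" in files:
                found.append(os.path.join(root, "__install__init__.py"))
    return found

class MyClass:
    pass

DataObjectCache.register_class(MyClass)
print(DataObjectCache._registered)
```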
Seems important for your consumers to know which way to go sooner than
later.
Sure, and I'm working on a section about this...
3.4.2.1 A singleton here seems somewhat controversial to me. Why
isn't
it the application's responsibility to control this? An alternate
formulation that I think accomplishes the goals here is to have the
application provide the Engine with an instance, and the Checkpoint
objects can always get it from their associated Engine. Are there
cases
that this doesn't work for? (I didn't attempt to map this to all the
use cases so I'm not necessarily asserting it will, but it seems the
more natural solution to me so I'm wondering if you considered it).
During the prototype phase we did try something like this, having the engine as the central point for getting a reference to the DOC, Logging, etc., but it presented more problems where everything had to access the engine to get a pointer to an instance - so instead we came up with each of these being singletons in their own right, so that they could be accessed from anywhere simply by calling XXX.get_instance() - but with the one caveat that something (in this case the Application) needs to create the first instance explicitly, to ensure that ordering, etc. is correct.
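A minimal sketch of that singleton arrangement, with hypothetical method names: the application creates the first instance explicitly, and everything else calls get_instance():

```python
class DataObjectCache:
    _instance = None

    def __init__(self):
        # The application creates the first (and only) instance
        # explicitly, so initialization ordering stays under its
        # control; everything else uses get_instance().
        if DataObjectCache._instance is not None:
            raise RuntimeError("already created; use get_instance()")
        DataObjectCache._instance = self

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            raise RuntimeError("application has not created the cache yet")
        return cls._instance

app_cache = DataObjectCache()   # done once, by the application
print(DataObjectCache.get_instance() is app_cache)  # → True
```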
I guess I don't quite understand the problems it created; if anything I
would expect Engine to be a natural singleton and using it to access
the
elements that are part of its environment seems pretty obvious to me.
My feeling is that it's limiting the DataObjectCache, which seems to be
a more generic component than an installation engine. I can more
easily
imagine an application such as an authoring tool where I might want to
use two different caches and move data from one as a starting point to
another, so that's what I'm a bit stuck on here.
We tried using the Engine to access the Logger at the time, and I think that was where we first encountered issues - since the DOC also used the Logger, to get it we had to access the Engine, but then the Engine used the DOC, and as a result circular dependencies were created on doing the imports, which didn't appear to be simple to remove.
OK, that's starting to make sense :-)
3.4.2.4 Wouldn't clear() be useful in applications like the
interactive
installers where the user might go back to the first step in the
parade
of screens (might be useful as a variant of Use Case 1)? Also, I
didn't
grok what "dump( indented )" is supposed to mean?
True, clear() could have many uses...
As for dump(), it's mainly for use in development or debugging, to generate a "simple view" of the DOC at a given point in time. So you would get something like:
DataObjectCache [root]
    Targets [DISCOVERED]
        Disk [c0t0d0]
            Partition [p0]
            ....
and it uses the str() function to generate this, so an object may better represent its output if it wishes.
I didn't understand specifically what the "indent" argument was meant to do, though. No indication was given of what the developer would do with it.
Agreed, we could probably remove the indent argument.
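A sketch of how dump() might produce that view by walking the tree and using each object's str(); the class and member names here are illustrative only:

```python
class DataObject:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def __str__(self):
        # Objects may override this to better represent themselves.
        return type(self).__name__ + " [" + self.name + "]"

    def dump(self, depth=0):
        """Return an indented line per object, one level per depth."""
        lines = ["    " * depth + str(self)]
        for child in self.children:
            lines.extend(child.dump(depth + 1))
        return lines

doc = DataObject("root", [
    DataObject("DISCOVERED", [
        DataObject("c0t0d0", [DataObject("p0")]),
    ]),
])
print("\n".join(doc.dump()))
```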
page 25, first paragraph regarding writing out checkpoints to the
manifest: Seems like we need a mechanism for the application to
inform
each checkpoint of whether it's to be written out or not. Not sure
where that falls architecture-wise.
We see it as the checkpoints themselves knowing this (see above) - they simply need to return None if they are not going to generate anything.
I'm doubtful of this model yet. Say we're going to have AI dump out a
processed manifest for potential use as input to another AI; why would
any checkpoints be included in that manifest? Many, if not most, of
its
checkpoints are common with another application such as DC where the
checkpoint may well need to be dumped. So I'm again wondering how the
checkpoint knows this itself without baking in knowledge of the
containing app, which I find objectionable.
I can see where you're coming from, and maybe the correct approach is to always generate the XML for nodes that have an XML equivalent, and allow the Application to decide if it wants to pluck elements out of the generated XML - so DC might leave it in, and AI would remove it - the use of XSLTs makes sense here...
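A sketch of that "generate everything, then prune" idea; the element names are made up, and an XSLT outside the process could perform the same transformation:

```python
import xml.etree.ElementTree as ET

# Hypothetical generated manifest containing a checkpoints section.
manifest = ET.fromstring(
    "<auto_install>"
    "<target/>"
    "<checkpoints><checkpoint name='transfer'/></checkpoints>"
    "</auto_install>"
)

def prune(root, tag):
    """Remove every direct child with the given tag; an XSLT could
    do the same transformation outside the process."""
    for elem in root.findall(tag):
        root.remove(elem)
    return root

prune(manifest, "checkpoints")   # e.g. AI drops them, DC keeps them
print(ET.tostring(manifest).decode())
```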
I don't see how the application would decide what to pluck out after
the fact, so perhaps you can elaborate on how you think that might work?
Dave
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss