[caiman-discuss] XML parsing design meeting

Sarah Jelinek Wed, 20 May 2009 12:40:55 -0600

Hi Jack,

Comments/questions on the strawman inline:



> This is a first strawman proposal for XML reparsing work.  It
> addresses the following parser related issues:
>
> 1) AI's multiple parsers present unneeded complexity and
>    unmaintainability.
>    Things to consider for a single parser:
>    - functionality for data retrieval and search, schema
>      compatibility, how supported / maintainable is the parser
>    - The right parser will fall out as the other problems are solved.
>
> 2) Current AI manifests are not easy to use.
>    - Fragmented, could be better organized
>    - We decided that changing from XML is out of scope and off the
>      table.
>
> 3) AI manifests need to be forward and backward compatible between
>    builds.
>    - Manifests of different versions than the automated installer must
>      work whenever possible.
>    - A given version of the automated installer must be able to
>      recognize a manifest with which it is not compatible and
>      gracefully fail.
>
> 4) Semantic validation is needed for AI.
>    - Lack of it means failures further down the installation process
>      instead of up front, or misconfiguration.
>
> 5) AI manifests have validation holes.
>   - Example: criteria schema doesn't bind a min/max pair of ipv4
>     address patterns to an ipv4 address criterion.
>
>
> Proposal
> ========
>
> This proposal is only high level design details, and does not include
> APIs or contents of manifests.
>
> Items marked with XXX haven't yet been discussed at a meeting, but are
> included here for discussion nonetheless.
>
>
> 1. Parser
> ---------
>
> The ManifestServ / libxml2 parser will be used.
>
> ManifestServ and lxml are nearly identical in functionality.
> Both are wrappers around libxml2.
> Pluses and minuses of ManifestServ vs lxml follow:
>
> Advantages of ManifestServ:
> - High level API for semantic validation is already implemented.
> - ManifestServ allows for better narrowing of matches based on values
>   of multiple child nodes.
> - Simpler search-path syntax compared to Xpath
> - Easy to make changes / additions because it is our own code.
> - Fewer code layers
>     consumer / ManifestServ / libxml2
>         vs
>     consumer / wrapper / lxml / libxml2
> - Dynamic default setting is available
>
I do have a question about using the ManifestServ parser. The use of 
this, at least from DC's perspective, is to instantiate a ManifestServ 
object, and use that object to provide data to consumers later, correct? 
This is done via a socket that is created so consumers can query, via 
this socket, for manifest data that is contained in the ManifestServ 
object. I can see that in DC this is valuable and required, in part 
because of the finalizer scripts. They need to get specific manifest 
settings to do their work, in some cases.

AI doesn't currently need to do this. However, if we go to a DC-like
engine then I can see where this might be required. I am just wondering
if there is overhead in the way ManifestServ is implemented that may not
be required by all consumers. installadm, as it is currently implemented,
may be better served by the ManifestServ model.

Not a question.. more a statement... I can see how using this would 
allow us to create a unified engine that is more extensible.

> Disadvantages of ManifestServ:
> - Maintained by us.
>
> Advantages of lxml:
> - has diff feature for checking the differences between two manifests.
> - Easier to read schema version.
> - Python 3 clean
> - Xpath search has more operators for checking search values.
> - Not maintained by us.
>
> Disadvantages of lxml:
> - Need to write wrapper code to implement semantic validation and to
>   do other more complex tasks.
>
>
> 2. Manifest Organization
> ------------------------
>
> The manifests used for AI will be organized as follows:
>
> Criteria Manifest will contain multiple sets of criteria, and pointers
> to the AI manifest (which will contain installation parameters) and
> the system-configuration (SC) manifest (which will be an SMF enhanced
> profile containing system configuration parameters).  If a system's
> criteria set matches a set in a Criteria Manifest, the AI manifest and
> the SC manifest specified in that criteria manifest will be used.
>
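The lookup described above can be sketched as follows. This is only an
illustration under stated assumptions: the CriteriaManifest layout, the
criterion names and the file names are hypothetical, since the proposal
does not define manifest contents, and criteria are reduced to exact
value matches.

```python
from typing import NamedTuple, Optional, Tuple


class CriteriaManifest(NamedTuple):
    """Hypothetical in-memory form of one criteria manifest."""
    criteria_sets: list   # list of dicts: criterion name -> required value
    ai_manifest: str      # pointer to the AI manifest
    sc_manifest: str      # pointer to the SC manifest (enhanced SMF profile)


def select_manifests(criteria_manifests, system) -> Optional[Tuple[str, str]]:
    """Return (ai_manifest, sc_manifest) from the first criteria manifest
    in which any one criteria set is fully satisfied by the system."""
    for cm in criteria_manifests:
        for cset in cm.criteria_sets:
            if all(system.get(name) == value for name, value in cset.items()):
                return cm.ai_manifest, cm.sc_manifest
    return None


# Two criteria sets in one file both lead to the same system definition.
manifests = [
    CriteriaManifest(
        criteria_sets=[{"arch": "sparc"}, {"arch": "i86pc", "platform": "vbox"}],
        ai_manifest="ai_lab.xml",
        sc_manifest="sc_lab.xml",
    ),
]
```

Here both a sparc system and an i86pc/vbox system resolve to the same
AI manifest and SC manifest, which is the case the combined file is
meant to simplify.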
> - Why have separate sets of criteria in one criteria manifest?
>
>   The alternative is to have a separate criteria manifest for
>   different criteria sets which specify the same system
>   definition.  Combining multiple criteria sets into a single file
>   reduces the complexity.
>
> - Why split the manifest files?  Why not combine them into one file?
>
>   Combining all manifests into a single file would be a plus from
>   a file manageability point of view.  However, there are more
>   benefits to splitting the files, including:
>
>     - AI can offload setting up many parameters to another tool
>       used by other parts of the system.
>
>     - System configuration can be done for an installation the
>       same way it will be done for a sysid-config type situation:
>       there will be a GUI tool used to take configuration input
>       and will generate an enhanced SMF profile.  (A single SMF
>       enhanced profile can contain multiple service bundles for
>       instances of multiple services, including system-specific
>       user-specified services for particular systems.)
>
>     - Splitting the AI manifest from the enhanced SMF profile
>       demarcates the system configuration items (which will be
>       used by other utilities) from those which are specific to AI.
>
> - What if a service doesn't start out as an SMF service, but later
>   becomes one?
>
>     That service will have to start out in the AI manifest and
>     then migrate later to an enhanced SMF profile.  However, it
>     is a requirement on the SMF team to provide the install team
>     with all the services needed for an install, by the time of
>     the next release, so users won't ever see the migration.
>
> - Why have a separate Criteria Manifest?
>
>     - A separate Criteria Manifest can point to the AI manifest
>       and SC manifest as peers.  The latter two manifests together
>       define a system.
Do you mean 'define the installed system'? I think of the AI manifest 
and SC manifests together as defining what the installed system will 
look like.

>
>     - It does not make sense to combine the criteria with either
>       of the two other manifests since the criteria map a system
>       to a full definition of that system.
>
>     - An alternative would be to map different criteria to each
>       of the AI manifest and SC manifest, but that would be
>       difficult for users to keep track of.
>
> - Why have two types of schemas: DTD and RelaxNG?
>
>     - DTDs are the way enhanced SMF profiles will be specified.
>       We're stuck with them for enhanced SMF profiles.
>
>     - It is intended that a GUI tool (not part of the install
>       project) will be used to generate the enhanced SMF profile.
>       The user won't need to even look at the enhanced SMF profile
>       as long as it is separated out from the other
>       parts the user will directly manipulate.
>
>     - The rest of the input (that which the user will look at)
>       can be done in a form which is much more readable,
>       checkable and easily understood than DTDs.
>
>     - The files the user will be looking at will be done as
>       RelaxNG as RelaxNG schemas have many benefits over DTDs:
>
>         - They include type and pattern checking of elements
>
>         - They are easier to read and understand.
>           (http://www.mulberrytech.com/papers/whichschema)
>
> - What about other schemas?  Why RelaxNG?
>
>     W3C was the next most popular contender, but RelaxNG was
>     developed to overcome some of the shortcomings of W3C.  For
>     example, W3C schemas are inconsistent in their treatment of
>     their various parts and are more limited in their constraints
>     between attributes.  RelaxNG is consistent in the treatment of
>     its parts, and so is more easily understood.  (See
>     http://www.webreference.com/xml/column59/ and
>     http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html for
>     examples.)
>
> (XXX proposed solution hasn't yet been discussed in a meeting.)
> I suggest changing the name of the criteria manifest to the Map
> Manifest as it maps systems to AI manifest and SC manifest based on
> the criteria specified.
>
> - Why change the name?
>     Criteria Manifest is not descriptive enough of what it is for.
>     Can come up with a better name.
>
>
> 3. Compatibility / Version checking
> -----------------------------------
>
> How AI manifests will be version checked:
>
> Schemas will be assigned version numbers when written.  They will be
> revved in ascending order.  RelaxNG schemas will use an annotation to
> record the version.
What are the complexities of using annotation as the versioning mechanism?

>
> Manifests will have their version in a field (likely an attribute
> at the top of the tree.)
>
> The installadm utility will assign to manifests the version number
> of the schema or DTD they validate against.
>
At this point we validate manifests against the schema in the boot 
image. So, if we have installadm assign the version, should we consider 
allowing users to specify the location of the schema they want to use 
for validation? As opposed to only validating against the image schema, 
or the system schema.

> - Why have version recorded by installadm?
>
>     This gives the version number meaning.  The version
>     number says that the manifest validates against the schema
>     for that installer version.
>
> - How will the schema version be read?
>
>     Either via a parser (preferred) or a simple grep through the
>     file.
>
> The client will check the manifest version number against the schema
> and will print a warning if there is a mismatch.  The warning will
> print the versions of all files, and will say that the installation
> will continue but may not work.
>
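A minimal sketch of the version check, assuming the schema version is
recorded as a foreign-namespace attribute on the RelaxNG grammar element
and the manifest version is a "version" attribute on the root element.
The annotation namespace, the element names and the version values are
all hypothetical. It uses only the Python standard library rather than
ManifestServ or lxml.

```python
import warnings
import xml.etree.ElementTree as ET

# Hypothetical namespace for the schema's version annotation.
ANNOT_NS = "http://example.com/ai/annotations"

SCHEMA = f"""<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:ai="{ANNOT_NS}" ai:version="1.1">
  <start><element name="ai_manifest"><attribute name="version"/></element></start>
</grammar>"""

MANIFEST = '<ai_manifest version="1.0"/>'


def schema_version(schema_text):
    """Read the version annotation from the schema's root element."""
    return ET.fromstring(schema_text).get(f"{{{ANNOT_NS}}}version")


def manifest_version(manifest_text):
    """Read the version attribute from the top of the manifest tree."""
    return ET.fromstring(manifest_text).get("version")


def check_versions(manifest_text, schema_text):
    """Return True when the versions agree; otherwise warn and return False.
    A mismatch is only a warning, so the caller is free to continue."""
    m_ver = manifest_version(manifest_text)
    s_ver = schema_version(schema_text)
    if m_ver != s_ver:
        warnings.warn(
            f"manifest version {m_ver} does not match schema version {s_ver}; "
            "installation will continue but may not work"
        )
        return False
    return True
```

Reading the annotation through a parser, as here, avoids depending on
how the attribute happens to be laid out in the file, which a grep
would.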

This raises a question... or a statement. One of the things Sue is
working on with regard to the services re-design is that perhaps we want
to separate the boot image from the definition of the service. So, if we
do what you are suggesting above, and we implement full DNS service
discovery, it is possible that the service that responded to the client
is different from the image the client booted from, and we have a
mismatch.

Your design decision above constrains, in my opinion, the options for 
re-design of an install service. This is something you should coordinate 
with Sue on and understand the full implications of this decision.


> - Why only a warning?
>
>     We want the manifest to validate if possible.  The validator
>     will catch errors if there are errors, and the version warning
>     will provide the user a reason why those errors occurred.
>

It would be good if you could outline the possible scenarios: those
that will cause validator errors, those that might not, and the
implications of allowing the install to proceed when there is a version
mismatch but no validation error. I'm not sure there are any of those,
but I want to be sure.

> - Caveat:
>     Need to change the field name if the semantic meaning of that
>     field changes, but the way it appears in the manifest appears
>     the same.  Else there could be unexpected results because the
>     user expects the old behavior when the behavior has changed.
>
> (XXX no proposed solution, not discussed yet)
> Having a version implanted by installadm could lead to copies of the
> same manifest with multiple version numbers.  Do we want a second
> version number based on the manifest format itself, to check for the
> sameness of manifests?
>
> (XXX no proposed solution, not discussed yet)
> Is there a need / how to check DTD versions?
>
> (XXX no proposed solution, not discussed yet)
> Other things to consider for version compatibility:
>
> - When adding items which don't conflict with old items, make added
>   items optional.  Do a minor rev.
>
> - Only when we make an incompatible change would we completely change
>   the schema and make a major rev.
>
> - Can ship multiple versions of the schema for backward compatibility.
>
>   - This will buy the closest thing to "it just works" compatibility
>     though.
>
>   - Will need to hardwire specific functionality for different
>     versions in the AI code proper though;  may be a maintainability
>     nightmare.
>
>
> 4. Semantic Validation
> ----------------------
>
> What is being called "semantic validation" is really just validation
> which is more thorough than the syntactic validation offered by
> schemas.  Semantic validation is a subset of this;  anything can be
> checked, not just semantics.  The intent is to use it for checking
> values against other values, or against system context / environment.
>
> - Thorough semantic validation will be done on the client side when
>   the installation starts.
>
> - A subset of this validation will be done on the server side, limited
>   only by the fact that the server cannot provide the same environment
>   as provided by the client.  (For example, the server can validate
>   the format of a disk name, but the client can validate that a disk
>   of that name exists on the system.)
>
> - The validation methods called will be the same on both the server
>   and client, but an environment variable will select whether those
>   methods will do server or client mode checking.
>
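The shared-method idea above might look like the following sketch. The
environment variable name, its values and the disk name pattern are
assumptions made for illustration; the proposal does not specify them.

```python
import os
import re

# Hypothetical environment variable; the proposal does not name it.
MODE_VAR = "AI_VALIDATION_MODE"

# Assumed disk name format such as c0t0d0 or c1d0 (illustrative only).
DISK_NAME = re.compile(r"^c\d+(t\d+)?d\d+$")


def validate_disk(name, present_disks=(), environ=os.environ):
    """Return a list of error strings; an empty list means the check passed.

    The format is checked in both modes.  Existence is checked only in
    client mode, because only the client can see the target system's disks.
    """
    errors = []
    if not DISK_NAME.match(name):
        errors.append(f"malformed disk name: {name}")
    elif environ.get(MODE_VAR, "server") == "client" and name not in present_disks:
        errors.append(f"no such disk on this system: {name}")
    return errors
```

The same function is called on both sides; only the environment
differs, so a manifest that passes on the server can still fail on the
client once the real disk list is known.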
>
> 5. Manifest with validation holes
> ---------------------------------
>
> (XXX proposed solution hasn't yet been discussed in a meeting.)
> I suggest making better use of the schema and providing a clearer
> syntax by replacing the format of a criterion and putting the "min",
> "max" or "value" values explicitly as in the following:
>
>         <criteria_set>
>                 <IPv4 min="1.2.3.4" max="5.6.7.8"/>
>                 <MAC value="2.3.4.5"/>
>                 <MEM min="1Gb"/>    <!-- unbounded max -->
>         </criteria_set>
>
> - Why change from current implementation?
>
>     - More compact syntax.
>     - More robust checking vs schema.
>     - No need for "unbounded";  just leave off min or max
>       attribute.
>
> - Other options?
>     A criterion needs to contain a name and its values.  The above
>     does this with great efficiency.
>
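To show how the attribute form could be evaluated, here is a rough
sketch that matches a system against a criteria set with optional
"min", "max" and "value" attributes. It assumes well-formed XML (quoted
attributes, self-closed elements) and, as a simplification not taken
from the proposal, expresses MEM as an integer number of megabytes. The
PARSERS table and the matches() helper are hypothetical names.

```python
import ipaddress
import xml.etree.ElementTree as ET

CRITERIA = """<criteria_set>
    <IPv4 min="1.2.3.4" max="5.6.7.8"/>
    <MEM min="1024"/>
</criteria_set>"""

# How to turn each criterion's text into a comparable value.
# Criteria not listed here are compared as plain strings.
PARSERS = {"IPv4": ipaddress.IPv4Address, "MEM": int}


def matches(criteria_xml, system):
    """Return True if the system satisfies every criterion in the set.
    A missing min or max attribute leaves that side unbounded."""
    for crit in ET.fromstring(criteria_xml):
        if crit.tag not in system:
            return False
        parse = PARSERS.get(crit.tag, str)
        actual = parse(system[crit.tag])
        if "value" in crit.attrib and actual != parse(crit.get("value")):
            return False
        if "min" in crit.attrib and actual < parse(crit.get("min")):
            return False
        if "max" in crit.attrib and actual > parse(crit.get("max")):
            return False
    return True
```

Because the bounds are attributes of the criterion element itself, a
schema can tie the IPv4 pattern to both "min" and "max" directly, which
is the validation hole noted in issue 5.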
> (XXX proposed solution hasn't yet been discussed in a meeting.)
> How to do validation of manifests with the use of derived profiles:
> The derived profiles implementation has not been decided yet.  If they
> are implemented on the server side and there is something which can be
> validated on the server, they will be.  Full validation of them will
> be done on the client side, as full client context is provided there.

That's it for now. Good work!

thanks,
sarah
