[caiman-discuss] XML parsing design meeting

Karen Tung Thu, 21 May 2009 12:18:07 -0700

Hi Jack,

I have some comments about the revised strawman:


 >
 > Proposal
 > ========
 >
 > This proposal is only high level design details, and does not include
 > APIs or contents of manifests.
 >
 >
 > 1. Parser
 > ---------
 >
 > The parser chosen (between lxml and ManifestServ) is lxml, mainly
 > for the following reasons:
 >
 > A) Most people on the team thought that using a library maintained
 > outside of OpenSolaris will end up requiring less on-going effort for
 > maintenance once the up-front effort of writing the code is completed.
 >
 > B) lxml interfaces are more standard and thus better understood and
 > documented (on the web, for example).
 >
 > Both are nearly identical in functionality.
 > Both are wrappers around libxml2.
 > Plusses and minuses of ManifestServ vs lxml follow:
 >
 > Advantages of ManifestServ:
 > - High level API for semantic validation is already implemented
 >   and reliable.
 > - ManifestServ allows for better narrowing of matches based on values
 >   of multiple child nodes.
 > - Simpler search-path syntax compared to Xpath
 > - Easy to make changes / additions because it is our own code.
 > - Fewer code layers
 >     consumer / ManifestServ / libxml2
 >         vs
 >     consumer / wrapper / lxml / libxml2
 > - Dynamic default setting is available
 >
 > Disadvantages of ManifestServ:
 > - Maintained by us.
 >
 > Advantages of lxml:
 > - Has a diff feature for checking the differences between two manifests.
 > - Easier to read schema version.
 > - Python 3 clean
 > - Xpath search has more operators for checking search values.
 > - Not maintained by us.
 >
 > Disadvantages of lxml:
 > - Need to write wrapper code to implement semantic validation and to
 >   do other more complex tasks.
 >

My comment for this section makes the assumption that we only want to
have one parser for all of the different Caiman projects,
not just one parser for AI.

In selecting lxml as the parser, has any analysis been done to see how
this will impact DC?  Will everything in DC today still work?
Of course, I understand that we will have to rewrite the functionality
ManifestServ provides, but is there anything ManifestServ currently
provides that can't be done with lxml?

Furthermore, will lxml be able to meet the need for any known future
projects?
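
To make the DC question concrete, here is a minimal sketch of one thing
ManifestServ does today: narrowing matches on the values of multiple child
nodes. It uses the standard library's xml.etree.ElementTree as a stand-in
(lxml's etree module offers a similar findall(), plus full XPath through
xpath()). The element names, values, and the find_disks() helper are all
made up for illustration and are not taken from any real manifest or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest fragment; element names are invented for illustration.
MANIFEST = """
<manifest>
  <disk><name>c0t0d0</name><type>ide</type><size>40</size></disk>
  <disk><name>c0t1d0</name><type>scsi</type><size>40</size></disk>
  <disk><name>c1t0d0</name><type>scsi</type><size>80</size></disk>
</manifest>
"""


def find_disks(root, **child_values):
    """Return <disk> elements whose named children all have the given text."""
    # Build one [tag='value'] predicate per requested child node.
    predicates = "".join(
        "[%s='%s']" % (tag, value) for tag, value in sorted(child_values.items())
    )
    return root.findall(".//disk" + predicates)


root = ET.fromstring(MANIFEST)
matches = find_disks(root, type="scsi", size="40")
names = [disk.findtext("name") for disk in matches]
print(names)
```

If DC relies on searches that cannot be expressed this way, that would be
worth listing as a requirement for the wrapper.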

 >
 > 3. Compatibility / Version checking
 > -----------------------------------
 >
 > How AI manifests will be version checked:
 >
 > Manifests will have a single version in a field (likely an attribute
 > at the top of the tree.) The installadm utility will assign to
 > manifests the version number of the schema they validate against.
 >
 > Each element and attribute in the schema would be assigned a version
 > number.  The manifest would be assigned a single version number.
 >
 > Version checking would verify that the manifest version was equal to
 > the highest version number in the schema.
 >
 > - Why have version recorded by installadm?
 >
 >     This gives the version number meaning.  The version
 >     number says that the manifest validates against the schema
 >     for that installer version.
 >
 >
 > The client will check the manifest version number against the schema
 > and will print an error message and stop if there is a mismatch.  The
 > message will show which fields in the schema had versions which were
 > higher than the manifest version.
 >
 > - Why a (fatal) error and not a warning?
 >
 >     Predictability and reliability.  If only a warning were
 >     printed and the install proceeded, the install may or may not
 >     work.  By failing instead, the installer is more consistent.
 >     If versioning passes, one can be pretty sure that the install
 >     will not fail due to a version mismatch.
 >
 > Some benefits of this way of doing versioning:
 >
 > 1) There is no issue around a semantic change to an element or
 > attribute.  If such a change is made, the version number of that
 > element or attribute is revved and will cause a validation error if
 > the manifest isn't similarly revved.
 >
 > 2) The installer can be enhanced to look at versions of fields to
 > tell what to do with them.
 >
 > 3) This versioning allows for error messages which state exactly
 > which fields caused the validation errors.  This fills a need
 > to tell the user what the changes are between versions.
 >
 > There are still other problems though:
 >
 > 1) A manifest may not validate because it is down-revved, even though
 > the only change to go to the next revision is the addition of an
 > optional field, which the manifest does not use.  The manifest should
 > validate, but it won't.
 >
 >
 > (XXX some discussion, leaning against implementing, no decision yet.)
 > Having a version implanted by installadm could lead to copies of the
 > same manifest with multiple version numbers.  Do we want a second
 > version number based on the manifest format itself, to check for the
 > sameness of manifests?
 >
 >
 > (XXX no proposed solution, not discussed yet)
 > Is there a need / how to check DTD versions?
 >
 > (XXX no proposed solution, not discussed yet)
 > Other things to consider for version compatibility:
 >
 > - when adding items which don't conflict with old items, make added
 >   items optional. do a minor rev.
 >
 > - Only when we make an incompatible change would we completely change
 >   the schema and make a major rev.
 >
 > - Can ship multiple versions of the schema for backward compatibility.
 >
 >   - This will buy the closest thing to "it just works" compatibility
 >     though.
 >
 >   - Will need to hardwire specific functionality for different
 >     versions in the AI code proper though;  may be a maintainability
 >     nightmare.
 >

I know this problem statement is about AI manifest versioning.
Now that I think about it more, versioning of schemas and manifests
is a general problem that applies to all Caiman components that
currently use or will use manifests.  So, I think it would
be good to design the solution for that general problem.  Then AI can
decide how to implement that solution.  To me, what you proposed
above about how installadm should be modified is more of an implementation
detail.
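
For reference, the check described in the strawman can be sketched without
any tie to installadm. This is only a rough sketch under assumptions: the
schema is represented as a plain mapping from field path to version number,
and the paths, version numbers, and the names check_manifest_version() and
VersionMismatch are all hypothetical.

```python
# Hypothetical schema description: field path -> version number assigned to
# that element or attribute. Paths and numbers are invented for illustration.
SCHEMA_VERSIONS = {
    "ai_manifest": 1,
    "ai_manifest/target_device": 1,
    "ai_manifest/software/@publisher": 2,
    "ai_manifest/target_device/iscsi": 3,
}


class VersionMismatch(Exception):
    """Raised when the manifest version differs from the schema version."""

    def __init__(self, manifest_version, schema_version, newer_fields):
        self.newer_fields = newer_fields
        msg = "manifest version %s does not match schema version %s" % (
            manifest_version, schema_version)
        if newer_fields:
            msg += "; fields newer than the manifest: " + ", ".join(newer_fields)
        super().__init__(msg)


def check_manifest_version(manifest_version, schema_versions):
    """Fail unless manifest_version equals the highest version in the schema."""
    schema_version = max(schema_versions.values())
    if manifest_version == schema_version:
        return
    # Report the fields whose version is higher than the manifest's.
    newer = sorted(
        path for path, version in schema_versions.items()
        if version > manifest_version
    )
    raise VersionMismatch(manifest_version, schema_version, newer)


check_manifest_version(3, SCHEMA_VERSIONS)  # versions match, no error
try:
    check_manifest_version(2, SCHEMA_VERSIONS)
except VersionMismatch as err:
    print(err)
```

Nothing here is specific to AI, so the same check could serve any component
that consumes manifests.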

 >
 > 4. Semantic Validation
 > ----------------------
 >
 > What is being called "semantic validation" is really just validation
 > which is more thorough than the syntactic validation offered by
 > schemas.  Semantic validation is a subset of this;  anything can be
 > checked, not just semantics.  The intent is to use it for checking
 > values against other values, or against system context / environment.
 >
 > - Thorough semantic validation will be done on the client side when
 >   the installation starts.
 >
 > - A subset of this validation will be done on the server side, limited
 >   only by the fact that the server cannot provide the same environment
 >   as provided by the client.  (For example, the server can validate
 >   the format of a disk name, but the client can validate that a disk
 >   of that name exists on the system.)
 >
 > - Server side validation may be supplemented by additional validation
 >   done by manifest generation tools.
 >
 > - The validation methods called will be the same on both the server
 >   and client, but may behave differently during client and server
 >   validation.
 >

This problem statement is bundled with XML parsing.  Perhaps we are
assuming that semantic validation will be done as part of
XML parsing?  Now that we have chosen lxml as the parser, I wonder
how easy or hard it would be to implement this as part of XML parsing.
I think we should also consider the option of doing it outside of XML
parsing.
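
As a rough sketch of the "outside of XML parsing" option, validation could
run as a separate pass over the already-parsed tree, with the same method
used on server and client. Everything below is an assumption made for
illustration: the disk_name element, the disk-name pattern, and the
disk_exists callback (supplied only on the client) are hypothetical, and the
standard library's ElementTree stands in for whichever parser is used.

```python
import re
import xml.etree.ElementTree as ET

# Assumed disk-name format (cXtYdZ or cXdZ); the real rule may differ.
DISK_NAME_RE = re.compile(r"^c\d+(t\d+)?d\d+$")


def validate_disk(name, disk_exists=None):
    """Return a list of error strings for one disk name.

    disk_exists is a callable supplied only on the client. On the server it
    is None, so only the format of the name is checked.
    """
    if not DISK_NAME_RE.match(name):
        return ["malformed disk name: %s" % name]
    if disk_exists is not None and not disk_exists(name):
        return ["no such disk on this system: %s" % name]
    return []


def validate_manifest(xml_text, disk_exists=None):
    """Parse first, then validate as a separate pass over the tree."""
    root = ET.fromstring(xml_text)
    errors = []
    for elem in root.iter("disk_name"):  # hypothetical element name
        errors.extend(validate_disk((elem.text or "").strip(), disk_exists))
    return errors


MANIFEST = "<m><disk_name>c0t0d0</disk_name><disk_name>bogus</disk_name></m>"

# Server side: format check only.
server_errors = validate_manifest(MANIFEST)
# Client side: same method, plus an existence check against the system.
client_errors = validate_manifest(MANIFEST, disk_exists=lambda n: n == "c0t1d0")
print(server_errors)
print(client_errors)
```

Keeping the checks in a separate pass would also mean they do not depend on
which parser is chosen.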

--Karen

Jack Schwartz wrote:
> Hi everyone.
>
> Minutes are posted at:
> http://www.opensolaris.org/os/project/caiman/auto_install/AI_mtg/Minutes/XML_parser_rework_minutes_090520.txt
>
> Revised strawman is posted at:
> http://opensolaris.org/os/project/caiman/XML_Parsing/strawman-2
>
> Heads-up: I will be posting another meeting for tomorrow to account 
> for any last requirements and then split up the work.
>
>    Thanks,
>    Jack
>
>
> Jack Schwartz wrote:
>> Hi everyone.
>>
>> As we discussed today, tomorrow I am calling a meeting to discuss 
>> moving forward on the XML parsing project (including delegation of 
>> remaining work).  Work remaining includes finishing defining the 
>> design (breaking it into smaller pieces if necessary), and turning it 
>> into a functional specification.
>>
>> Attached is the strawman for the XML parser rework, as it is.  While 
>> there are still some questions to be answered, I cleaned it up and I 
>> think it will serve well as a reference point moving forward.  It can 
>> serve to show how much work remains for each of the problem statements.
>>
>> Meeting logistics:
>> Wednesday 5/20, 1 hour, 10:00 PT / 11:00 MT / 13:00 ET
>> Toll Free Dial In Number: (866)545-5227
>> Int'l Access/Caller Paid Dial In Number: (215)446-3648
>> ACCESS CODE: 7385082
>>
>>    Thanks,
>>    Jack
>
