[caiman-discuss] XML parsing design meeting

Jack Schwartz Thu, 21 May 2009 07:44:55 -0700

Hi Sundar.

Thanks for your reply.  Responses inline.


Sundar Yamunachari wrote:
> Jack,
>
>    You have captured a lot of information here. My comments on your 
> proposal are inline.
>
>> This is a first strawman proposal for XML reparsing work.  It
>> addresses the following parser related issues:
>>
>> 1) AI's multiple parsers present unneeded complexity and
>>    unmaintainability.
>>    Things to consider for a single parser:
>>    - functionality for data retrieval and search, schema
>>      compatibility, how supported / maintainable is the parser
>>    - The right parser will fall out as the other problems are solved.
>>
>> 2) Current AI manifests are not easy to use.
>>    - Fragmented, could be better organized
>>    - We decided that changing from XML is out of scope and off the
>>      table.
>>
>> 3) AI manifests need to be forward and backward compatible between
>>    builds.
>>    - Manifests of different versions than the automated installer must
>>      work whenever possible.
>>    - A given version of the automated installer must be able to
>>      recognize a manifest with which it is not compatible and
>>      gracefully fail.
>>
>> 4) Semantic validation is needed for AI.
>>    - Lack of it means failures further down the installation process
>>      instead of up front, or misconfiguration.
>>
>> 5) AI manifests have validation holes.
>>   - Example: criteria schema doesn't bind a min/max pair of ipv4
>>     address patterns to an ipv4 address criterion.
>>
>>
>> Proposal
>> ========
>>
>> This proposal is only high level design details, and does not include
>> APIs or contents of manifests.
>>
>> Items marked with XXX haven't yet been discussed at a meeting, but are
>> included here for discussion nonetheless.
>>
>>
>> 1. Parser
>> ---------
>>
>> The ManifestServ / libxml2 parser will be used.
> You mean lxml here?
I didn't when I wrote this, but I do now.
>>
>> Both are nearly identical in functionality.
>> Both are wrappers around libxml2.
>> Plusses and minuses of libxml2 vs lxml follow:
> Again, are you comparing lxml and ManifestServ?
yes
>
> If both lxml and ManifestServ are identical in functionality, why do we 
> need our own parser (ManifestServ)?
It's a moot point now, but I believe that:
- ManifestServ does some things (like search) better than lxml,
- ManifestServ is wrapper code already written and proven, which would
  be analogous to the wrapper code which now needs to be written for
  lxml to be used.

But most people agree that lxml is the way to go, largely because of 
maintainability aspects, so I'm not going to stand in the way.
>
>>
>> Advantages of ManifestServ:
>> - High level API for semantic validation is already implemented.
>> - ManifestServ allows for better narrowing of matches based on values
>>   of multiple child nodes.
>> - Simpler search-path syntax compared to Xpath
>> - Easy to make changes / additions because it is our own code.
>> - Fewer code layers
>>     consumer / ManifestServ / libxml2
>>         vs
>>     consumer / wrapper / lxml / libxml2
>> - Dynamic default setting is available
>>
>> Disadvantages of ManifestServ:
>> - Maintained by us.
>>
>> Advantages of lxml:
>> - has diff feature for checking the differences between two manifests.
>> - Easier to read schema version.
>> - Python 3 clean
>> - Xpath search has more operators for checking search values.
>> - Not maintained by us.
>>
>> Disadvantages of lxml:
>> - Need to write wrapper code to implement semantic validation and to
>>   do other more complex tasks.
>>
>>
>> 2. Manifest Organization
>> ------------------------
>>
>> The manifests used for AI will be organized as follows:
>>
>> Criteria Manifest will contain multiple sets of criteria, and pointers
>> to the AI manifest (which will contain installation parameters) and
>> the system-configuration (SC) manifest (which will be an SMF enhanced
>> profile containing system configuration parameters).  If a system's
>> criteria set matches a set in a Criteria Manifest, the AI manifest and
>> the SC manifest specified in that criteria manifest will be used.
>>
>> - Why have separate sets of criteria in one criteria manifest?
>>
>>   The alternative is to have a separate criteria manifest for
>>   different criteria sets which specify the same system
>>   definition.  Combining multiple criteria sets into a single file
>>   reduces the complexity.
>>
>> - Why split the manifest files?  Why not combine them into one file?
>>
>>   Combining all manifests into a single file would be a plus from
>>   a file manageability point of view.  However, there are more
>>   benefits to splitting the files, including:
>>
>>     - AI can offload setting up many parameters to another tool
>>       used by other parts of the system.
>>
>>     - System configuration can be done for an installation the
>>       same way it will be done for a sysid-config type situation:
>>       there will be a GUI tool used to take configuration input
>>       and will generate an enhanced SMF profile.  (A single SMF
>>       enhanced profile can contain multiple service bundles for
>>       instances of multiple services, including system-specific
>>       user-specified services for particular systems.)
> If system configuration manifest is not parsed/validated/managed by 
> install tools, why you include system configuration manifest as part 
> of criteria manifest? Can it be defined as an element of AI manifest?
We didn't want too many levels of indirection.  If the SC manifest were
an element of the AI manifest (and I assume you mean a pointer to the SC
manifest), we would have

Criteria Manifest -> AI Manifest -> SC manifest

The way we've defined it, one can see the complete picture by looking at 
the Criteria Manifest:

Criteria Manifest -> { AI Manifest and SC manifest }
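
As a rough illustration only (not part of the strawman), the peer
relationship could be modeled as below.  The class name, field names and
file names are all hypothetical, and criteria are matched by simple
equality to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class CriteriaManifest:
    """Several criteria sets plus the two peer manifests they select.

    All names here are hypothetical; the strawman defines no API.
    """
    criteria_sets: list   # list of dicts, e.g. [{"arch": "sparc"}]
    ai_manifest: str      # pointer to the AI manifest
    sc_manifest: str      # pointer to the SC manifest (enhanced SMF profile)


def select_manifests(manifests, system):
    """Return (ai_manifest, sc_manifest) for the first criteria manifest
    in which at least one criteria set fully matches the system."""
    for manifest in manifests:
        for criteria in manifest.criteria_sets:
            if all(system.get(key) == want for key, want in criteria.items()):
                return manifest.ai_manifest, manifest.sc_manifest
    return None


manifests = [
    CriteriaManifest([{"arch": "sparc"}], "ai_sparc.xml", "sc_sparc.xml"),
    CriteriaManifest([{"arch": "i86pc"}, {"arch": "i86xpv"}],
                     "ai_x86.xml", "sc_x86.xml"),
]
print(select_manifests(manifests, {"arch": "i86pc"}))
```

Both pointers come back from a single lookup, which is the "complete
picture in one place" property described above.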
>>
>>     - Splitting the AI manifest from the enhanced SMF profile
>>       demarcates the system configuration items (which will be
>>       used by other utilities) from those which are specific to AI.
>>
>> - What if a service doesn't start out as an SMF service, but later
>>   becomes one?
>>
>>     That service will have to start out in the AI manifest and
>>     then migrate later to an enhanced SMF profile.  However, it
>>     is a requirement on the SMF team to provide the install team
>>     with all the services needed for an install, by the time of
>>     the next release, so users won't ever see the migration.
> Are you talking about a parameter that belongs to a service which has 
> not yet implemented setting/updating of that parameter? If the service 
> is not implemented, SMF can't do anything.
Correct.  That's why such parameters would start out in the AI 
manifest.  When  SMF supports them, they would be moved to the SMF 
enhanced profile.
>
>>
>> - Why have a separate Criteria Manifest?
>>
>>     - A separate Criteria Manifest can point to the AI manifest
>>       and SC manifest as peers.  The latter two manifests together
>>       define a system.
>>
>>     - It does not make sense to combine the criteria with either
>>       of the two other manifests since the criteria map a system
>>       to a full definition of that system.
>>
>>     - An alternative would be to map different criteria to each
>>       of the AI manifest and SC manifest, but that would be
>>       difficult for users to keep track of.
>>
>> - Why have two types of schemas: DTD and RelaxNG?
>>
>>     - DTDs are the way enhanced SMF profiles will be specified.
>>       We're stuck with them for enhanced SMF profiles.
> If install tools do not manage system configuration manifest, then you 
> don't need a DTD schema.
Yes, that's true, and it is our intent that the user doesn't ever have 
to even look at the SC manifest directly;  they would use some 
configuration tool outside of the installer's realm.
>>
>>     - It is intended that a GUI tool (not part of the install
>>       project) will be used to generate the enhanced SMF profile.
>>       The user won't need to even look at the enhanced SMF profile
>>       as long as it is separated out from the other
>>       parts the user will directly manipulate.
>>
>>     - The rest of the input (that which the user will look at)
>>       can be done in a form which is much more readable,
>>       checkable and easily understood than DTDs.
>>
>>     - The files the user will be looking at will be done as
>>       RelaxNG as RelaxNG schemas have many benefits over DTDs:
>>
>>         - They include type and pattern checking of elements
>>
>>         - They are easier to read and understand.
>>           (http://www.mulberrytech.com/papers/whichschema)
>>
>> - What about other schemas?  Why RelaxNG?
>>
>>     W3C was the next most popular contender, but RelaxNG was
>>     developed to overcome some of the shortcomings of W3C.  For
>>     example, W3C schemas are inconsistent in their treatment of
>>     their various parts and are more limited in their constraints
>>     between attributes.  RelaxNG is consistent in the treatment of
>>     its parts, and so is more easily understood.  (See
>>     http://www.webreference.com/xml/column59/ and
>>     http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html for
>>     examples.)
>>
>> (XXX proposed solution hasn't yet been discussed in a meeting.)
>> I suggest changing the name of the criteria manifest to the Map
>> Manifest as it maps systems to AI manifest and SC manifest based on
>> the criteria specified.
>>
>> - Why change the name?
>>     Criteria Manifest is not descriptive enough of what it is for.
>>     Can come up with a better name.
>>
>>
>> 3. Compatibility / Version checking
>> -----------------------------------
>>
>> How AI manifests will be version checked:
>>
>> Schemas will be assigned version numbers when written.  They will be
>> revved in ascending order.  RelaxNG schemas will use an annotation to
>> record the version.
> How will you decide when to increment the version of the schema? Does 
> any change to the schema increment the version?
Please see the latest meeting minutes and revised strawman-2, out 
shortly.  The short answer is that every element and attribute in the 
schema will be assigned a version, and the manifest version number will 
have to match the number of the highest versioned element or attribute 
in the schema.
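
A minimal sketch of that rule, assuming the per-item versions are
available as a simple table.  The element and attribute names and the
numbers below are made up for illustration; the real ones will come from
the revised strawman.

```python
# Hypothetical table: the version assigned to each schema element or
# attribute.  Keys and values are invented for this sketch.
SCHEMA_ITEM_VERSIONS = {
    "install": 1,
    "install/@name": 1,
    "disk": 2,
    "disk/@label": 3,
}


def schema_version(item_versions):
    """The schema's version is that of its highest-versioned item."""
    return max(item_versions.values())


def manifest_version_ok(manifest_version, item_versions):
    """True when the manifest version matches the highest item version."""
    return manifest_version == schema_version(item_versions)


print(schema_version(SCHEMA_ITEM_VERSIONS))
print(manifest_version_ok(2, SCHEMA_ITEM_VERSIONS))
```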
>>
>> Manifests will have their version in a field (likely an attribute
>> at the top of the tree.)
>>
>> The installadm utility will assign to manifests the version number
>> of the schema or DTD they validate against.
>>
>> - Why have version recorded by installadm?
>>
>>     This gives the version number meaning.  The version
>>     number says that the manifest validates against the schema
>>     for that installer version.
> I think the schema used to validate the manifest comes from the AI 
> image and not the installer machine.
Correct. The schema goes with the image.  The manifest is given a 
version number based on the schema which the installer validates it against.
>>
>> - How will the schema version be read?
>>
>>     Either via a parser (preferred) or a simple grep through the
>>     file.
>>
>> The client will check the manifest version number against the schema
>> and will print a warning if there is a mismatch.  The warning will
>> print the versions of all files, and will say that the installation
>> will continue but may not work.
> Client schema is coming from the image. There is a validation done on 
> the server when the manifest is added (except default manifest). There 
> is another validation by the client when it gets the manifest from the 
> server. If the validation is successful on the server, and it is 
> successful on the client, under what conditions would a version 
> mismatch cause problems with the installation?
One case would be a semantic change in a field, how it is interpreted.  
There would be no syntax change so validation would succeed, but the 
meaning would change and so could give different behaviour during 
installation.  The new proposed solution addresses this;  see the 
revisions to the strawman.
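
For reference, the warn-only check from the quoted strawman text might
look roughly like this.  It is a sketch under assumptions: the version is
taken to live in a "version" attribute on the root element (the strawman
only says "likely an attribute at the top of the tree"), the root element
name is invented, and the stdlib ElementTree parser stands in for lxml to
keep the example self-contained.

```python
import xml.etree.ElementTree as ET


def check_manifest_version(manifest_xml, schema_version):
    """Return a warning string on version mismatch, or None if they agree.

    Assumes (hypothetically) a "version" attribute on the root element.
    A mismatch is reported but never treated as fatal.
    """
    root = ET.fromstring(manifest_xml)
    manifest_version = root.get("version")
    if manifest_version != schema_version:
        return ("warning: manifest version %s does not match schema "
                "version %s; installation will continue but may not work"
                % (manifest_version, schema_version))
    return None


print(check_manifest_version('<ai_manifest version="1"/>', "2"))
```

Note that a check like this cannot catch the semantic-change case by
itself; it only tells the user why later errors may have occurred.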
>>
>>
>> - Why only a warning?
>>
>>     We want the manifest to validate if possible.  The validator
>>     will catch errors if there are errors, and the version warning
>>     will provide the user a reason why those errors occurred.
>>
>> - Caveat:
>>     Need to change the field name if the semantic meaning of that
>>     field changes, but the way it appears in the manifest appears
>>     the same.  Else there could be unexpected results because the
>>     user expects the old behavior when the behavior has changed.
>>
>> (XXX no proposed solution, not discussed yet)
>> Having a version implanted by installadm could lead to copies of the
>> same manifest with multiple version numbers.  Do we want a second
>> version number based on the manifest format itself, to check for the
>> sameness of manifests?
>>
>> (XXX no proposed solution, not discussed yet)
>> Is there a need / how to check DTD versions?
>>
>> (XXX no proposed solution, not discussed yet)
>> Other things to consider for version compatibility:
>>
>> - when adding items which don't conflict with old items, make added
>>   items optional. do a minor rev.
>>
>> - Only when we make an incompatible change would we completely change
>>   the schema and make a major rev.
>>
>> - Can ship multiple versions of the schema for backward compatibility.
>>
>>   - This will buy the closest thing to "it just works" compatibility
>>     though.
>>
>>   - Will need to hardwire specific functionality for different
>>     versions in the AI code proper though;  may be a maintainability
>>     nightmare.
>>
>>
>> 4. Semantic Validation
>> ----------------------
>>
>> What is being called "semantic validation" is really just validation
>> which is more thorough than the syntactic validation offered by
>> schemas.  Semantic validation is a subset of this;  anything can be
>> checked, not just semantics.  The intent is to use it for checking
>> values against other values, or against system context / environment.
>>
>> - Thorough semantic validation will be done on the client side when
>>   the installation starts.
>>
>> - A subset of this validation will be done on the server side, limited
>>   only by the fact that the server cannot provide the same environment
>>   as provided by the client.  (For example, the server can validate
>>   the format of a disk name, but the client can validate that a disk
>>   of that name exists on the system.)
>>
>> - The validation methods called will be the same on both the server
>>   and client, but an environment variable will select whether those
>>   methods will do server or client mode checking.
> You mean to say that they use the same interface. Environment variable 
> looks too specific.
OK.
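
One way the shared interface could look, with the mode passed explicitly
rather than read from an environment variable.  This is only a sketch:
the function name, the mode strings, the disk-name pattern and the
known_disks argument are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for cNdN / cNtNdN style disk names.
DISK_NAME_RE = re.compile(r"^c\d+(t\d+)?d\d+$")


def validate_disk(name, mode, known_disks=()):
    """Return a list of error strings; an empty list means the name passed.

    mode is "server" or "client".  Both modes check the name format.
    Only client mode also checks that the disk is present, since only
    the client has the real system context.  known_disks stands in for
    whatever the client would use to enumerate its disks.
    """
    if mode not in ("server", "client"):
        raise ValueError("unknown mode: %r" % (mode,))
    errors = []
    if not DISK_NAME_RE.match(name):
        errors.append("malformed disk name: %s" % name)
    elif mode == "client" and name not in known_disks:
        errors.append("no such disk on this system: %s" % name)
    return errors


print(validate_disk("c0t1d0", "server"))
print(validate_disk("c0t1d0", "client", known_disks={"c0t0d0"}))
```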
>>
>>
>> 5. Manifest with validation holes
>> ---------------------------------
>>
>> (XXX proposed solution hasn't yet been discussed in a meeting.)
>> I suggest making better use of the schema and providing a clearer
>> syntax by replacing the format of a criterion and putting the "min",
>> "max" or "value" values explicitly as in the following:
>>
>>         <criteria_set>
>>                 <IPv4 min="1.2.3.4" max="5.6.7.8"/>
>>                 <MAC value="2.3.4.5"/>
>>                 <MEM min="1Gb"/>    <!-- unbounded max -->
>>         </criteria_set>
>>
>> - Why change from current implementation?
>>
>>     - More compact syntax.
>>     - More robust checking vs schema.
>>     - No need for "unbounded";  just leave off min or max
>>       attribute.
> This is comparing with the current implementation. Can you elaborate 
> what are the options for criteria schema/manifest (MIN/MAX/one 
> value/multiple values) and what is meant by each of those options? 
> Whether the elements can be combined etc. This doesn't give me enough 
> information to understand what you are proposing.
Thanks for pointing this out.  I'll have to clean this up in the strawman.

Some criteria (IPv4, MAC, mem) can take a range of values (min, max) 
while others (arch) take a single value.  A criteria set is a grouping 
of criteria which, when all match a system, determines that Criteria 
Manifest will be used for that system.
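
To make the range vs. single-value distinction concrete, here is a small
sketch of how a criteria set in the proposed attribute syntax could be
matched.  It is not the strawman's design: the arch element, its value,
and all function names are assumptions, and the stdlib ElementTree parser
is used instead of lxml so the example is self-contained.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Well-formed variant of the proposed syntax; "arch" is a hypothetical
# single-value criterion added for illustration.
CRITERIA_SET = """
<criteria_set>
    <IPv4 min="1.2.3.4" max="5.6.7.8"/>
    <arch value="sparc"/>
</criteria_set>
"""


def _ipv4(text):
    """Convert a dotted quad to an int so ranges compare numerically."""
    return int(ipaddress.IPv4Address(text))


# Per-criterion converters; anything not listed is compared as a string.
CONVERTERS = {"IPv4": _ipv4}


def criterion_matches(elem, system_value):
    """Match an exact "value", or a min/max range open on either end."""
    convert = CONVERTERS.get(elem.tag, lambda v: v)
    actual = convert(system_value)
    if "value" in elem.attrib:
        return actual == convert(elem.get("value"))
    low, high = elem.get("min"), elem.get("max")
    if low is not None and actual < convert(low):
        return False
    if high is not None and actual > convert(high):
        return False
    return True


def criteria_set_matches(xml_text, system):
    """A set matches only when every criterion in it matches the system."""
    root = ET.fromstring(xml_text)
    return all(
        elem.tag in system and criterion_matches(elem, system[elem.tag])
        for elem in root
    )


print(criteria_set_matches(CRITERIA_SET, {"IPv4": "3.0.0.1", "arch": "sparc"}))
```

Leaving off min or max simply skips that bound, which is how the
"no need for unbounded" point falls out of the attribute form.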
>
> What is the value for the user here? Will it simplify user's point of 
> view?
Yes.  I addressed this above under "why change from current implementation".

    Thanks for your feedback, Sundar.

    Jack
>
> Thanks,
> Sundar
>>
>> - Other options?
>>     A criterion needs to contain a name and its values.  The above
>>     does this with great efficiency.
>>
>> (XXX proposed solution hasn't yet been discussed in a meeting.)
>> How to do validation of manifests with the use of derived profiles:
>> The implementation of derived profiles has not been decided yet.  If
>> they are implemented on the server side and there is something which
>> can be validated on the server, it will be.  Full validation of them
>> will be done on the client side, as full client context is provided
>> there.
