stevedlawrence commented on code in PR #205:
URL: https://github.com/apache/daffodil-site/pull/205#discussion_r2578501202
##########
site/dfdl-best-practices.md:
##########
@@ -0,0 +1,872 @@
+---
+layout: page
+title: Best Practices Guide for DFDL Schema Authors
+group: nav-right
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+## Table of Contents
+{:.no_toc}
+<!-- The {: .no_toc } excludes the above heading from the ToC -->
+
+1. yes, this is the standard Jekyll way to do a ToC (this line gets removed)
+{:toc}
+<!-- note the above line {:toc} cannot have whitespace at the start -->
+
+
+# Introduction
+
+This page is a collection of notes on a style of DFDL schema that has some real benefits.
+This style:
+- Minimizes XML and XSD namespace complexity
+- Provides composition properties so DFDL schemas can be reused as libraries in larger schemas
+- Ensures compatibility of the schema with more restrictive infosets than XML.
+ For example: JSON, Apache NiFi Records, or Apache Spark Structs.
+- Ensures portability of the DFDL schema for validating XML infosets
+ using multiple different _XML Schema Validation libraries_ such as
+ [Xerces C]({{ site.data.links.reference.xercesc}}) and
+ [libxml2]({{ site.data.links.reference.libxml2}}).
+
+The [DFDL Training page lists several example schemas](/dfdl-training#exampleSchemas) that
+follow this style guide fully and make good starting points.
+
+These notes represent best practices learned _the hard way_ from many debugging
+exercises and from creating a wide variety of DFDL schemas, from small teaching examples to large
+production schemas for major data formats with more than 100K lines of DFDL.
+
+For those familiar with XML Schema (XSD) design patterns, our schema style is a variation of
+what is called the
+[_Venetian Blind_ pattern]({{ site.data.links.reference.venetianBlind}}),
+which one might call _Strict Venetian-Blind Type Library_.
+
+- "Strict" because we strongly minimize the use of global elements, namespaces,
+ and some other XSD constructs that are highly specialized to XML as the data representation.
+- "Type-Library" because we structure DFDL schemas so that there is always
+ the option for a user to use the schema as a library within a larger encompassing
+ DFDL schema by referencing a complex type definition provided by the library schema.
+
+Below are the details.
+
+# Avoid Element Namespaces
+
+Much of the complexity of XML and XML Schema comes from their namespace features.
+This can be avoided entirely by following simple conventions.
+
+Since many data representations (such as JSON, Apache NiFi Records) have no notion of
+namespaces, following this guidance keeps DFDL schemas compatible with those representations.
+
+The conventions are:
+- DFDL Schemas should use `elementFormDefault="unqualified"` (which is the default for XML Schemas).
+- Daffodil tunable
+ [`unqualifiedPathStepPolicy`](/tunables/#unqualifiedpathsteppolicy)
+ should be defined to be `noNamespace` (which is its default value).
+- DFDL schemas should not use element references.
+- Most DFDL Schema files should contain only definitions of types, groups, DFDL formats, and DFDL
+ variables.
+ - These schema files should share a single target namespace
+ with a [well-chosen unique URI](#namespace-uri-conventions).
+- A DFDL Schema should define global elements only for root elements.
+ - These should be in a single separate file with _no target namespace_.
+ - These should be _one liner_ declarations which just reference types imported from the other
+ schema files.
+ - Most DFDL schemas will need only 1 or 2 such global elements.
+
+The real content of the schema should always be in a named complex type definition.
+This gives the schema user the choice of what they want to call their elements,
+and enables use of the schema as a child element within a
+larger structure.
+
+Defining only global types and groups -- leaving the global elements only for testing or the
+end-user of the schema -- provides greater flexibility.
+All schemas are available to use as libraries.
+Hence, the standard start of a DFDL schema is going to be:
+
+```xsd mySchemaType.dfdl.xsd
Review Comment:
This code block does not render correctly, I *think* because of the
`mySchemaType.dfdl.xsd` after the language name on the fence line. Is that
supported by Jekyll markdown?
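
One possible fix, offered only as a sketch: keep a single language name on the
fence line and move the file name into an XML comment inside the block. This
assumes the extra word on the fence line is what breaks rendering, which has
not been checked against the Markdown processor the site uses. The `xml`
language tag, the comment placement, and the `payload` element are
illustrative assumptions, and the DFDL format annotations are left out.

```xml
<!-- File: mySchemaType.dfdl.xsd (file name moved here from the fence line) -->
<schema
  targetNamespace="urn:example.com:schema:dfdl:mySchema:ms"
  xmlns:ms="urn:example.com:schema:dfdl:mySchema:ms"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.w3.org/2001/XMLSchema">

  <!-- The named complex type that carries the real schema content -->
  <complexType name="mySchemaType">
    <sequence>
      <!-- "payload" is a hypothetical placeholder element -->
      <element name="payload" type="xs:string"/>
    </sequence>
  </complexType>

</schema>
```

In this sketch the `ms` prefix is bound to the same URI as `targetNamespace`,
so a reference such as `ms:mySchemaType` from another file would resolve to
the type defined here.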
##########
site/dfdl-best-practices.md:
##########
@@ -0,0 +1,872 @@
+Hence, the standard start of a DFDL schema is going to be:
+
+```xsd mySchemaType.dfdl.xsd
+<schema
+ targetNamespace="urn:example.com:schema:dfdl:mySchema:ms"
+ xmlns:ms="urn:example.com:schema:dfdl:mySchema:ms"
+ xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+ xmlns:xs="http://www.w3.org/2001/XMLSchema"
+ xmlns="http://www.w3.org/2001/XMLSchema"
+ ... >
+
+ ... import/include statements and top level format annotations...
+
+ <complexType name="mySchemaType">
+ ... the real schema content is all here or reachable from here. ...
+ </complexType>
+
+ ... other types and groups ...
+
+</schema>
+```
+
+Included files, and imported files that are part of the same DFDL schema project, should have
+no global elements at all.
+
+The only global element(s) defined should be _one liners_ defined in a single _root_
+schema file like this:
+
+```xsd
+<schema
+ xmlns:ms="urn:example.com:schema:dfdl:mySchema:ms"
+ ... >
+ <!-- Root elements only - no target namespace -->
+
+ <import namespace="urn:example.com:schema:dfdl:mySchema:ms"
+ schemaLocation=".../mySchemaType.dfdl.xsd"/>
+
+ ... a top level dfdl:format declaration ...
+
+ <!--
+ The root element - a type-reference only to an individual item
+ of the data format
+ -->
+
+ <element name="myRoot" type="ms:mySchemaType"/>
+
+ <!--
+ If needed (for testing) optional second root element for files containing
+ repetitions of the mySchemaType data format. Also a type reference only.
+ -->
+ <element name="myRootFile" type="ms:mySchemaFileType"/>
+
+</schema>
+```
+
+Rationale:
+
+- This makes schemas more flexible for reuse because it imposes no element names
+ on the schema user; they can always choose their own.
+- It keeps the schema JSON compatible.
+- When the only global elements are defined in a no-namespace schema, XML instance documents:
+ - never use prefixes on element names
+ - (almost) never have namespace prefix definitions in them
+
+The only namespace prefix definitions one _may_ still require in XML instance documents are
+exactly these:
+- `xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"` - used for `xsi:nil` and `xsi:type`
+ attributes
+- `xmlns:xs="http://www.w3.org/2001/XMLSchema"` - used for values of `xsi:type` attributes
+
+For example:
+```xml
+ <data xsi:nil="true" />
+ <start xsi:type="xs:dateTime" >1961-02-01T06:02:03Z</start>
+```
+The `xsi:nil` attribute is only needed if a DFDL schema uses nillable elements.
+The `xsi:type` attribute is only used during test/debug activities to
+enable type-sensitive equality comparison[^ztime].
+These exceptions never create the need for an _element_ to have a namespace prefix.
+
+[^ztime]: For example the `xs:dateTime` value `1961-02-01T01:02:03-05:00` (US EST) is equivalent to `1961-02-01T06:02:03Z`, which is UTC.
+
+> **Security Note:** Avoiding element prefixes and namespace prefix definitions is also
+> considered a cyber-security improvement for XML since they can be used for covert
+> channels without making the document invalid.
+> A primary use case for DFDL and Daffodil is in _data cybersecurity_ where this principle
+> is important!
+
+Summary:
+- A DFDL schema should consist almost entirely of type and group definitions.
+ - The type and group definitions should have a target namespace.
+ - These files should contain no global elements at all.
+- Schema files that define global elements should have one or at most two global element
+ declarations in them, and those should be the only definitions in that file.
+- These global elements should have _no target namespace_.
+- If the DFDL schema is a component library, then the global elements exist for testing only
+ and are ignored entirely when the schema is reused as part of a larger schema.
+
+## Namespace URI Conventions
+
+There are good conventions to use when
+- choosing a namespace URI for a DFDL schema, and
+- choosing namespace prefixes
+
+Suppose you work for example.com, and you have XML Schemas, DFDL Schemas, and JSON schemas.
+
+Suppose further that you have a DFDL schema for a format named "ebx data", and that
+there are various versions of this format.
+
+The following is a useful namespace URI and prefix definition for this format:
+```xsd
+xmlns:ebx="urn:example.com:schema:dfdl:ebxData:ebx"
+```
+This has these benefits:
+- The URI is a
+ [URN (Uniform Resource Name)]({{site.data.links.reference.urn}}),
+ which means it is only an identifier, not a location to retrieve anything from.
+- It is unique to your company/organization
+- It identifies it as a DFDL schema namespace
+- It contains the format name
+- It ends with the suggested prefix to be used for this namespace
+
+Note also that there is no version information at the end of this URI.
+This turns out to be a best practice.
+
+Consider anyone who sees this namespace URI alone, as in an import statement like this:
+
+```xsd
Review Comment:
I noticed this code block does not have syntax highlighting. I wonder if
Jekyll does not support `xsd`, and those should all be replaced with `xml` instead?
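
As a sketch of what that would look like, here is the import statement from
this part of the page under an `xml` fence. This assumes the site's
highlighter recognizes `xml` but not `xsd` as a language name, which has not
been confirmed here. The enclosing `xs:schema` element is added only so the
fragment is well-formed on its own.

```xml
<!-- Same import as in the guide; only the fence language changes to xml -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ebx="urn:example.com:schema:dfdl:ebxData:ebx">

  <xs:import namespace="urn:example.com:schema:dfdl:ebxData:ebx"
             schemaLocation="/com/example/schema/dfdl/ebxData.dfdl.xsd"/>

</xs:schema>
```

If the highlighting then works, the remaining `xsd` fences could be changed
the same way.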
##########
site/dfdl-best-practices.md:
##########
@@ -0,0 +1,872 @@
+Consider anyone who sees this namespace URI alone, as in an import statement like this:
+
+```xsd
+<xs:import namespace="urn:example.com:schema:dfdl:ebxData:ebx"
+ schemaLocation="/com/example/schema/dfdl/ebxData.dfdl.xsd"/>
+```
+
+From this one automatically knows the prefix to use by convention, because it is the
+last part of the namespace URI.
+
+These conventions for the `schemaLocation` are also useful as they provide something like
+the Java package system to avoid name collisions.
+
+## Versioning - In the Infoset/Data, Not the Namespace URI {#noVersionsInNamespaceURIs}
+
+It's become clear in XML Schemas (not just DFDL) that having version-specific namespace URIs
+causes difficulty.
+
+One issue is that the path expressions that navigate such elements become version specific even if
+the elements they are ultimately accessing are common to multiple versions. Such paths are
+monomorphic to specific versions. It is much nicer if path expressions are as polymorphic
+across versions as possible.
+
+Hence, define an element in your schema to hold the version information.
+Don't append a version number to a namespace URI.
+
+(Since JSON has no namespaces, you can't use namespaces to carry version information if you want
+to use JSON.
+Hence carrying version information in an element makes your schema more JSON
+compatible.)
+
+# Express DFDL Properties on the Simple Types, not the Elements
+
+Data formats usually are repetitive.
+The same format properties are often needed repeatedly for many different elements in the overall
+format.
+
+This is best captured by defining named types and groups.
+Redundancy is then avoided by sharing use of types for every element having that same format.
+
+One then avoids repetitive DFDL properties by placing the properties on the simple type
+definitions rather than on the elements having that type.
+
+It would be nice to say this applies for both simple and complex types, but alas the same exact
+style is not usable on complex type definitions, which do not carry DFDL properties in
+DFDL version 1.0.
+To avoid redundant properties on complex types it is suggested that named format definitions
+are created and used on each complex type variation.
+This is not quite as clean, but minimizes redundancy within what is allowed.
+
+Note that the DFDL Workgroup is considering adding the ability to [put DFDL properties on complex
+types]({{site.data.links.dfdlSpec.issue71}}) in a future version of the DFDL standard.
+
+# Avoid Child Elements with the Same Name
+
+XML Schema has a data model with some flexibility needed only for markup languages.
+
+DFDL uses XML Schema to describe structured data, where this flexibility is not needed.
+
+DFDL omits many XML Schema constructs, but DFDL version 1.0 still allows some things that are
+best avoided to ensure the ability to interoperate with other data models.
+
+One such feature is the ability in XML Schema to have multiple child elements with the same name.
+So long as it is unambiguous what element declaration is intended, XML Schema allows things like:
+```xsd
+...
+<element name="foo" ..../>
+<element name="bar" ..../>
+<element name="foo" ..../>
+```
+This is allowed because the element `bar` separates the two different declarations of
+the `foo` element;
+hence, when parsing XML, the first `foo` declaration is used until a `bar` element is
+encountered, and after that the second `foo` declaration is used.
+
+That's all interesting and useful for markup languages, but no other structured data system allows this.
+Hence, while DFDL v1.0 allows this, it is best avoided to enable DFDL schemas to be interfaced to
+data systems having more typical data models.
+
+You can see why XML Schema allows this if you think about markup as in HTML.
+XML is for markup languages and XSD is for describing them.
+In a markup language you are often going to need lots of the same tag to appear within text
+repeatedly, separated by other tags at that same level of nesting.
+The fact that the instance data is XML means the tag-names make it easy to tease apart the document.
+
+DFDL is for describing data that has no tags or specific syntax that the schema language
+can depend upon.
+So it provides only a subset of XSD features, and best practice is to avoid things that aren't
+typical in structured data systems.
+
+JSON also has no notion of child elements with the same name, so avoiding this enables a
+DFDL schema to be JSON compatible.
+
+# Avoid Anonymous Choices
+XML Schema allows a choice to be anonymous within the data model of an element. For example:
+```xsd
+<element name="myElement">
+ <complexType>
+ <sequence>
+ ... various elements ...
+ <choice>
+ ... choice branches ...
+ </choice>
+ ... various more elements
+ </sequence>
+ </complexType>
+</element>
+```
+The choice above appears in the middle of a sequence group, with elements and/or other groups
+before and after it.
+Note that there is no element name associated with the choice.
+Rather in XML data, the choice branches would at some level within them contain elements
+and these would appear as direct children of the `myElement` parent element.
+
+Many other data modeling languages do not have this capability.
+They require choices to be named.
+
+Hence, this sort of anonymous choice is to be avoided.
+
+There are two ways to avoid trouble here.
+
+Choice groups should always be the model-groups of named elements.
+Choice branches within the choice are all just _scalar_ elements (meaning non-dimensioned:
Review Comment:
Thoughts on making these lines a bulleted or numbered list to make it
clearer that these are the two ways? It wasn't immediately clear as I was
reading this. Then either add the examples in each bullet or include them
after the list.
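
For the first of the two ways, an example attached to its bullet might look
roughly like the sketch below. It is a guess at the intent of the sentence
about choice groups being the model groups of named elements, not the
author's own example. All element names (`before`, `myChoice`, `a`, `b`,
`after`) are made up, and DFDL annotations are omitted.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <element name="myElement">
    <complexType>
      <sequence>
        <element name="before" type="xs:string"/>
        <!-- The choice is the model group of a named element,
             so it is no longer anonymous inside the sequence -->
        <element name="myChoice">
          <complexType>
            <choice>
              <element name="a" type="xs:int"/>
              <element name="b" type="xs:string"/>
            </choice>
          </complexType>
        </element>
        <element name="after" type="xs:string"/>
      </sequence>
    </complexType>
  </element>

</schema>
```

With this shape, the selected branch appears as a child of `myChoice` rather
than as a direct child of `myElement`.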