This is an automated email from the ASF dual-hosted git repository. mbeckerle pushed a commit to branch daf-2998-site in repository https://gitbox.apache.org/repos/asf/daffodil-site.git
commit 31fe32adac23cf43a967c5e1108a3a7c09e87bc7
Author: Michael Beckerle <[email protected]>
AuthorDate: Tue Dec 16 13:59:10 2025 -0500

    Add more training and best-practice materials

    Add best practice note about enums.
    Enum symbols should not contain whitespace.
    Best practice materials are slide decks mostly.
    Add complex type around otherwise anon choice in examples on best practice page.
    Update (partly) Standard profile design note
    Overlaps with best practices doc quite a bit.
    Not all of this is fixed, but it's only a design note.

    DAFFODIL-2998
---
 site/best-practices/P-Avoid-Check-Constraints.pdf | Bin 0 -> 187529 bytes
 site/best-practices/P-Avoid-Check-Constraints.pptx | Bin 0 -> 261131 bytes
 .../P-DFDL-BLOBs-v-HexBinary-array.pdf | Bin 0 -> 138852 bytes
 .../P-DFDL-BLOBs-v-HexBinary-array.pptx | Bin 0 -> 255363 bytes
 site/best-practices/P-DFDL-Reject-Elements.pdf | Bin 0 -> 96789 bytes
 site/best-practices/P-DFDL-Reject-Elements.pptx | Bin 0 -> 253205 bytes
 site/best-practices/P-DFDL-Round-Trip-Testing.pdf | Bin 0 -> 141572 bytes
 site/best-practices/P-DFDL-Round-Trip-Testing.pptx | Bin 0 -> 323912 bytes
 site/best-practices/P-DFDL-Structured-Text.pdf | Bin 0 -> 146772 bytes
 site/best-practices/P-DFDL-Structured-Text.pptx | Bin 0 -> 258146 bytes
 .../design-notes/Proposed-DFDL-Standard-Profile.md | 43 ++++++----
 site/dfdl-best-practices.md | 91 ++++++++++++---------
 site/dfdl-extensions.md | 9 ++
 site/dfdl-training.md | 8 ++
 .../P-DFDL-Properties-lengthKind-bitOrder.pdf | Bin 0 -> 133664 bytes
 .../P-DFDL-Properties-lengthKind-bitOrder.pptx | Bin 0 -> 259843 bytes
 site/tutorials/P-Filling-vs-Padding-Trimming.pdf | Bin 0 -> 143349 bytes
 site/tutorials/P-Filling-vs-Padding-Trimming.pptx | Bin 0 -> 255931 bytes
 18 files changed, 98 insertions(+), 53 deletions(-)

diff --git a/site/best-practices/P-Avoid-Check-Constraints.pdf b/site/best-practices/P-Avoid-Check-Constraints.pdf
new file mode 100755
index 0000000..c30522b
Binary files /dev/null and
b/site/best-practices/P-Avoid-Check-Constraints.pdf differ
diff --git a/site/best-practices/P-Avoid-Check-Constraints.pptx b/site/best-practices/P-Avoid-Check-Constraints.pptx
new file mode 100755
index 0000000..887b90f
Binary files /dev/null and b/site/best-practices/P-Avoid-Check-Constraints.pptx differ
diff --git a/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf
new file mode 100755
index 0000000..d962f89
Binary files /dev/null and b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf differ
diff --git a/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pptx b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pptx
new file mode 100755
index 0000000..f47ae01
Binary files /dev/null and b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pptx differ
diff --git a/site/best-practices/P-DFDL-Reject-Elements.pdf b/site/best-practices/P-DFDL-Reject-Elements.pdf
new file mode 100755
index 0000000..a5e0bed
Binary files /dev/null and b/site/best-practices/P-DFDL-Reject-Elements.pdf differ
diff --git a/site/best-practices/P-DFDL-Reject-Elements.pptx b/site/best-practices/P-DFDL-Reject-Elements.pptx
new file mode 100755
index 0000000..d91cc27
Binary files /dev/null and b/site/best-practices/P-DFDL-Reject-Elements.pptx differ
diff --git a/site/best-practices/P-DFDL-Round-Trip-Testing.pdf b/site/best-practices/P-DFDL-Round-Trip-Testing.pdf
new file mode 100755
index 0000000..6521f9e
Binary files /dev/null and b/site/best-practices/P-DFDL-Round-Trip-Testing.pdf differ
diff --git a/site/best-practices/P-DFDL-Round-Trip-Testing.pptx b/site/best-practices/P-DFDL-Round-Trip-Testing.pptx
new file mode 100755
index 0000000..af54f24
Binary files /dev/null and b/site/best-practices/P-DFDL-Round-Trip-Testing.pptx differ
diff --git a/site/best-practices/P-DFDL-Structured-Text.pdf b/site/best-practices/P-DFDL-Structured-Text.pdf
new file mode 100755
index 0000000..b70ccc3
Binary files /dev/null and
b/site/best-practices/P-DFDL-Structured-Text.pdf differ diff --git a/site/best-practices/P-DFDL-Structured-Text.pptx b/site/best-practices/P-DFDL-Structured-Text.pptx new file mode 100755 index 0000000..a8dfea4 Binary files /dev/null and b/site/best-practices/P-DFDL-Structured-Text.pptx differ diff --git a/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md b/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md index 6743b35..f4c6cb1 100644 --- a/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md +++ b/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md @@ -22,10 +22,26 @@ limitations under the License. {% endcomment %} --> -*Version 0.3 2023-12-08* +*Version 0.4 2025-12-22* + + +## Table of Contents +{:.no_toc} +<!-- The {: .no_toc } excludes the above heading from the ToC --> + +1. yes, this is the standard Jekyll way to do a ToC (this line gets removed) +{:toc} +<!-- note the above line {:toc} cannot have whitespace at the start --> + # Introduction +> **Note:** This proposed standard profile overlaps a great deal with the +> [DFDL Schema Best Practices](/dfdl-best-practices) and can be viewed as a +> mechanism to enforce many of those practices. +> +> This page needs to be revised in light of the best practices page. + In attempting to integrate Apache Daffodil with other data processing software, the need to make DFDL schemas interoperate properly in conjunction with other data models has arisen. @@ -40,10 +56,12 @@ structured data. 
The following things are allowed in DFDL v1.0, but are difficult to map into most data models: -- anonymous choices -- duplicate element child names +- [anonymous choices](/dfdl-best-practices#avoidAnonymousChoices) +- [duplicate element child names](/dfdl-best-practices#AvoidChildElementsWithSameName) - namespaces that are different, but where the prefixes are not unique -- global names for element children + - There are numerous guidelines about namespaces and avoiding prefixes in the + [DFDL Schema Best Practices](/dfdl-best-practices) +- [global names for element children](/dfdl-best-practices#avoidElementNamespaces) A more restrictive subset of DFDL, a _standard profile_, is needed which can be enforced (on request) to ensure that DFDL schemas will be usable with a variety of data processing systems. @@ -55,15 +73,9 @@ standard profile (which is a subset of DFDL). # Standard Profile Restrictions -## No Anonymous Choices - -Choices must be the model groups of complex type definitions and are not allowed in any other -context. +## Group References Cannot Carry DFDL Properties {#groupReferencesCannotCarryDFDLProperties} -Each choice branch must begin with a different element. (This is already a XML Schema requirement - -Unique Particle Attribution.) - -## Group References Cannot Carry DFDL Properties +> **Note:** This is not mentioned in the best practices, but should be. Group references are allowed, but DFDL format properties cannot be expressed on group references; hence, combining those properties with those of the group definition is not required. @@ -82,7 +94,7 @@ Allowing groups and group references reduces the difficulty of converting many l schemas to conform to the standard profile, and makes this possible without introducing many otherwise unneeded element and type definitions. -## No Element References +## No Element References {#noElementReferences} There is no corresponding form of sharing in most data structure systems. 
@@ -97,7 +109,7 @@ All namespace prefixes must be unique in the entire schema. This enables one to create unique identifiers by concatenating prefix_local to create global names. -## All Element Children Have Unique Names +## All Element Children Have Unique Names {#allElementChildrenHaveUniqueNames} All children element declarations must have unique names within their enclosing parent element. @@ -228,7 +240,7 @@ it, requiring instead that an inner sequence carrying the assertion or discriminator with NO child content, be inserted in the sequence at the point where the evaluation is required to occur. - Requesting/Enabling the Standard Profile +# Requesting/Enabling the Standard Profile If the standard profile is requested, then use of constructs outside of the standard profile is a Schema Definition Error. @@ -281,4 +293,3 @@ Including such an explicitly non-standard-profile schema into a schema that requ profile should cause a Schema Definition Error. The inverse however, is not true. A schema that explicitly obeys the standard profile can be included/imported into any schema. - diff --git a/site/dfdl-best-practices.md b/site/dfdl-best-practices.md index a322e7f..567de9b 100644 --- a/site/dfdl-best-practices.md +++ b/site/dfdl-best-practices.md @@ -42,9 +42,22 @@ This page is a collection of notes on how to create DFDL schemas to obtain some using multiple different _XML Schema Validation libraries_ such as [Xerces C]( {{ site.data.links.reference.xercesc}}) and [libxml2]({{ site.data.links.reference.libxml2}}). -The [DFDL Training page lists several example schemas](/dfdl-training#exampleSchemas) which follow +The [DFDL Training page lists several example schemas](/dfdl-training#exampleSchemas) which follow this style guide fully which you can use as good starting points. +There are also best-practice materials on: +- [Slides on Well-Formed vs. 
Valid (Avoiding `dfdl:checkConstraints(.)`)]( +/best-practices/P-Avoid-Check-Constraints.pdf) +- [Slides on Handling large opaque BLOBs of binary data]( +/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf) +- [Slides on Using _Reject Elements_ to capture bad data]( +/best-practices/P-DFDL-Reject-Elements.pdf) +- [Slides on Round-trip (parse + unparse) testing (with TDML)]( +/best-practices/P-DFDL-Round-Trip-Testing.pdf) +- [Slides on DFDL Schemas for ad-hoc structured text formats]( +/best-practices/P-DFDL-Structured-Text.pdf) + + This set of notes represents best practices after learning _the hard way_ from many debugging exercises and creating a wide variety of DFDL schemas from small teaching examples to large production schemas for major data formats with more than 100K lines of DFDL. @@ -62,7 +75,7 @@ that one might call _Strict Venetian-Blind Type Library_. Below are the details. -# Avoid Element Namespaces +# Avoid Element Namespaces {#avoidElementNamespaces} Much of the complexity of XML and XML Schema comes from their namespace features. This can be avoided entirely by following simple conventions. @@ -276,7 +289,7 @@ This is not quite as clean, but minimizes redundancy within what is allowed. Note that the DFDL Workgroup is considering adding the ability to [put DFDL properties on complex types]({{site.data.links.dfdlSpec.issue71}}) in a future version of the DFDL standard. -# Avoid Child Elements with the Same Name +# Avoid Child Elements with the Same Name {#AvoidChildElementsWithSameName} XML Schema has a data model with some flexibility needed only for markup languages. @@ -316,7 +329,7 @@ typical in structured data systems. JSON also has no notion of child elements with the same name, so avoiding this enables a DFDL schema to be JSON compatible. -# Avoid Anonymous Choices +# Avoid Anonymous Choices {#avoidAnonymousChoices} XML Schema allows a choice to be anonymous within the data model of an element. 
For example: ```xml <element name="myElement"> @@ -387,25 +400,27 @@ structures of the other data systems which do not allow anonymous choices. Given two different versions of a schema, consider: ```xml -<choice> - <element name="v1"> - <complexType> - <sequence> - <element name="a" .../> - <element name="c" type="xs:int" dfdl:length="7"/> - </sequence> - </complexType> - </element> - <element name="v2"> - <complexType> - <sequence> - <element name="b" .../> - <element name="c" type="xs:int" dfdl:length="6"/> - <element name="spare" type="xs:unsignedInt" dfdl:length="1"/> - </sequence> - </complexType> - </element> -</choice> +<complexType name="v1OrV2"> + <choice> + <element name="v1"> + <complexType> + <sequence> + <element name="a" .../> + <element name="c" type="xs:int" dfdl:length="7"/> + </sequence> + </complexType> + </element> + <element name="v2"> + <complexType> + <sequence> + <element name="b" .../> + <element name="c" type="xs:int" dfdl:length="6"/> + <element name="spare" type="xs:unsignedInt" dfdl:length="1"/> + </sequence> + </complexType> + </element> + </choice> +</complexType> ``` Note both versions 1 and 2 have a child named `c` which is an `xs:int`. @@ -415,19 +430,21 @@ The two differ only by a DFDL property (`dfdl:length`). 
Consider instead using this technique: ```xml -<choice> - <sequence> - <element name="v1" type="pre:empty"/> - <element name="a" .../> - <element name="c" type="xs:int" dfdl:length="7"/> - </sequence> - <sequence> - <element name="v2" type="pre:empty"/> - <element name="b" .../> - <element name="c" type="xs:int" dfdl:length="6"/> - <element name="spare" type="xs:unsignedInt" dfdl:length="1"/> - </sequence> -</choice> +<complexType name="v1OrV2"> + <choice> + <sequence> + <element name="v1" type="pre:empty"/> + <element name="a" .../> + <element name="c" type="xs:int" dfdl:length="7"/> + </sequence> + <sequence> + <element name="v2" type="pre:empty"/> + <element name="b" .../> + <element name="c" type="xs:int" dfdl:length="6"/> + <element name="spare" type="xs:unsignedInt" dfdl:length="1"/> + </sequence> + </choice> +</complexType> ``` This uses a marker element which will be `<v1/>` or `<v2/>` before the other elements. A path to the `c` element will not have a `v1` nor `v2` element parent. @@ -528,7 +545,7 @@ are small. > > ### About Spec Deltas > -> A deltas between two versions of a format specification document can be classified as one of +> A delta between two versions of a format specification document can be classified as one of > these kinds: > 1. Prose Correction: A clarification or correction to the text of the > document that improves it, > but does not represent any actual change to the data format. diff --git a/site/dfdl-extensions.md b/site/dfdl-extensions.md index c4d97fe..b152513 100644 --- a/site/dfdl-extensions.md +++ b/site/dfdl-extensions.md @@ -396,6 +396,15 @@ different reserved values since when unparsed, the constant string `Reserved` wi _canonicalized_ to integer 0. Putting data into canonical form when unparsing generally improves data security. +> **Best Practices Note:** Avoid whitespace of any kind in enumerated constant values. +> It is best to replace spaces by underscores ("_"). 
+> This avoids problems when the infoset, represented in XML, is pretty printed or otherwise +> formatted. +> Whitespace is generally fungible in XML, and a space could be turned into a line +> break by a variety of XML processing resulting in data that will +> not validate (as an XML document) nor unparse successfully. + + # Extended Behaviors for DFDL Types ## Type ``xs:hexBinary`` diff --git a/site/dfdl-training.md b/site/dfdl-training.md index 855d57d..d8c02df 100644 --- a/site/dfdl-training.md +++ b/site/dfdl-training.md @@ -202,6 +202,14 @@ showcasing: - Multi-version support - this schema handles both revisions C and D1 of the format simultaneously. +# Specific DFDL Properties +Short training slide decks or pages about specific properties. +- [DFDL `lengthKind`, `lengthUnits`, `bitOrder`, and `byteOrder` properties]( +/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf) +- [DFDL Pad and Fill (`dfdl:fillByte`)]( +/tutorials/P-Filling-vs-Padding-Trimming.pdf) + + # Other Learning Resources There are a variety of other materials on the Internet that provide some DFDL training: diff --git a/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf new file mode 100755 index 0000000..45ce110 Binary files /dev/null and b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf differ diff --git a/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pptx b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pptx new file mode 100755 index 0000000..716d536 Binary files /dev/null and b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pptx differ diff --git a/site/tutorials/P-Filling-vs-Padding-Trimming.pdf b/site/tutorials/P-Filling-vs-Padding-Trimming.pdf new file mode 100755 index 0000000..481d34e Binary files /dev/null and b/site/tutorials/P-Filling-vs-Padding-Trimming.pdf differ diff --git a/site/tutorials/P-Filling-vs-Padding-Trimming.pptx b/site/tutorials/P-Filling-vs-Padding-Trimming.pptx new file 
mode 100755 index 0000000..3402b2d Binary files /dev/null and b/site/tutorials/P-Filling-vs-Padding-Trimming.pptx differ
