tuxji commented on code in PR #121:
URL: https://github.com/apache/daffodil-site/pull/121#discussion_r1359951936
##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version 0.1 2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing software, the need to make DFDL schemas interoperate properly in conjunction with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data models which are powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even this causes problems.
+Most other data processing systems were not designed with markup languages in mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can be enforced (on request) to ensure that DFDL schemas will be usable with a variety of data processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal interoperability, including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on enforcement of this standard profile, aka subset of DFDL.

Review Comment:
I agree with the idea of modifying Daffodil to validate (only upon request using an optional config/switch/option) that a DFDL schema uses a more restrictive subset of DFDL. Further below, I see optional config files proposed, but I don't see a command line switch/option proposed as well. I'd want the CLI to have such an option too.
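To make the layered opt-in in this comment concrete, here is a minimal Python sketch of how a command line option could sit above the config files the proposal describes. The source names, their relative ordering, and the function `resolve_standard_profile` are assumptions invented for this illustration; the proposal only says that the home-directory default can be overridden by higher-priority mechanisms, and nothing here reflects an existing Daffodil option or API.

```python
from typing import Mapping, Optional

# Hypothetical source names, listed from lowest to highest priority.
# The ordering, and the CLI slot at the top, are assumptions for this sketch.
SOURCES = ("home_properties", "project_config", "schema_property", "cli_option")


def resolve_standard_profile(settings: Mapping[str, Optional[bool]]) -> bool:
    """Decide whether standard-profile enforcement is switched on.

    `settings` maps a source name to True/False, or to None (or omits the
    key) when that source says nothing. The highest-priority source that
    has a value wins. With no value anywhere, enforcement stays off,
    because the check is meant to run only on request.
    """
    for source in reversed(SOURCES):
        value = settings.get(source)
        if value is not None:
            return value
    return False
```

With this ordering, `resolve_standard_profile({"home_properties": True, "cli_option": False})` returns `False`: the explicit command line choice overrides the home-directory default.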
########## site/dev/design-notes/Proposed-DFDL-Standard-Profile.md: ########## @@ -0,0 +1,133 @@ +# Proposal: DFDL Standard Profile + +#### Version 0.1 2023-10-13 + +## Introduction + +In attempting to integrate Apache Daffodil with other data processing software, the need to make DFDL schemas interoperate properly in conjunction with other data models has arisen. + +Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data models which are powerful, but not as expressive as DFDL. + +DFDL's data model is a simplification of XML Schemas's PSVI; however, even this causes problems. +Most other data processing systems were not designed with markup languages in mind, but rather for structured data. + +The following things are allowed in DFDL v1.0, but are difficult to map into most data models: + +- anonymous choices +- duplicate element child names +- namespaces that are different, but where the prefixes are not unique +- global names for element children + +A more restrictive subset of DFDL, a _standard profile_, is needed which can be enforced (on request) to ensure that DFDL schemas will be usable with a variety of data processing systems. +Creating DFDL schemas that adhere to this standard profile ensures maximal interoperability, including the ability to convert into JSON without name/namespace collisions. + +This is a proposal for a switch/option to be added to Daffodil which turns on enforcement of this standard profile, aka subset of DFDL. + +## Standard Profile Restrictions + +### No Anonymous Choices + +Choices must be the model groups of complex type definitions and are not allowed in any other context. + +Each choice branch must begin with a different element. (This is already a XML Schema requirement - Unique Particle Attribution.) + +### No Reusable Groups +Group references and reusable group definitions are not allowed. + +### No Element References +There is no corresponding form of sharing in most data structure systems. 
+ +### No Namespace-Qualified Names +Only elementFormDefault 'unqualified' is allowed. +Note that this is the default for XML Schema and DFDL. + +### Unique Namespace Prefixes +All namespace prefixes must be unique in the entire schema. + +This enables one to create unique identifiers by concatenating prefix_local to create global names. + +### All Element Children Have Unique Names +All children element declarations must have unique names within their enclosing parent element + +### Nillable Simple Types Only + +Nillable is allowed only for simple type elements + +### Element Name/Identifier Restrictions +Element names must consist of all non-whitespace characters from the +Unicode basic multilingual plane (no surrogate pairs in element names). + +They may not contain any control characters (Uncode class Cc) may not contain various punctuation characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $). + +This is a lowest-common denominator of identifier rules intended to allow +DFDL schema identifiers to be mapped into ANY programming language or +structure declaration language, while at the same time allowing use of +Unicode characters. + +Element names may not begin with a digit. + +Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use of unicode in identifiers. + +Element names may not begin with any prefix defined as part of the schema +followed by an "_" as this could be ambiguous with names being made globally +unique by appending prefix, "_" and local name. + +### String Content Restrictions + +Schemas may only be written in UTF-8 encoding. + +The DFDL property dfdl:utf16Width must be 'fixed'. + +### Import `schemaLocation` + +Imported files - a single unique file must be used when importing a namespace. +A single schema may not contain different import statements for the same +namespace but which specify different files. + +This is a practical requirement in Apache Daffodil today, but should be made explicit. 
+ +The `schemaLocation` - if it begins with a "/" it is interpreted as an absolute path, otherwise a relative path. Both may be interpreted relative to a classpath. + +## Existing DFDL Restrictions + +Just as a reminder, the above standard-profile restrictions go on top of DFDL +existing limitations on XML Schema such as: + +- arrays/optional - only for elements +- no mixed content +- no complex type derivation +- no attributes +- limited set of simple types +- pattern facets only for xs:string elements +- other facet restrictions by type + +## Possible additional restrictions + +### Troublesome Placement of dfdl:assert and dfdl:discriminator on Sequences & Choices + +The DFDL v1.0 rules about sequences/choices and statement annotations on them are confusing. +In particular, a dfdl:assert or dfdl:discriminator with testKind 'expression' appears lexically at the top of the sequence/choice, but is executed after the sequence/choice content has been parsed. + +This is sufficiently error-prone that the standard profile should disallow +it, requiring instead that an inner sequence carrying the assertion or +discriminator with NO child content, be inserted in the sequence at the +point where the evaluation is required to occur. + +# Enabling the Standard Profile + +The following ways should be available for a schema author to tell Daffodil they want enforcement of the standard profile (or not). +- *Home Directory Properties File* - a file such as ~/.daffodil or daffodil.properties should contain a default value for choosing the standard profile. This default value can be overridden by other mechanisms which have higher priority. +- *Schema Project Properties File* - a Daffodil config file at the root of the Daffodil schema project directory tree, should be able to specify a default value for use by the schema project. 
If the XML config file system is superceded by some other properties file per-schema, then that same file should enable specifying the standard profile is (or is not) to be used. +- *Per Schema-File property* - A property expessed at the top level of a DFDL schema file should indicate that the standard profile is requested. + All schemas imported or included by a schema that requests the standard profile would be assumed to also be required to obey the standard profile. + +An opposite expression - that a schema explicitly is known to require more than the +standard profile, should also be allowed to be placed in any of these locations. +Including such a non-standard-profile into a schema that requests the standard +profile should cause an error. Review Comment: Why bother with this option, unless you think Daffodil should check (with enforcement/erroring capability) that a schema really and truly uses something outside the standard profile too? ########## site/dev/design-notes/Proposed-DFDL-Standard-Profile.md: ########## @@ -0,0 +1,133 @@ +# Proposal: DFDL Standard Profile + +#### Version 0.1 2023-10-13 + +## Introduction + +In attempting to integrate Apache Daffodil with other data processing software, the need to make DFDL schemas interoperate properly in conjunction with other data models has arisen. Review Comment: Document looks good overall, although I would wrap lines after 80 or 100 columns (your choice). ########## site/dev/design-notes/Proposed-DFDL-Standard-Profile.md: ########## @@ -0,0 +1,133 @@ +# Proposal: DFDL Standard Profile + +#### Version 0.1 2023-10-13 + +## Introduction + +In attempting to integrate Apache Daffodil with other data processing software, the need to make DFDL schemas interoperate properly in conjunction with other data models has arisen. + +Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data models which are powerful, but not as expressive as DFDL. 
+ +DFDL's data model is a simplification of XML Schemas's PSVI; however, even this causes problems. +Most other data processing systems were not designed with markup languages in mind, but rather for structured data. + +The following things are allowed in DFDL v1.0, but are difficult to map into most data models: + +- anonymous choices +- duplicate element child names +- namespaces that are different, but where the prefixes are not unique +- global names for element children + +A more restrictive subset of DFDL, a _standard profile_, is needed which can be enforced (on request) to ensure that DFDL schemas will be usable with a variety of data processing systems. +Creating DFDL schemas that adhere to this standard profile ensures maximal interoperability, including the ability to convert into JSON without name/namespace collisions. + +This is a proposal for a switch/option to be added to Daffodil which turns on enforcement of this standard profile, aka subset of DFDL. + +## Standard Profile Restrictions + +### No Anonymous Choices + +Choices must be the model groups of complex type definitions and are not allowed in any other context. + +Each choice branch must begin with a different element. (This is already a XML Schema requirement - Unique Particle Attribution.) + +### No Reusable Groups +Group references and reusable group definitions are not allowed. + +### No Element References +There is no corresponding form of sharing in most data structure systems. + +### No Namespace-Qualified Names +Only elementFormDefault 'unqualified' is allowed. +Note that this is the default for XML Schema and DFDL. + +### Unique Namespace Prefixes +All namespace prefixes must be unique in the entire schema. + +This enables one to create unique identifiers by concatenating prefix_local to create global names. 
+ +### All Element Children Have Unique Names +All children element declarations must have unique names within their enclosing parent element + +### Nillable Simple Types Only + +Nillable is allowed only for simple type elements + +### Element Name/Identifier Restrictions +Element names must consist of all non-whitespace characters from the +Unicode basic multilingual plane (no surrogate pairs in element names). + +They may not contain any control characters (Uncode class Cc) may not contain various punctuation characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $). + +This is a lowest-common denominator of identifier rules intended to allow +DFDL schema identifiers to be mapped into ANY programming language or +structure declaration language, while at the same time allowing use of +Unicode characters. + +Element names may not begin with a digit. + +Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use of unicode in identifiers. + +Element names may not begin with any prefix defined as part of the schema +followed by an "_" as this could be ambiguous with names being made globally +unique by appending prefix, "_" and local name. + +### String Content Restrictions + +Schemas may only be written in UTF-8 encoding. + +The DFDL property dfdl:utf16Width must be 'fixed'. + +### Import `schemaLocation` + +Imported files - a single unique file must be used when importing a namespace. +A single schema may not contain different import statements for the same +namespace but which specify different files. + +This is a practical requirement in Apache Daffodil today, but should be made explicit. + +The `schemaLocation` - if it begins with a "/" it is interpreted as an absolute path, otherwise a relative path. Both may be interpreted relative to a classpath. 
+ +## Existing DFDL Restrictions + +Just as a reminder, the above standard-profile restrictions go on top of DFDL +existing limitations on XML Schema such as: + +- arrays/optional - only for elements +- no mixed content +- no complex type derivation +- no attributes +- limited set of simple types +- pattern facets only for xs:string elements +- other facet restrictions by type + +## Possible additional restrictions + +### Troublesome Placement of dfdl:assert and dfdl:discriminator on Sequences & Choices + +The DFDL v1.0 rules about sequences/choices and statement annotations on them are confusing. +In particular, a dfdl:assert or dfdl:discriminator with testKind 'expression' appears lexically at the top of the sequence/choice, but is executed after the sequence/choice content has been parsed. + +This is sufficiently error-prone that the standard profile should disallow +it, requiring instead that an inner sequence carrying the assertion or +discriminator with NO child content, be inserted in the sequence at the +point where the evaluation is required to occur. + +# Enabling the Standard Profile + +The following ways should be available for a schema author to tell Daffodil they want enforcement of the standard profile (or not). +- *Home Directory Properties File* - a file such as ~/.daffodil or daffodil.properties should contain a default value for choosing the standard profile. This default value can be overridden by other mechanisms which have higher priority. +- *Schema Project Properties File* - a Daffodil config file at the root of the Daffodil schema project directory tree, should be able to specify a default value for use by the schema project. If the XML config file system is superceded by some other properties file per-schema, then that same file should enable specifying the standard profile is (or is not) to be used. 
+- *Per Schema-File property* - A property expessed at the top level of a DFDL schema file should indicate that the standard profile is requested. + All schemas imported or included by a schema that requests the standard profile would be assumed to also be required to obey the standard profile. Review Comment: A command line switch would be desirable too. ########## site/dev/design-notes/Proposed-DFDL-Standard-Profile.md: ########## @@ -0,0 +1,133 @@ +# Proposal: DFDL Standard Profile + +#### Version 0.1 2023-10-13 + +## Introduction + +In attempting to integrate Apache Daffodil with other data processing software, the need to make DFDL schemas interoperate properly in conjunction with other data models has arisen. + +Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data models which are powerful, but not as expressive as DFDL. + +DFDL's data model is a simplification of XML Schemas's PSVI; however, even this causes problems. +Most other data processing systems were not designed with markup languages in mind, but rather for structured data. + +The following things are allowed in DFDL v1.0, but are difficult to map into most data models: + +- anonymous choices +- duplicate element child names +- namespaces that are different, but where the prefixes are not unique +- global names for element children + +A more restrictive subset of DFDL, a _standard profile_, is needed which can be enforced (on request) to ensure that DFDL schemas will be usable with a variety of data processing systems. +Creating DFDL schemas that adhere to this standard profile ensures maximal interoperability, including the ability to convert into JSON without name/namespace collisions. + +This is a proposal for a switch/option to be added to Daffodil which turns on enforcement of this standard profile, aka subset of DFDL. 
+ +## Standard Profile Restrictions + +### No Anonymous Choices + +Choices must be the model groups of complex type definitions and are not allowed in any other context. + +Each choice branch must begin with a different element. (This is already a XML Schema requirement - Unique Particle Attribution.) + +### No Reusable Groups +Group references and reusable group definitions are not allowed. + +### No Element References +There is no corresponding form of sharing in most data structure systems. + +### No Namespace-Qualified Names +Only elementFormDefault 'unqualified' is allowed. +Note that this is the default for XML Schema and DFDL. + +### Unique Namespace Prefixes +All namespace prefixes must be unique in the entire schema. + +This enables one to create unique identifiers by concatenating prefix_local to create global names. + +### All Element Children Have Unique Names +All children element declarations must have unique names within their enclosing parent element Review Comment: Good markdown style puts blank lines between ### header lines and text lines, wraps text lines to 80/100 columns, and ends sentences with periods. ########## site/dev/design-notes/Proposed-DFDL-Standard-Profile.md: ########## @@ -0,0 +1,133 @@ +# Proposal: DFDL Standard Profile + +#### Version 0.1 2023-10-13 + +## Introduction + +In attempting to integrate Apache Daffodil with other data processing software, the need to make DFDL schemas interoperate properly in conjunction with other data models has arisen. + +Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data models which are powerful, but not as expressive as DFDL. + +DFDL's data model is a simplification of XML Schemas's PSVI; however, even this causes problems. +Most other data processing systems were not designed with markup languages in mind, but rather for structured data. 
+ +The following things are allowed in DFDL v1.0, but are difficult to map into most data models: + +- anonymous choices +- duplicate element child names +- namespaces that are different, but where the prefixes are not unique +- global names for element children + +A more restrictive subset of DFDL, a _standard profile_, is needed which can be enforced (on request) to ensure that DFDL schemas will be usable with a variety of data processing systems. +Creating DFDL schemas that adhere to this standard profile ensures maximal interoperability, including the ability to convert into JSON without name/namespace collisions. + +This is a proposal for a switch/option to be added to Daffodil which turns on enforcement of this standard profile, aka subset of DFDL. + +## Standard Profile Restrictions + +### No Anonymous Choices + +Choices must be the model groups of complex type definitions and are not allowed in any other context. + +Each choice branch must begin with a different element. (This is already a XML Schema requirement - Unique Particle Attribution.) + +### No Reusable Groups +Group references and reusable group definitions are not allowed. + +### No Element References +There is no corresponding form of sharing in most data structure systems. + +### No Namespace-Qualified Names +Only elementFormDefault 'unqualified' is allowed. +Note that this is the default for XML Schema and DFDL. + +### Unique Namespace Prefixes +All namespace prefixes must be unique in the entire schema. + +This enables one to create unique identifiers by concatenating prefix_local to create global names. + +### All Element Children Have Unique Names +All children element declarations must have unique names within their enclosing parent element + +### Nillable Simple Types Only + +Nillable is allowed only for simple type elements Review Comment: Please explain why this restriction helps Drill or JSON. -- This is an automated message from the Apache Git Service. 
