mbeckerle commented on code in PR #121:
URL: https://github.com/apache/daffodil-site/pull/121#discussion_r1364690446
##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version 0.1 2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing
software, the need to make DFDL schemas interoperate properly in conjunction
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data
models which are powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even
this causes problems.
+Most other data processing systems were not designed with markup languages in
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can
be enforced (on request) to ensure that DFDL schemas will be usable with a
variety of data processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal
interoperability, including the ability to convert into JSON without
name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on
enforcement of this standard profile, aka subset of DFDL.
Review Comment:
Adding CLI switch.
##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version 0.1 2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing
software, the need to make DFDL schemas interoperate properly in conjunction
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data
models which are powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even
this causes problems.
+Most other data processing systems were not designed with markup languages in
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can
be enforced (on request) to ensure that DFDL schemas will be usable with a
variety of data processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal
interoperability, including the ability to convert into JSON without
name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their
enclosing parent element
+
+### Nillable Simple Types Only
+
+Nillable is allowed only for simple type elements
+
+### Element Name/Identifier Restrictions
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Uncode class Cc) may not contain
various punctuation characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use
of unicode in identifiers.
+
+Element names may not begin with any prefix defined as part of the schema
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding.
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace.
+A single schema may not contain different import statements for the same
+namespace but which specify different files.
+
+This is a practical requirement in Apache Daffodil today, but should be made
explicit.
+
+The `schemaLocation` - if it begins with a "/" it is interpreted as an
absolute path, otherwise a relative path. Both may be interpreted relative to a
classpath.
+
+## Existing DFDL Restrictions
+
+Just as a reminder, the above standard-profile restrictions go on top of DFDL
+existing limitations on XML Schema such as:
+
+- arrays/optional - only for elements
+- no mixed content
+- no complex type derivation
+- no attributes
+- limited set of simple types
+- pattern facets only for xs:string elements
+- other facet restrictions by type
+
+## Possible additional restrictions
+
+### Troublesome Placement of dfdl:assert and dfdl:discriminator on Sequences &
Choices
+
+The DFDL v1.0 rules about sequences/choices and statement annotations on them
are confusing.
+In particular, a dfdl:assert or dfdl:discriminator with testKind 'expression'
appears lexically at the top of the sequence/choice, but is executed after the
sequence/choice content has been parsed.
+
+This is sufficiently error-prone that the standard profile should disallow
+it, requiring instead that an inner sequence carrying the assertion or
+discriminator with NO child content, be inserted in the sequence at the
+point where the evaluation is required to occur.
+
+# Enabling the Standard Profile
+
+The following ways should be available for a schema author to tell Daffodil
they want enforcement of the standard profile (or not).
+- *Home Directory Properties File* - a file such as ~/.daffodil or
daffodil.properties should contain a default value for choosing the standard
profile. This default value can be overridden by other mechanisms which have
higher priority.
+- *Schema Project Properties File* - a Daffodil config file at the root of the
Daffodil schema project directory tree, should be able to specify a default
value for use by the schema project. If the XML config file system is
superceded by some other properties file per-schema, then that same file should
enable specifying the standard profile is (or is not) to be used.
+- *Per Schema-File property* - A property expessed at the top level of a DFDL
schema file should indicate that the standard profile is requested.
+ All schemas imported or included by a schema that requests the standard
profile would be assumed to also be required to obey the standard profile.
+
+An opposite expression - that a schema explicitly is known to require more
than the
+standard profile, should also be allowed to be placed in any of these
locations.
+Including such a non-standard-profile into a schema that requests the standard
+profile should cause an error.
Review Comment:
Many current schemas will NOT obey the standard profile unless they are
modified extensively. Suppose someone wants to plug the EDIFACT schema into,
say, Apache Drill. It helps if the EDIFACT schema explicitly tells Daffodil
that it is not standard profile, and so cannot be mapped directly. Such a user
would have to convert the data to JSON or XML first before using Drill on it.
Knowing that the schema isn't even intended to be standard-profile compatible
lets a diagnostic suggest this to the user.
##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version 0.1 2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing
software, the need to make DFDL schemas interoperate properly in conjunction
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data
models which are powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even
this causes problems.
+Most other data processing systems were not designed with markup languages in
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can
be enforced (on request) to ensure that DFDL schemas will be usable with a
variety of data processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal
interoperability, including the ability to convert into JSON without
name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their
enclosing parent element
Review Comment:
Fixing
##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version 0.1 2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing
software, the need to make DFDL schemas interoperate properly in conjunction
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data
models which are powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even
this causes problems.
+Most other data processing systems were not designed with markup languages in
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can
be enforced (on request) to ensure that DFDL schemas will be usable with a
variety of data processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal
interoperability, including the ability to convert into JSON without
name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their
enclosing parent element
+
+### Nillable Simple Types Only
+
+Nillable is allowed only for simple type elements
Review Comment:
I will put in TBD here. I am not yet sure whether Drill's data/metadata will
tolerate nillable maps (which correspond to complex types), or whether that's
only a JSON thing.
You are correct that JSON has no issues. A complex type element named x with
y, z children can be
* optional or required and present: `{ "x":{ "y":5, "z":6 } }`
* optional and absent: `{ }`
* optional or required and present but null: `{ "x": null }`
So there's an easy way to represent nillable complex elements in JSON that is
arguably more natural than the nillable hack done for XML representations.
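
To make the three cases concrete, here is a minimal sketch in Python, using only the standard `json` module. The function name `describe_x` and the labels it returns are made up for illustration; this is not Daffodil or Drill code, just a way to show that a consumer can tell the three JSON shapes apart.

```python
import json


def describe_x(document: str) -> str:
    """Classify how the element "x" appears in a JSON object (hypothetical helper)."""
    record = json.loads(document)
    if "x" not in record:
        # The key is missing entirely: the optional element is absent.
        return "absent"
    if record["x"] is None:
        # The key is there but its value is JSON null: the element is nil.
        return "nil"
    # Otherwise the element is present and carries its y, z children.
    return "present"


print(describe_x('{ "x": { "y": 5, "z": 6 } }'))
print(describe_x('{ }'))
print(describe_x('{ "x": null }'))
```

The point of the sketch is only that absent and nil are distinguishable without any extra marker, unlike the `xsi:nil` attribute needed in XML.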
##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version 0.1 2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing
software, the need to make DFDL schemas interoperate properly in conjunction
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data
models which are powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even
this causes problems.
+Most other data processing systems were not designed with markup languages in
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can
be enforced (on request) to ensure that DFDL schemas will be usable with a
variety of data processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal
interoperability, including the ability to convert into JSON without
name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their
enclosing parent element
+
+### Nillable Simple Types Only
+
+Nillable is allowed only for simple type elements
+
+### Element Name/Identifier Restrictions
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Uncode class Cc) may not contain
various punctuation characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use
of unicode in identifiers.
+
+Element names may not begin with any prefix defined as part of the schema
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding.
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace.
+A single schema may not contain different import statements for the same
+namespace but which specify different files.
+
+This is a practical requirement in Apache Daffodil today, but should be made
explicit.
+
+The `schemaLocation` - if it begins with a "/" it is interpreted as an
absolute path, otherwise a relative path. Both may be interpreted relative to a
classpath.
+
+## Existing DFDL Restrictions
+
+Just as a reminder, the above standard-profile restrictions go on top of DFDL
+existing limitations on XML Schema such as:
+
+- arrays/optional - only for elements
+- no mixed content
+- no complex type derivation
+- no attributes
+- limited set of simple types
+- pattern facets only for xs:string elements
+- other facet restrictions by type
+
+## Possible additional restrictions
+
+### Troublesome Placement of dfdl:assert and dfdl:discriminator on Sequences &
Choices
+
+The DFDL v1.0 rules about sequences/choices and statement annotations on them
are confusing.
+In particular, a dfdl:assert or dfdl:discriminator with testKind 'expression'
appears lexically at the top of the sequence/choice, but is executed after the
sequence/choice content has been parsed.
+
+This is sufficiently error-prone that the standard profile should disallow
+it, requiring instead that an inner sequence carrying the assertion or
+discriminator with NO child content, be inserted in the sequence at the
+point where the evaluation is required to occur.
+
+# Enabling the Standard Profile
+
+The following ways should be available for a schema author to tell Daffodil
they want enforcement of the standard profile (or not).
+- *Home Directory Properties File* - a file such as ~/.daffodil or
daffodil.properties should contain a default value for choosing the standard
profile. This default value can be overridden by other mechanisms which have
higher priority.
+- *Schema Project Properties File* - a Daffodil config file at the root of the
Daffodil schema project directory tree, should be able to specify a default
value for use by the schema project. If the XML config file system is
superceded by some other properties file per-schema, then that same file should
enable specifying the standard profile is (or is not) to be used.
+- *Per Schema-File property* - A property expessed at the top level of a DFDL
schema file should indicate that the standard profile is requested.
+ All schemas imported or included by a schema that requests the standard
profile would be assumed to also be required to obey the standard profile.
Review Comment:
Adding.
##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version 0.1 2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing
software, the need to make DFDL schemas interoperate properly in conjunction
with other data models has arisen.
Review Comment:
Fixing