tuxji commented on code in PR #121:
URL: https://github.com/apache/daffodil-site/pull/121#discussion_r1419719312


##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18

Review Comment:
   Should you update the date since you changed the file today?



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.

Review Comment:
   I had to search to find out that PSVI means "post-schema-validation 
infoset".  I suggest spelling out that term, adding PSVI in parentheses after 
it, and hyperlinking PSVI to https://www.w3.org/XML/2002/05/psvi-use-cases, 
which gives an idea of what PSVI means.
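   
   One possible rewording, as a sketch only (it assumes the link above is the 
target you want, and also changes "XML Schemas's" to "XML Schema's"):
   
   ```markdown
   DFDL's data model is a simplification of XML Schema's post-schema-validation
   infoset ([PSVI](https://www.w3.org/XML/2002/05/psvi-use-cases)); however,
   even this causes problems.
   ```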



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.

Review Comment:
   I became concerned when I saw discriminators mentioned since you had just 
said DFDL properties cannot be expressed on group references.  You may want to 
add a clarification that discriminators are DFDL statements and therefore 
allowed in groups but not allowed in group references, or whatever it is you 
meant when you mentioned discriminators.



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>

Review Comment:
   Does this snippet omit any attributes which tell Daffodil which choice 
branch to take, or is the schema relying on Daffodil getting a parse error in 
hdr_version_C_type, backtracking to the choice, and parsing hdr_version_D_type 
instead?  You may want to clarify which is the case in the sentence above 
introducing the snippet.



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them eg., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 
+
+### Element Name/Identifier Restrictions
+
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Uncode class Cc) may not contain 
various punctuation
+characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use 
of unicode in
+identifiers.
+
+Element names may not begin with any prefix defined as part of the schema
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding.
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace.
+A single schema may not contain different import statements for the same
+namespace but which specify different files.
+
+This is a practical requirement in Apache Daffodil today, but should be made 
explicit.
+
+The `schemaLocation` - if it begins with a "/" it is interpreted as an 
absolute path, otherwise a

Review Comment:
   *If the `schemaLocation` begins with...



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them eg., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 
+
+### Element Name/Identifier Restrictions
+
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Uncode class Cc) may not contain 
various punctuation
+characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use 
of unicode in
+identifiers.
+
+Element names may not begin with any prefix defined as part of the schema
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding.
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace.
+A single schema may not contain different import statements for the same
+namespace but which specify different files.

Review Comment:
   Omit "but"



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.

Review Comment:
   Better worded as:
   
   standard profile as a subset of DFDL.



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion

Review Comment:
   Insert a blank line after this line to avoid a markdownlint warning.



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 

Review Comment:
   Do you want to spell out VMF and/or hyperlink VMF to its GitHub repo so 
people can look at it?



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them eg., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 
+
+### Element Name/Identifier Restrictions
+
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Uncode class Cc) may not contain 
various punctuation

Review Comment:
   *[and] may not....



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them eg., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+

Review Comment:
   Yet another way might be possible if you use choice dispatch keys, which 
simplify knowing which choice branch to take in every place without relying 
on backtracking.  You could increase the number of choices and make 
them finer-grained so that the elements `C` and `D` are in one choice, the 
common fields in `hdr_version_*_type` are not in any choice, and any fields 
that differ between the `hdr_version_*_type` definitions are in their own 
choices as well.  If you need to reuse common definitions in multiple places, 
you can put them in groups and reference them with group references.
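
   A rough sketch of that finer-grained layout, in the same abbreviated style 
as the example in the design note.  Only `C`, `D`, `hdr`, and `zString` come 
from the note; the group `hdrCommonFields`, the elements `extC` and `extD`, 
their types, and the dispatch key expression are hypothetical names chosen for 
illustration, so treat this as an assumption-laden outline rather than a 
tested schema:

```xml
<sequence>
  <!-- The version markers C and D live in one choice of their own. -->
  <choice>
    <element name="C" type="zString"/>
    <element name="D" type="zString"/>
  </choice>
  <!-- A single hdr element, so child names stay unique. -->
  <element name="hdr">
    <complexType>
      <sequence>
        <!-- Fields common to both versions are outside any choice
             (hypothetical group name). -->
        <group ref="hdrCommonFields"/>
        <!-- Only the fields that differ are in a choice, selected by a
             dispatch key instead of backtracking (assumed expression). -->
        <choice dfdl:choiceDispatchKey="{ if (fn:exists(../C)) then 'C' else 'D' }">
          <element name="extC" type="hdr_C_only_type" dfdl:choiceBranchKey="C"/>
          <element name="extD" type="hdr_D_only_type" dfdl:choiceBranchKey="D"/>
        </choice>
      </sequence>
    </complexType>
  </element>
</sequence>
```

   With a layout like this, a path such as `.../hdr/someCommonField` would not 
need a version-specific step, and only paths into `extC` or `extD` would name 
a version.  Note that the nested choices would still have to be reconciled 
with the "No Anonymous Choices" restriction.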



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them eg., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 
+
+### Element Name/Identifier Restrictions
+
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Uncode class Cc) may not contain 
various punctuation
+characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use 
of unicode in

Review Comment:
   *Unicode characters



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them eg., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 
+
+### Element Name/Identifier Restrictions
+
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Uncode class Cc) may not contain 
various punctuation
+characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use 
of unicode in
+identifiers.
+
+Element names may not begin with any prefix defined as part of the schema
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding.
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace.

Review Comment:
   *A namespace must be imported from a single unique imported file.



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g, VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them eg., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 

Review Comment:
   Please clarify which simple type elements are nillable.  I think numbers by 
themselves can never be nillable unless you're talking about optional simple 
type elements at certain places in an enclosing complex type?



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already an XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g., VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them, e.g., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 
+
+### Element Name/Identifier Restrictions
+
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Unicode class Cc) and may not contain 
various punctuation
+characters (Unicode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encouraged to use "A-Za-z0-9" only, as some systems may not allow use 
of Unicode in
+identifiers.
+
+Element names may not begin with any prefix defined as part of the schema
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding.
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace.
+A single schema may not contain different import statements for the same
+namespace but which specify different files.
+
+This is a practical requirement in Apache Daffodil today, but should be made 
explicit.
+
+The `schemaLocation` - if it begins with a "/" it is interpreted as an 
absolute path, otherwise a
+relative path. Both may be interpreted relative to a classpath.
+
+## Existing DFDL Restrictions
+
+Just as a reminder, the above standard-profile restrictions go on top of DFDL's
+existing limitations on XML Schema such as:
+
+- arrays/optional - only for elements
+- no mixed content
+- no complex type derivation
+- no attributes
+- limited set of simple types
+- pattern facets only for xs:string elements
+- other facet restrictions by type
+
+## Possible additional restrictions
+
+### Troublesome Placement of dfdl:assert and dfdl:discriminator on Sequences & 
Choices
+
+The DFDL v1.0 rules about sequences/choices and statement annotations on them 
are confusing.
+In particular, a dfdl:assert or dfdl:discriminator with testKind 'expression' 
appears lexically at
+the top of the sequence/choice, but is executed after the sequence/choice 
content has been parsed.
+
+This is sufficiently error-prone that the standard profile should disallow
+it, requiring instead that an inner sequence carrying the assertion or
+discriminator with NO child content, be inserted in the sequence at the
+point where the evaluation is required to occur.
+
+# Enabling the Standard Profile
+
+The following ways should be available for a schema author to tell Daffodil 
they want enforcement of
+the standard profile (or not).

Review Comment:
   I see so many ways below that I think we need use cases justifying the need 
for implementing each way.  Can we drop any of these ways without losing any 
use cases?



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,214 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.2  2023-10-18
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make
+DFDL schemas interoperate properly in conjunction with other data models has 
arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are
+powerful, but not as expressive as DFDL.
+
+DFDL's data model is a simplification of XML Schema's PSVI; however, even 
this causes problems.
+Most other data processing systems were not designed with markup languages in 
mind, but rather for
+structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on
+request) to ensure that DFDL schemas will be usable with a variety of data 
processing systems.
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability,
+including the ability to convert into JSON without name/namespace collisions.
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this
+standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other
+context.
+
+Each choice branch must begin with a different element. (This is already an XML 
Schema requirement -
+Unique Particle Attribution.)
+
+### Group References Cannot Carry DFDL Properties
+
+Group references are allowed, but DFDL properties cannot be expressed on group 
references; hence,
+combining those properties with those of the group definition is not required.
+
+While most data structure systems do not have this notion of reusable groups, 
when restricted as 
+described, reusable groups are something users could implement by way of a 
simple macro 
+pre-processor, so having this in the standard profile really does not create 
any particular 
+challenge when mapping from DFDL standard profile schemas into any data 
structure system. 
+Groups and group references are used heavily in DFDL schemas to push down 
complexity like 
+discriminators that are reused in many places.
+Allowing groups and group references reduces the difficulty of converting many 
large DFDL 
+schemas to conform to the standard profile. 
+
+### No Element References
+
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+
+Only elementFormDefault 'unqualified' is allowed.
+Note that this is the default for XML Schema and DFDL.
+
+### Unique Namespace Prefixes
+
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+
+All children element declarations must have unique names within their 
enclosing parent element.
+
+#### Discussion
+Note that this causes issues in a number of large DFDL schemas (e.g., VMF) 
which attempt to implement a 
+single DFDL schema that is capable of handling multiple versions of the data 
format.
+
+In this case, the schema uses a construct like:
+```xml
+<choice>
+  <sequence>
+    <element name="C" type="zString"/>
+    <element name="hdr" type="hdr_version_C_type"/>
+  </sequence>
+  <sequence>
+    <element name="D" type="zString"/>
+    <element name="hdr" type="hdr_version_D_type"/>
+  </sequence>
+</choice>
+```
+In the above, you can see that there are two separate element declarations 
named hdr, of different types. 
+This allows common sub-structure that is the same in versions C and D to be 
addressed by path expressions that are
+polymorphic. They do not have a path step component that identifies the 
version. 
+
+However, if we require all children to have unique names, then this would have 
to be elements with distinct names on each
+branch such as hdrC and hdrD, and then paths, even those reaching sub-fields 
that are common to both versions, 
+have path steps that are specifically requesting a particular version. 
+
+This is a bit painful particularly if there are many expressions that need to 
reference into the common fields, because
+all such expressions would need to be duplicated for version C and version D. 
+
+An alternative solution is that this could be overcome by a way of creating 
path expressions with wildcards in 
+them, e.g., ".../hdr*/...".
+An extension of this kind in DFDL has already been proposed/discussed some 
time ago by the DFDL workgroup, but has 
+not yet turned into a formal proposal. (The DFDL4Space implementation by the 
ESA has a kind of wildcard feature like
+this as a DFDL extension.)
+
+Another way of addressing this is to put the version distinction at precisely 
each point of difference between the 
+schema versions. 
+This is, however, not the way some large schemas were created, as these 
schemas are machine generated from the 
+individual format specifications. The generator is not aware of the individual 
differences between the versions,
+but only that there are *some* differences between them. 
+It requires a more sophisticated schema generator to compute these fine-level 
diffs between the two schemas. 
+
+
+### Nillable Simple Types Only (TBD: May not be necessary)
+
+Nillable is allowed only for simple type elements. 
+
+### Element Name/Identifier Restrictions
+
+Element names must consist of all non-whitespace characters from the
+Unicode basic multilingual plane (no surrogate pairs in element names).
+
+They may not contain any control characters (Unicode class Cc) and may not contain 
various punctuation
+characters (Unicode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encouraged to use "A-Za-z0-9" only, as some systems may not allow use 
of Unicode in
+identifiers.
+
+Element names may not begin with any prefix defined as part of the schema
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding.
+
+The DFDL property dfdl:utf16Width must be 'fixed'.

Review Comment:
   What is `dfdl:utf16Width` used for, and why must it be 'fixed'?
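
   My own understanding, which may be incomplete, is that `dfdl:utf16Width` says whether UTF-16 text is treated as fixed width (every character is one 2-byte code unit, so a surrogate pair counts as two characters) or as variable width. If that is right, I assume the profile wants something like the following hypothetical format annotation, and a sentence in the text saying why would help:

   ```xml
   <!-- Hypothetical fragment for illustration; other required properties omitted -->
   <dfdl:format xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
                encoding="UTF-16BE" utf16Width="fixed"/>
   ```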



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

