tuxji commented on code in PR #121:
URL: https://github.com/apache/daffodil-site/pull/121#discussion_r1359951936


##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.1  2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make DFDL schemas interoperate properly in conjunction 
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are powerful, but not as expressive as DFDL. 
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems. 
+Most other data processing systems were not designed with markup languages in 
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on request) to ensure that DFDL schemas will be usable with a 
variety of data processing systems. 
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability, including the ability to convert into JSON without 
name/namespace collisions. 
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this standard profile, aka subset of DFDL.

Review Comment:
   I agree with the idea of modifying Daffodil to validate (only upon request 
using an optional config/switch/option) that a DFDL schema uses a more 
restrictive subset of DFDL.
   
   Further below, I see optional config files proposed, but I don't see a 
command line switch/option proposed as well.  I'd want the CLI to have such an 
option too.



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.1  2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make DFDL schemas interoperate properly in conjunction 
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are powerful, but not as expressive as DFDL. 
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems. 
+Most other data processing systems were not designed with markup languages in 
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on request) to ensure that DFDL schemas will be usable with a 
variety of data processing systems. 
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability, including the ability to convert into JSON without 
name/namespace collisions. 
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices 
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed. 
+Note that this is the default for XML Schema and DFDL. 
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their 
enclosing parent element
+
+### Nillable Simple Types Only
+
+Nillable is allowed only for simple type elements
+
+### Element Name/Identifier Restrictions
+Element names must consist of all non-whitespace characters from the 
+Unicode basic multilingual plane (no surrogate pairs in element names). 
+
+They may not contain any control characters (Uncode class Cc) may not contain 
various punctuation characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow 
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use 
of unicode in identifiers.
+
+Element names may not begin with any prefix defined as part of the schema 
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding. 
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace. 
+A single schema may not contain different import statements for the same 
+namespace but which specify different files. 
+
+This is a practical requirement in Apache Daffodil today, but should be made 
explicit.
+
+The `schemaLocation` - if it begins with a "/" it is interpreted as an 
absolute path, otherwise a relative path. Both may be interpreted relative to a 
classpath.
+
+## Existing DFDL Restrictions
+
+Just as a reminder, the above standard-profile restrictions go on top of DFDL 
+existing limitations on XML Schema such as:
+
+- arrays/optional - only for elements
+- no mixed content
+- no complex type derivation
+- no attributes
+- limited set of simple types
+- pattern facets only for xs:string elements
+- other facet restrictions by type
+
+## Possible additional restrictions
+
+### Troublesome Placement of dfdl:assert and dfdl:discriminator on Sequences & 
Choices
+
+The DFDL v1.0 rules about sequences/choices and statement annotations on them 
are confusing.
+In particular, a dfdl:assert or dfdl:discriminator with testKind 'expression' 
appears lexically at the top of the sequence/choice, but is executed after the 
sequence/choice content has been parsed. 
+
+This is sufficiently error-prone that the standard profile should disallow 
+it, requiring instead that an inner sequence carrying the assertion or 
+discriminator with NO child content, be inserted in the sequence at the 
+point where the evaluation is required to occur. 
+
+# Enabling the Standard Profile
+
+The following ways should be available for a schema author to tell Daffodil 
they want enforcement of the standard profile (or not).
+- *Home Directory Properties File* - a file such as ~/.daffodil or 
daffodil.properties should contain a default value for choosing the standard 
profile. This default value can be overridden by other mechanisms which have 
higher priority.
+- *Schema Project Properties File* - a Daffodil config file at the root of the 
Daffodil schema project directory tree, should be able to specify a default 
value for use by the schema project. If the XML config file system is 
superceded by some other properties file per-schema, then that same file should 
enable specifying the standard profile is (or is not) to be used. 
+- *Per Schema-File property* - A property expessed at the top level of a DFDL 
schema file should indicate that the standard profile is requested. 
+  All schemas imported or included by a schema that requests the standard 
profile would be assumed to also be required to obey the standard profile. 
+  
+An opposite expression - that a schema explicitly is known to require more 
than the 
+standard profile, should also be allowed to be placed in any of these 
locations.
+Including such a non-standard-profile into a schema that requests the standard 
+profile should cause an error. 

Review Comment:
   Why bother with this option, unless you think Daffodil should check (with 
enforcement/erroring capability) that a schema really and truly uses something 
outside the standard profile too?



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.1  2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make DFDL schemas interoperate properly in conjunction 
with other data models has arisen.

Review Comment:
   Document looks good overall, although I would wrap lines after 80 or 100 
columns (your choice).



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.1  2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make DFDL schemas interoperate properly in conjunction 
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are powerful, but not as expressive as DFDL. 
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems. 
+Most other data processing systems were not designed with markup languages in 
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on request) to ensure that DFDL schemas will be usable with a 
variety of data processing systems. 
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability, including the ability to convert into JSON without 
name/namespace collisions. 
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices 
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed. 
+Note that this is the default for XML Schema and DFDL. 
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their 
enclosing parent element
+
+### Nillable Simple Types Only
+
+Nillable is allowed only for simple type elements
+
+### Element Name/Identifier Restrictions
+Element names must consist of all non-whitespace characters from the 
+Unicode basic multilingual plane (no surrogate pairs in element names). 
+
+They may not contain any control characters (Uncode class Cc) may not contain 
various punctuation characters (Uncode class Ps, Pe, Pd, Pc, Pf, Pi, Po, nor $).
+
+This is a lowest-common denominator of identifier rules intended to allow 
+DFDL schema identifiers to be mapped into ANY programming language or
+structure declaration language, while at the same time allowing use of
+Unicode characters.
+
+Element names may not begin with a digit.
+
+Users are encourated to use "A-Za-Z0-9" only as some systems may not allow use 
of unicode in identifiers.
+
+Element names may not begin with any prefix defined as part of the schema 
+followed by an "_" as this could be ambiguous with names being made globally
+unique by appending prefix, "_" and local name.
+
+### String Content Restrictions
+
+Schemas may only be written in UTF-8 encoding. 
+
+The DFDL property dfdl:utf16Width must be 'fixed'.
+
+### Import `schemaLocation`
+
+Imported files - a single unique file must be used when importing a namespace. 
+A single schema may not contain different import statements for the same 
+namespace but which specify different files. 
+
+This is a practical requirement in Apache Daffodil today, but should be made 
explicit.
+
+The `schemaLocation` - if it begins with a "/" it is interpreted as an 
absolute path, otherwise a relative path. Both may be interpreted relative to a 
classpath.
+
+## Existing DFDL Restrictions
+
+Just as a reminder, the above standard-profile restrictions go on top of DFDL 
+existing limitations on XML Schema such as:
+
+- arrays/optional - only for elements
+- no mixed content
+- no complex type derivation
+- no attributes
+- limited set of simple types
+- pattern facets only for xs:string elements
+- other facet restrictions by type
+
+## Possible additional restrictions
+
+### Troublesome Placement of dfdl:assert and dfdl:discriminator on Sequences & 
Choices
+
+The DFDL v1.0 rules about sequences/choices and statement annotations on them 
are confusing.
+In particular, a dfdl:assert or dfdl:discriminator with testKind 'expression' 
appears lexically at the top of the sequence/choice, but is executed after the 
sequence/choice content has been parsed. 
+
+This is sufficiently error-prone that the standard profile should disallow 
+it, requiring instead that an inner sequence carrying the assertion or 
+discriminator with NO child content, be inserted in the sequence at the 
+point where the evaluation is required to occur. 
+
+# Enabling the Standard Profile
+
+The following ways should be available for a schema author to tell Daffodil 
they want enforcement of the standard profile (or not).
+- *Home Directory Properties File* - a file such as ~/.daffodil or 
daffodil.properties should contain a default value for choosing the standard 
profile. This default value can be overridden by other mechanisms which have 
higher priority.
+- *Schema Project Properties File* - a Daffodil config file at the root of the 
Daffodil schema project directory tree, should be able to specify a default 
value for use by the schema project. If the XML config file system is 
superceded by some other properties file per-schema, then that same file should 
enable specifying the standard profile is (or is not) to be used. 
+- *Per Schema-File property* - A property expessed at the top level of a DFDL 
schema file should indicate that the standard profile is requested. 
+  All schemas imported or included by a schema that requests the standard 
profile would be assumed to also be required to obey the standard profile. 

Review Comment:
   A command line switch would be desirable too.
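
One way to picture how a command line switch could sit alongside the three mechanisms in the proposal is a simple "last setting wins" lookup. The sketch below is illustrative only: the source names, the `resolve_standard_profile` function, and the relative order of the project file, the per-schema property, and the command line are assumptions, not anything Daffodil implements today. The proposal itself only says that the home directory default can be overridden by higher-priority mechanisms and that enforcement happens on request.

```python
# Hypothetical setting sources, lowest priority first.
# Assumption: the command line outranks everything else; the proposal
# does not define this ordering.
PRECEDENCE = [
    "home_properties",   # e.g. a properties file in the home directory
    "project_config",    # config file at the root of the schema project
    "schema_property",   # property at the top level of a DFDL schema file
    "command_line",      # the switch requested in this review
]


def resolve_standard_profile(settings):
    """Return True if standard-profile enforcement should be on.

    `settings` maps a source name to True, False, or None (not set).
    Enforcement is off unless some source asks for it.
    """
    enabled = False
    for source in PRECEDENCE:
        value = settings.get(source)
        if value is not None:
            # A higher-priority source overrides whatever came before.
            enabled = value
    return enabled
```

With this ordering, a user who enables the profile in the home directory file could still turn it off for a single run from the command line.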



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.1  2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make DFDL schemas interoperate properly in conjunction 
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are powerful, but not as expressive as DFDL. 
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems. 
+Most other data processing systems were not designed with markup languages in 
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on request) to ensure that DFDL schemas will be usable with a 
variety of data processing systems. 
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability, including the ability to convert into JSON without 
name/namespace collisions. 
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices 
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed. 
+Note that this is the default for XML Schema and DFDL. 
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their 
enclosing parent element

Review Comment:
   Good markdown style puts blank lines between ### header lines and text 
lines, wraps text lines to 80/100 columns, and ends sentences with periods.
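
The three style points above could be checked mechanically. The following is a minimal sketch, not an existing tool: the `lint_markdown` function and its messages are made up for illustration, the 80-column limit is one of the two values suggested in this comment, and list items and code lines are skipped by a rough heuristic.

```python
import re

HEADER = re.compile(r"#{1,6}\s")


def lint_markdown(text, max_columns=80):
    """Return (line_number, message) pairs for three simple style checks."""
    problems = []
    lines = text.split("\n")
    paragraph_start = None  # first line of the current run of non-blank lines
    for i, line in enumerate(lines):
        number = i + 1
        stripped = line.strip()
        next_stripped = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if len(line) > max_columns:
            problems.append((number, "line exceeds %d columns" % max_columns))
        if not stripped:
            paragraph_start = None
            continue
        if HEADER.match(line):
            if next_stripped:
                problems.append((number, "header not followed by a blank line"))
            paragraph_start = None
            continue
        if paragraph_start is None:
            paragraph_start = stripped
        ends_block = not next_stripped
        # Rough heuristic: only plain paragraphs are expected to end a sentence.
        is_prose = not paragraph_start.startswith(("-", "*", "`"))
        if ends_block and is_prose and stripped[-1] not in ".!?:":
            problems.append((number, "paragraph does not end with punctuation"))
    return problems
```

Run against the "All Element Children Have Unique Names" section with its text joined onto one line, this reports the missing blank line after the header, the over-long line, and the missing final period.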



##########
site/dev/design-notes/Proposed-DFDL-Standard-Profile.md:
##########
@@ -0,0 +1,133 @@
+# Proposal: DFDL Standard Profile
+
+#### Version  0.1  2023-10-13
+
+## Introduction
+
+In attempting to integrate Apache Daffodil with other data processing 
software, the need to make DFDL schemas interoperate properly in conjunction 
with other data models has arisen.
+
+Other tools such as Apache NiFi, Apache Drill, Apache Spark, etc. have data 
models which are powerful, but not as expressive as DFDL. 
+
+DFDL's data model is a simplification of XML Schemas's PSVI; however, even 
this causes problems. 
+Most other data processing systems were not designed with markup languages in 
mind, but rather for structured data.
+
+The following things are allowed in DFDL v1.0, but are difficult to map into 
most data models:
+
+- anonymous choices
+- duplicate element child names
+- namespaces that are different, but where the prefixes are not unique
+- global names for element children
+
+A more restrictive subset of DFDL, a _standard profile_, is needed which can 
be enforced (on request) to ensure that DFDL schemas will be usable with a 
variety of data processing systems. 
+Creating DFDL schemas that adhere to this standard profile ensures maximal 
interoperability, including the ability to convert into JSON without 
name/namespace collisions. 
+
+This is a proposal for a switch/option to be added to Daffodil which turns on 
enforcement of this standard profile, aka subset of DFDL.
+
+## Standard Profile Restrictions
+
+### No Anonymous Choices 
+
+Choices must be the model groups of complex type definitions and are not 
allowed in any other context.
+
+Each choice branch must begin with a different element. (This is already a XML 
Schema requirement - Unique Particle Attribution.)
+
+### No Reusable Groups
+Group references and reusable group definitions are not allowed.
+
+### No Element References
+There is no corresponding form of sharing in most data structure systems.
+
+### No Namespace-Qualified Names
+Only elementFormDefault 'unqualified' is allowed. 
+Note that this is the default for XML Schema and DFDL. 
+
+### Unique Namespace Prefixes
+All namespace prefixes must be unique in the entire schema.
+
+This enables one to create unique identifiers by concatenating prefix_local to 
create global names.
+
+### All Element Children Have Unique Names
+All children element declarations must have unique names within their 
enclosing parent element
+
+### Nillable Simple Types Only
+
+Nillable is allowed only for simple type elements

Review Comment:
   Please explain why this restriction helps Drill or JSON.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
