This is an automated email from the ASF dual-hosted git repository.
mbeckerle pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil-site.git
The following commit(s) were added to refs/heads/main by this push:
new 17f2e9c Add doc for dfdlx: alignmentKind, direction, repType, bits
functions, BLOBs.
17f2e9c is described below
commit 17f2e9c58ab3fb775a9adb8c8252309147ba4c8e
Author: Michael Beckerle <[email protected]>
AuthorDate: Tue Nov 4 12:18:01 2025 -0500
Add doc for dfdlx: alignmentKind, direction, repType, bits functions, BLOBs.
Note that dfdlx:repValueRanges is deprecated and is not documented
for LTS. (Per Confluence page)
Added table of contents to these complex pages.
Removed doc of deprecated daf:error function.
Also fix closed jira ticket reference on unsupported page
DAFFODIL-3044
# Conflicts:
# site/dfdl-extensions.md
# site/layers.md
---
site/binary-large-objects.md | 171 ++++++++++++++++++++
site/dfdl-extensions.md | 363 +++++++++++++++++++++++++++++++++++++------
site/layers.md | 51 ++++--
site/unsupported.md | 2 +-
4 files changed, 523 insertions(+), 64 deletions(-)
diff --git a/site/binary-large-objects.md b/site/binary-large-objects.md
new file mode 100644
index 0000000..07150f5
--- /dev/null
+++ b/site/binary-large-objects.md
@@ -0,0 +1,171 @@
+---
+description: Binary Large Objects Feature
+group: 'nav-right'
+layout: page
+title: 'Binary Large Objects Feature'
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+## Table of Contents
+{:.no_toc}
+<!-- The {: .no_toc } excludes the above heading from the ToC -->
+
+1. yes, this is the standard Jekyll way to do a ToC (this line gets removed)
+{:toc}
+<!-- note the above line {:toc} cannot have whitespace at the start -->
+
+
+<!--
+This page is linked from https://s.apache.org/daffodil-blob-feature.
+ If this page content moves, please update that link from https://s.apache.org.
+-->
+
+# Introduction
+
+Daffodil has implemented a DFDL extension that allows data much larger than memory to be manipulated.
+
+A variety of data formats, such as for image and video files, consist of fields of what is effectively metadata, surrounding large blocks of data containing compressed image or video data.
+
+An important use case for DFDL is to expose this metadata for easy use, and to provide access to
+the large data via a streaming mechanism akin to opening a file, thereby avoiding
+large `xs:hexBinary` strings in the infoset.
+
+In RDBMS systems, BLOB (Binary Large Object) is the type used when the data row returned from an SQL query will not contain the actual value data, but rather a handle that can be used to open/read/write/close the BLOB.
+
+Daffodil has an analogous BLOB capability.
+This enables processing of images or video of arbitrary size without the need to ever hold all the data in memory.
+
+This also bypasses the limitation on object size.
+
+
+# Type `xs:anyURI` and Property `dfdlx:objectKind`
+
+DFDL is extended to allow simple types to have the `xs:anyURI` type.
+Elements with this type will be treated as BLOB objects.
+
+The `dfdlx:objectKind` property is added to define what type of object it is.
+The only valid value for this property is `"bytes"`, which specifies binary large objects.
+All other values are reserved for future extensions of this feature.
+
+An example of this usage in a DFDL schema may look something like this:
+
+```xsd
+<xs:schema
+ xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+ xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
+
+ <xs:element name="data" type="xs:anyURI"
+ dfdlx:objectKind="bytes"
+ dfdl:lengthUnits="bytes"
+ dfdl:length="1024" />
+
+</xs:schema>
+```
+
+The resulting infoset (as XML) will look something like this:
+
+```xml
+<data>file:///path/to/blob/data</data>
+```
+
+The 1024 bytes of data are written to a file at location `/path/to/blob/data`.
+
+The BLOB URI must always use the _file scheme_ and must be absolute.
+
+# Daffodil BLOB API
+
+API calls are used to specify where Daffodil should write the BLOB files.
+
+Two functions are used on the Daffodil `InfosetOutputter`.
+
+The first API function provides a way to set the properties used when
+creating BLOB files, including the output directory, and prefix/suffix
+for the BLOB file.
+
+```scala
+/**
+ * Set the attributes for how to create blob files.
+ *
+ * @param directory the Path to the directory in which to create files. If the
+ * directory does not exist, Daffodil will attempt to create it before
+ * writing a blob.
+ * @param prefix the prefix string to be used in generating a blob file name
+ * @param suffix the suffix string to be used in generating a blob file name
+ */
+final def setBlobAttributes(directory: Path, prefix: String, suffix: String)
+```
+
+The second API function provides a way for the API user to get a list of
+all BLOB files that were created during `parse()`.
+
+```scala
+/**
+ * Get the list of blob paths that were output in the infoset.
+ *
+ * This is the same as what would be found by iterating over the infoset.
+ */
+final def getBlobFiles(): Seq[Path]
+```
+
+Note that no changes to the `unparse()` API are required, since the BLOB URI provides
+all the necessary information to retrieve files containing BLOB data.
+
+BLOB files are not automatically deleted.
+It is the responsibility of the API user to determine when files are no
+longer needed and remove them.
+
+# DFDL Expressions
+
+Any expression access to the _data_ of a BLOB element will result in a
+Schema Definition Error during schema compilation.
+
+The _length_ of a BLOB element is available since it is very common in
+data formats to include both a BLOB payload and the length of that
+payload. On unparse, we can calculate the length of the BLOB data so
+that the value can be output in a length field in the data. This is
+done using the regular `dfdl:contentLength()` and `dfdl:valueLength()`
+functions.
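+
+As a minimal sketch of the length pattern described above, the following pairs a BLOB payload
+with a stored length field. The element names `payloadLength` and `payload` are hypothetical, and
+the binary representation properties are assumptions chosen only for illustration.
+
+```xml
+<xs:sequence>
+  <!-- Hypothetical 4-byte binary length field. On unparse, its value is
+       computed from the length of the BLOB element that follows. -->
+  <xs:element name="payloadLength" type="xs:unsignedInt"
+    dfdl:representation="binary"
+    dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes" dfdl:length="4"
+    dfdl:outputValueCalc="{ dfdl:valueLength(../payload, 'bytes') }" />
+
+  <!-- The BLOB itself. On parse, its length comes from the stored field. -->
+  <xs:element name="payload" type="xs:anyURI"
+    dfdlx:objectKind="bytes"
+    dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes"
+    dfdl:length="{ ../payloadLength }" />
+</xs:sequence>
+```
+
+Only the length of `payload` is referenced here; an expression that reads the data of the BLOB would be rejected as described above.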
+
+
+# Testing DFDL Schemas using BLOBs via the TDML Runner
+
+The TDML language is extended to support the `xsi:type="xs:anyURI"` annotation on XML data elements.
+
+For example:
+
+```xml
+<tdml:dfdlInfoset>
+ <data xsi:type="xs:anyURI">path/to/blob/data</data>
+</tdml:dfdlInfoset>
+```
+
+The path provided as the URI value can be, and usually will be, a relative path within the
+`src/test/resources` directory of the DFDL schema project.
+During Infoset comparisons the TDML Runner will compare the contents of this file
+with the BLOB file in the corresponding element (having type `xs:anyURI`) of the infoset.
+
+BLOB files created when running the tests are deleted when the test completes.
+
+# Command Line Interface
+
+The CLI supports ad-hoc testing of the use of BLOBs.
+BLOBs are written to a subdirectory named `daffodil-blobs` within the directory given by
+the JVM _System Property_ `user.dir`.
+If the `daffodil-blobs` directory does not exist, Daffodil will attempt to create it.
+The CLI does not delete any BLOB files.
diff --git a/site/dfdl-extensions.md b/site/dfdl-extensions.md
index 092d503..792fdb7 100644
--- a/site/dfdl-extensions.md
+++ b/site/dfdl-extensions.md
@@ -1,6 +1,6 @@
---
layout: page
-title: DFDL Extensions
+title: Daffodil Extensions to the DFDL Language
group: nav-right
---
<!--
@@ -22,37 +22,61 @@ limitations under the License.
{% endcomment %}
-->
+## Table of Contents
+{:.no_toc}
+<!-- The {: .no_toc } excludes the above heading from the ToC -->
+
+1. yes, this is the standard Jekyll way to do a ToC (this line gets removed)
+{:toc}
+<!-- note the above line {:toc} cannot have whitespace at the start -->
+
+# Introduction
+
Daffodil provides extensions to the DFDL specification.
-These properties are in the namespace defined by the URI
+These functions and properties are in the namespace defined by the URI
``http://www.ogf.org/dfdl/dfdl-1.0/extensions`` which is normally bound to the
``dfdlx`` prefix
like so:
-
``` xml
-<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
- xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
- xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
->
+<schema xmlns="http://www.w3.org/2001/XMLSchema"
+ xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+ xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions">
```
-The following symbols defined in this namespace are described below.
+The DFDL language extensions described below have Long Term Support (LTS) in Daffodil
+going forward, and are proposed for inclusion in a future revision of the DFDL
+standard.
+DFDL schema authors can depend on the features and behaviors defined here without fear
+that these extensions will be withdrawn in the future.
-# Expression Functions
+# Binary Large Objects (BLOB) Feature
-## ``daf:error()``
+Daffodil supports processing data that contains large opaque binary objects,
+also known as _BLOBs_.
+These enable processing of data types such as images, audio, or video where the
+data content is surrounded by important metadata.
+The DFDL Schema can expose the metadata fields for processing and carry
+along the opaque BLOB data in files.
-A function that can be used in DFDL expressions. This functions does not
return a value or accept any arguments. When called, it causes a Parse Error or
Unparse Error.
+There is
+[separate documentation for the Binary Large Object (BLOB) feature](/binary-large-objects).
-*This function is deprecated as of Daffodil 2.0.0. Use the ``fn:error(...)``
function instead.*
+# Expression Functions
-## ``dfdlx:trace($value, $label)``
+## `dfdlx:trace(value, label)`
-A function that can be used in DFDL expressions, similar to the ``fn:trace()``
function. This logs the string ``$label`` followed by ``$value`` converted to a
string and returns ``$value``. The second argument must be of type
``xs:string``.
+A function that can be used to debug DFDL expressions, similar to
+the [XPath ``fn:trace(value, label)``](https://www.w3.org/TR/xpath-functions-31/#func-trace)
+function.
+This creates a message from the string argument ``label`` followed by ``value`` converted to a
+string and logs the message.
+The function returns ``value``.
+The second argument, `label`, must be of type ``xs:string``.
-## ``dfdlx:lookAhead(offset, bitSize)``
+## `dfdlx:lookAhead(offset, bitSize)`
-Read ``bitSize`` bits, where the first bit is located at an ``offset`` (in
bits)
-from the current location. The result is a ``xs:nonNegativeInteger``.
Restrictions:
+Read `bitSize` bits, where the first bit is located at an ``offset`` (in bits)
+from the current location. The result is an ``xs:nonNegativeInteger``. Restrictions:
- offset >=0
- bitSize >= 1
@@ -67,10 +91,12 @@ and data location.
the data being read will not be used.
### Examples of `dfdlx:lookAhead`
-
+
The following two elements both populate element `a` with the value of the
next 3 bits as an
-unsignedInt. They are not completely equivalent because the first will consume
3 bits of the
+unsignedInt.
+They are not completely equivalent because the first will consume 3 bits of the
input stream where the second will not advance the input stream.
+
```xml
<xs:element name="a" type="xs:unsignedInt" dfdl:length="3"
dfdl:lengthUnits="bits" />
@@ -81,51 +107,296 @@ In this case the choice of elements `a` vs. `b` depends
on the value of the `tag
found after fields `a` and `b`:
```
<xs:choice dfdl:choiceDispatchKey="{ dfdlx:lookAhead(16,8) }">
-<xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>
-<xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>
+ <xs:element name="a" type="xs:int" dfdl:length="16"
dfdl:choiceBranchKey="1"/>
+ <xs:element name="b" type="xs:int" dfdl:length="16"
dfdl:choiceBranchKey="2"/>
</xs:choice>
<xs:element name="tag" type="xs:int" dfdl:length="8" />
- ```
-# Bitwise Functions
+```
-TBD, but the complete list (all ``dfdlx``) is `BitAnd`, `BitNot`, `BitOr`,
`BitXor`, `LeftShift`,
-`RightShift`
+## Bitwise Functions: `bitAnd`, `bitOr`, `bitXor`, `bitNot`, `leftShift`, `rightShift`
-## ``dfdlx:doubleFromRawLong`` and ``dfdlx:doubleToRawLong``
+These functions are defined on types `long`, `int`, `short`, `byte`, `unsignedLong`,
+`unsignedInt`, `unsignedShort`, and `unsignedByte`.
-Converting binary floating point numbers to/from base 10 text can result in
lost information.
-The base 10 representation, converted back to binary representation, may not
be bit-for-bit
-identical. These functions can be used to carry 8-byte double precision IEEE
floating point
-numbers as type `xs:long` so that no information is lost. The DFDL schema can
still obtain
-and operate on the floating point value by converting these `xs:long` values
into type
-`xs:double`, and back if necessary for unparsing a new value.
+### `dfdlx:bitAnd(arg1, arg2)`
-# Properties
+This computes the bitwise AND of two integers.
-## ``dfdlx:parseUnparsePolicy``
+- Both arguments must be signed, or both must be unsigned.
+- If the two arguments are not the same type, the smaller one is converted into the type of the
+larger one.
+- If the smaller argument is signed, this conversion does sign-extension.
+- The result type is that of the larger argument.
-A property applied to simple and complex elements, which specifies whether the
element supports only parsing, only unparsing, or both parsing and unparse.
Valid values for this property are ``parse``, ``unparse``, or ``both``. This
allows one to leave off properties that are required for only parse or only
unparse, such as ``dfdl:outputValueCalc`` or ``dfdl:outputNewLine``, so that
one may have a valid schema if only a subset of functionality is needed.
+### `dfdlx:bitOr(arg1, arg2)`
-All elements must have a compatible parseUnparsePolicy with the compilation
parseUnparsePolicy (which is defined by the root element daf:parseUnparsePolicy
and/or the Daffodil parseUnparsePolicy tunable) or it is a Schema Definition
Error. An element is defined to have a compatible parseUnparsePolicy if it has
the same value as the compilation parseUnparsePolicy or if it has the value
``both``.
+This computes the bitwise OR of two integers.
-For compatibility, if this property is not defined, it is assumed to be
``both``.
+- Both arguments must be signed, or both must be unsigned.
+- If the two arguments are not the same type, the smaller one is converted into the type of the
+larger one.
+- If the smaller argument is signed, this conversion does sign-extension.
+- The result type is that of the larger argument.
+
+### `dfdlx:bitXor(arg1, arg2)`
+
+This computes the bitwise Exclusive OR of two integers.
+
+- Both arguments must be signed, or both must be unsigned.
+- If the two arguments are not the same type, the smaller one is converted into the type of the
+larger one.
+- If the smaller argument is signed, this conversion does sign-extension.
+- The result type is that of the larger argument.
+
+### `dfdlx:bitNot(arg)`
+
+This computes the bitwise NOT of an integer. Every bit is inverted. The result type is the same
+as the argument type.
-## ``dfdlx:layer``
+### `dfdlx:leftShift(value, shiftCount)`
-[Layers](/layers) provide algorithmic capabilities for decoding/encoding data
or computing
-checksums. Some are built-in to Daffodil. New layers can be created in
Java/Scala and
-plugged-in to Daffodil dynamically.
+This is the _logical_ shift left, meaning that bits are shifted from less-significant positions
+to more-significant positions.
-## ``dfdlx:direction``
+- The left-most bits shifted out are discarded.
+- Zeros are shifted in for the right-most bits.
+- The result type is the same as the `value` argument type.
+- It is a processing error if the `shiftCount` argument is < 0.
+- It is a processing error if the `shiftCount` argument is greater than the
number of
+ bits in the type of the value argument.
-TBD
+### `dfdlx:rightShift(value, shiftCount)`
-## ``dfdlx:repType``, ``dfdlx:repValues``, and ``dfdlx:repValueRanges``
+This is the _arithmetic_ shift right, meaning bits move from more-significant to
+less-significant positions.
+If _logical_ (zero-filling) shift right is needed, you must use unsigned types.
-TBD
+- The `value` argument is shifted by the `shiftCount`.
+- The right-most bits shifted out are discarded.
+- If the `value` is signed, then the sign bit is shifted in for the left-most
bits.
+- If the `value` is unsigned, then zeros are shifted in for the left-most
bits.
+- The result type is the same as the `value` argument type.
+- It is a processing error if the `shiftCount` argument is < 0.
+- It is a processing error if the `shiftCount` argument is greater than the
number of
+ bits in the type of the value argument.
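+
+As an illustration of the functions above, the following sketch splits one packed byte into its
+high and low 4-bit fields. The element names are hypothetical, and the explicit
+`xs:unsignedByte(...)` constructor on the mask is an assumption made so that both arguments to
+`dfdlx:bitAnd` are unsigned and of the same type.
+
+```xml
+<xs:sequence>
+  <!-- Hypothetical packed byte read from the data. -->
+  <xs:element name="packed" type="xs:unsignedByte"
+    dfdl:representation="binary"
+    dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes" dfdl:length="1" />
+
+  <!-- High nibble: the value is unsigned, so zeros are shifted in. -->
+  <xs:element name="high" type="xs:unsignedByte"
+    dfdl:inputValueCalc="{ dfdlx:rightShift(../packed, 4) }" />
+
+  <!-- Low nibble: mask off all but the 4 least-significant bits. -->
+  <xs:element name="low" type="xs:unsignedByte"
+    dfdl:inputValueCalc="{ dfdlx:bitAnd(../packed, xs:unsignedByte(15)) }" />
+</xs:sequence>
+```
+
+For a `packed` value of 167 (hex A7), `high` would be 10 and `low` would be 7.
+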
-# Extended Behaviors
+## `dfdlx:doubleFromRawLong(longArg)` and `dfdlx:doubleToRawLong(doubleArg)`
+
+IEEE binary float and double values that are not NaN will parse to base 10 text and unparse back
+to the same exact IEEE binary bits.
+However, the same cannot be said for NaN (not a number) values, of which there are many bit
+patterns.
+To preserve float and double NaN values bit for bit you can use these functions to compute
+`xs:long` values that enable the DFDL Infoset to preserve the bits of a float or double value
+even if it is a NaN.
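+
+A minimal sketch of this technique: the raw 8 bytes are parsed as an `xs:long`, and a computed
+element exposes the corresponding `xs:double` to the rest of the schema. The element names are
+hypothetical and the representation properties are assumptions for illustration.
+
+```xml
+<xs:sequence>
+  <!-- The raw IEEE bits, carried in the infoset as an xs:long so that
+       every bit pattern, including any NaN, survives parse and unparse. -->
+  <xs:element name="rawBits" type="xs:long"
+    dfdl:representation="binary"
+    dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes" dfdl:length="8" />
+
+  <!-- Computed view of the same bits as a double. -->
+  <xs:element name="asDouble" type="xs:double"
+    dfdl:inputValueCalc="{ dfdlx:doubleFromRawLong(../rawBits) }" />
+</xs:sequence>
+```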
+
+# Properties
+
+## `dfdlx:alignmentKind`
+
+Valid values for this property are `manual` or `automatic` with `automatic` being the default
+behavior.
+When specified, the `manual` value turns off all automatic alignment based on the
+`dfdl:alignment` and `dfdl:alignmentUnits` properties.
+The schema author must then use `dfdl:leadingSkip` or `dfdl:trailingSkip`, or otherwise ensure
+that all the elements/terms are aligned based on their lengths.
+
+This property is sometimes needed to facilitate creation of schemas where interactions occur
+between computed lengths (that is, stored length fields) and
+alignment regions that are automatically being inserted.
+It can be easier to do all alignment manually than to debug these
interactions.
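+
+For illustration, a sketch that turns off automatic alignment in a format annotation and
+positions one element by hand. The element name and the skip amount are hypothetical, and placing
+the property on `dfdl:format` in attribute form is an assumption about typical usage.
+
+```xml
+<xs:annotation>
+  <xs:appinfo source="http://www.ogf.org/dfdl/">
+    <!-- With manual alignment, no alignment regions are inserted automatically. -->
+    <dfdl:format dfdlx:alignmentKind="manual" alignmentUnits="bytes" />
+  </xs:appinfo>
+</xs:annotation>
+
+<!-- The schema author skips 2 padding bytes explicitly instead. -->
+<xs:element name="record" type="xs:unsignedInt"
+  dfdl:representation="binary"
+  dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes" dfdl:length="4"
+  dfdl:leadingSkip="2" />
+```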
+
+## `dfdlx:parseUnparsePolicy`
+
+A property applied to simple and complex elements, which specifies whether the element supports only parsing, only unparsing, or both parsing and unparsing. Valid values for this property are ``parseOnly``, ``unparseOnly``, or ``both``. This allows one to leave off properties that are required for only parse or only unparse, such as ``dfdl:outputValueCalc`` or ``dfdl:outputNewLine``, so that one may have a valid schema if only a subset of functionality is needed.
+
+All elements must have a compatible parseUnparsePolicy with the compilation parseUnparsePolicy (which is defined by the root element daf:parseUnparsePolicy and/or the Daffodil parseUnparsePolicy tunable) or it is a Schema Definition Error. An element is defined to have a compatible parseUnparsePolicy if it has the same value as the compilation parseUnparsePolicy or if it has the value ``both``.
+
+For compatibility, if this property is not defined, it is assumed to be ``both``.
+
+## `dfdlx:layer`
+
+_Layers_ provide algorithmic capabilities for decoding/encoding data or computing
+checksums. Some are built-in to Daffodil. New layers can be created in Java/Scala and
+plugged-in to Daffodil dynamically.
+There is [separate Layer documentation](/layers).
+
+## `dfdlx:direction`
+
+This property can appear only on DFDL `defineVariable` statement annotations.
+This property has possible values `both` (the default), `parseOnly`, or `unparseOnly`.
+It declares whether the variable is to be available for only parsing, only unparsing, or both.
+Since this is a newly introduced extension property and existing schemas won't contain a definition
+for it, it has a default value of `both`.
+
+This property can conflict with the `dfdlx:parseUnparsePolicy` property which takes the same
+values (`both`, `parseOnly`, and `unparseOnly`).
+If `dfdlx:parseUnparsePolicy='parseOnly'` then it is a Schema Definition Error if
+variables in the DFDL schema have `dfdlx:direction='unparseOnly'`.
+Similarly if `dfdlx:parseUnparsePolicy='unparseOnly'` then it is a Schema Definition Error if
+variables in the DFDL schema have `dfdlx:direction='parseOnly'`.
+
+It is a Schema Definition Error if a variable defined with direction `parseOnly` is accessed
+from an expression used by the unparser.
+Symmetrically, it is a Schema Definition Error if a variable defined with direction
+`unparseOnly` is accessed from an expression used by the parser.
+This error is detected at DFDL schema compilation time, not runtime.
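+
+A sketch of a parse-only variable follows. The variable name `fieldCount`, the element name, and
+the `ex` namespace prefix are hypothetical.
+
+```xml
+<xs:annotation>
+  <xs:appinfo source="http://www.ogf.org/dfdl/">
+    <!-- Available only to expressions evaluated by the parser. -->
+    <dfdl:defineVariable name="fieldCount" type="xs:int" defaultValue="0"
+      dfdlx:direction="parseOnly" />
+  </xs:appinfo>
+</xs:annotation>
+
+<!-- dfdl:inputValueCalc is evaluated only when parsing, so reading the
+     variable here is allowed. -->
+<xs:element name="count" type="xs:int"
+  dfdl:inputValueCalc="{ $ex:fieldCount }" />
+```
+
+Referencing the same variable from a `dfdl:outputValueCalc` expression would instead be reported as a Schema Definition Error when the schema is compiled.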
+
+The properties listed below take expressions for their values and are generally evaluated at both parse and
+unparse time.
+Hence, unless the whole schema is constrained by `dfdlx:parseUnparsePolicy`, any expressions for
+these properties[^moreProps] cannot
+reference DFDL variables with `dfdlx:direction` of `parseOnly` or `unparseOnly`.
`unparseOnly`.
+
+- `byteOrder`
+- `encoding`
+- `initiator`
+- `terminator`
+- `separator`
+- `escapeCharacter`
+- `escapeEscapeCharacter`
+- `length`
+- `occursCount`
+- `textStandardDecimalSeparator`
+- `textStandardGroupingSeparator`
+- `textStandardExponentRep`
+- `binaryFloatRep`
+- `textBooleanTrueRep`
+- `textBooleanFalseRep`
+- `calendarLanguage`
+- `dfdl:setVariable`, a `dfdl:newVariableInstance` default value expression, or a
+  `dfdl:defineVariable` default value expression when
+  that variable being set/defaulted is itself referenced from another expression and the variable
+  being set/defaulted has `dfdlx:direction` of `both` (the default)
+
+<!-- footnotes must be all one big long line -->
+[^moreProps]: New properties added as part of errata corrections to the DFDL v1.0 standard which take expressions for their values will need to be added to this list or those for parser-specific or unparser-specific properties.
+
+Parser-specific expressions include:
+
+- `dfdl:inputValueCalc`
+- `dfdl:length` (when `dfdl:lengthKind='explicit'`)
+- `dfdl:occursCount` (when `dfdl:occursCountKind='expression'`)
+- `dfdl:choiceDispatchKey`
+- the `message` and `test` attributes of the `dfdl:assert` and `dfdl:discriminator` statement annotations
+- `dfdl:setVariable`, a `dfdl:newVariableInstance` default value expression, or a
+  `dfdl:defineVariable` default value expression when
+  that variable being set/defaulted is itself referenced from another expression being
+  accessed at parser creation time, and the variable being set/defaulted has `dfdlx:direction`
+  of `parseOnly`
+
+Unparser-specific expressions include:
+
+- `dfdl:outputValueCalc`
+- `dfdl:length` (when `dfdl:lengthKind='explicit'`)
+- `dfdl:outputNewLine`
+- `dfdl:setVariable`, a `dfdl:newVariableInstance` default value expression, or a
+  `dfdl:defineVariable` default value expression when
+  that variable being set/defaulted is itself referenced from another expression being
+  accessed at unparser creation time, and the variable being set/defaulted has `dfdlx:direction`
+  of `unparseOnly`
+
+
+## Enumerations: `dfdlx:repType`, `dfdlx:repValues`
+
+These properties work together to allow DFDL schemas to define _enumerations_;
+that is, symbolic representations for integer constants.
+When parsing, Daffodil will convert these integers into the corresponding string values.
+When unparsing, Daffodil will convert strings into the corresponding integers.
+
+An element of type (or derived from) `xs:string` can be defined using XSD `enumeration` facets
+which constrain the valid values of this string.
+These enumeration values are effectively symbolic constants.
+The `dfdlx:repType` and `dfdlx:repValues` properties are then used to define the correspondence of
+the symbolic strings to the corresponding integer values.
+
+### `dfdlx:repType`
+
+The value of this property is an XSD QName of a simple type definition that must be derived
+from `xs:int` or `xs:unsignedInt`.
+A simple type definition for a string can be annotated with `dfdlx:repType`
+in order to declare that the representation of the string is not as text characters but is a
+numeric integer value.
+The type referenced from `dfdlx:repType` is usually a fixed length binary integer, but can be any
+DFDL type derived from `xs:int` or `xs:unsignedInt`, with any DFDL representation properties.
+
+The mapping between the representation integer and the symbolic constants is specified using the
+`dfdlx:repValues` property.
+
+### `dfdlx:repValues`
+
+The value of this property is one or more integer values within
+the numeric range defined for the type referenced by `dfdlx:repType`. When more than one value
+is specified, they are in a whitespace separated list.
+
+This property is placed on the `xs:enumeration` facets of a symbolic string constant having a
+`dfdlx:repType`.
+At parse time, if the value of the `dfdlx:repType` integer is found within the `dfdlx:repValues`
+list, then the infoset value for the symbolic string gets the corresponding enumeration facet value.
+It is a parse error if the `dfdlx:repType` integer is not found in any of the `dfdlx:repValues`
+lists of the `xs:enumeration` facets.
+At unparse time, the symbolic constant is mapped to the first integer in the `dfdlx:repValues` list.
+It is an unparse error if the symbolic string value is not found among the `xs:enumeration`
+facet values of the symbolic string type.
+
+### Examples of Enumerations in Daffodil DFDL
+
+A simple example of a basic enum is:
+
+```xsd
+ <simpleType name="rep3Bit" dfdl:lengthUnits="bits" dfdl:length="3"
dfdl:lengthKind="explicit">
+ <restriction base="xs:unsignedInt"/>
+ </simpleType>
+
+ <simpleType name="precedenceEnum" dfdlx:repType="pre:rep3Bit">
+ <restriction base="xs:string">
+ <enumeration value="Reserved_0" dfdlx:repValues="0"/>
+ <enumeration value="Reserved_1" dfdlx:repValues="1"/>
+ <enumeration value="Emergency" dfdlx:repValues="2"/>
+ <enumeration value="Reserved_3" dfdlx:repValues="3"/>
+ <enumeration value="Flash" dfdlx:repValues="4"/>
+ <enumeration value="Immediate" dfdlx:repValues="5"/>
+ <enumeration value="Priority" dfdlx:repValues="6"/>
+ <enumeration value="Routine" dfdlx:repValues="7"/>
+ </restriction>
+ </simpleType>
+```
+
+Above we see the `dfdlx:repType` is `rep3Bit`, which is a 3-bit `xs:unsignedInt`. This can
+represent the values 0 to 7, which one can see are the `dfdlx:repValues` of the `xs:enumeration`
+facets for this enumeration string type, which is named `precedenceEnum`.
+
+In the above you can also see that the symbolic strings are in one-to-one correspondence with
+every possible value of the 3-bit representation integer.
+This one-to-one correspondence ensures that data that is first parsed and then unparsed will
+recreate the exact numeric bits used.
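+
+For instance, an element could be declared with this type as sketched here; the element name
+`precedence` is hypothetical:
+
+```xml
+<element name="precedence" type="pre:precedenceEnum"/>
+```
+
+Under the mapping above, parsing a 3-bit field holding the integer 2 would yield `<precedence>Emergency</precedence>` in the infoset, and unparsing that infoset would write the integer 2 again.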
+
+However, in data security applications the following may be preferred:
+
+```xsd
+ <simpleType name="precedenceEnum" dfdlx:repType="pre:rep3Bit">
+ <restriction base="xs:string">
+ <enumeration value="Reserved" dfdlx:repValues="0 1 3"/>
+ <enumeration value="Emergency" dfdlx:repValues="2"/>
+ <enumeration value="Flash" dfdlx:repValues="4"/>
+ <enumeration value="Immediate" dfdlx:repValues="5"/>
+ <enumeration value="Priority" dfdlx:repValues="6"/>
+ <enumeration value="Routine" dfdlx:repValues="7"/>
+ </restriction>
+ </simpleType>
+```
+
+In the above we see that three numeric values, 0, 1, and 3, are the `dfdlx:repValues` mapped to
+the symbolic string `Reserved`.
+This technique has the advantage of blocking covert signals being transmitted by use of the
+different reserved values since when unparsed, the constant string `Reserved` will always be
+_canonicalized_ to integer 0.
+Putting data into canonical form when unparsing generally improves data security.
+
+# Extended Behaviors for DFDL Types
## Type ``xs:hexBinary``
Daffodil allows `dfdlx:lengthUnits='bits'` for this simple type.
+
+----
diff --git a/site/layers.md b/site/layers.md
index db79eb2..373f70d 100644
--- a/site/layers.md
+++ b/site/layers.md
@@ -3,8 +3,7 @@ description: Pluggable Extensions to Enable Algorithmic
Transformations
in DFDL
group: 'nav-right'
layout: page
-title: 'Layers - Algorithmic
- Extensions for DFDL'
+title: 'Layers - Algorithmic Extensions for DFDL'
---
<!--
{% comment %}
@@ -24,12 +23,13 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-
## Table of Contents
{:.no_toc}
+<!-- The {: .no_toc } excludes the above heading from the ToC -->
-1. use ordered table of contents
+1. yes, this is the standard Jekyll way to do a ToC (this line gets removed)
{:toc}
+<!-- note the above line {:toc} cannot have whitespace at the start -->
# Introduction
@@ -58,7 +58,7 @@ There is no limit to this depth.
In the section on [Using Layers](#UsingLayers) below we will look at an
example that uses
multiple layers together.
-# Built-in Layers
+## Built-in Layers
Daffodil includes several built-in layers:
- [base64_MIME](#base64-mime-layer)
@@ -78,20 +78,20 @@ Each of the built-in layers will be
[documented separately below](#daffodil-built-in-layer-documentation) with
examples of their
usage.
-# Custom Plug-In Layers
+## Custom Plug-In Layers
Additional layers can be written in Java or Scala and deployed as _plug-ins_
for Daffodil.
These are generally packaged as DFDL _layer schemas_, a kind of _component
schema_,
that provide the layer packaged for import by other DFDL _assembly_ schemas
that use the
layer in the data format they describe.
-# Layer Kinds: Transforming Layers and Checksum Layers
+## Layer Kinds: Transforming Layers and Checksum Layers
There are two different kinds of layers, though they share many
characteristics. They are
_transforming_ layers, and _checksum_ layers. Both run small algorithms over
part (or all) of
the data stream. The difference is the purpose of the algorithm and its
output.
-## Transforming Layers
+### Transforming Layers
These layers decode data (when parsing), and encode data (when unparsing).
The simplest example of a transforming layer is the `base64_MIME` layer which
@@ -106,7 +106,7 @@ Custom transforming layers are created by deriving an
implementation from the Da
[`Layer`](/docs/latest/javadoc/org/apache/daffodil/runtime1/layers/api/Layer.html)
class
which is introduced in a later section.
-## Checksum Layers
+### Checksum Layers
Checksum layers are a simplified kind of layer which do not decode or encode
data, they simply
pass through the data unmodified, but while doing so they compute a checksum,
hash, or Cyclic
@@ -129,6 +129,8 @@ Custom checksum layers are created by deriving an
implementation class from the
html)
class, which is introduced in a later section.
+----
+
# Using Layers
To use a layer you must know
@@ -201,9 +203,7 @@ Layers may specify restrictions on the minimum and maximum
allowed values of the
and passing an out-of-range value for the variable is a processing error.
-# Examples
-
-## Line Folding
+# Example: Line Folding
Consider the line folding layer, specifically the `lineFolded_IMF` layer,
which is built-in to Daffodil.
@@ -220,7 +220,7 @@ Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod
```
This data has been _line folded_ at roughly 72 characters by inserting a CRLF
before an existing space in the data.
-Each line ends with a CRLF (\r\n) and the second through fourth lines begin
+Each line ends with a CRLF (`\r\n`) and the second through fourth lines begin
with a space as a way of indicating that they are extension lines.
This data is supposed to be reassembled to form a long single-line string by
removing
all CRLF pairs.
@@ -265,7 +265,7 @@ Other examples will show how the layer length can be limited to a sub-region of
More detailed documentation for the [Line Folded Layers](#line-folded-layers)
is below.
-## Base64, GZip, and BoundaryMark Layers used Together
+# Example: Base64, GZip, and BoundaryMark Layers used Together
In this example, the data consists of a preliminary string, a section of
CSV-like data, and a
final string element.
@@ -433,6 +433,8 @@ This group definition is the last thing in the schema:
```
The above schema works both to parse, but also to unparse this data.
+----
+
# Using Custom Plug-In Layers
A custom plug-in layer is used in the same manner as the built-in Daffodil
layers with just a few
@@ -463,6 +465,9 @@ base class.
Further details on how to define custom plug-in layers is in the Javadoc for
the
[Layer
API](/docs/latest/javadoc/org/apache/daffodil/runtime1/layers/api/package-summary.html)
+----
+----
+
# Daffodil Built-In Layer Documentation
Each of the layers built-in to the Daffodil implementation are documented in a
section below
@@ -476,6 +481,8 @@ The built-in layers are:
- [lineFolded_IMF](#line-folded-layers)
- [lineFolded_iCalendar](#line-folded-layers)
+----
+
## Base64 MIME Layer
- Name: base64_MIME
@@ -491,11 +498,13 @@ This uses the standard `java.util.Base64` classes, specifically the MIME encodin
This is specified by [RFC 2045](https://www.ietf.org/rfc/rfc2045.txt).
The encoded output must be represented in lines of no more than 76 characters
-each and uses a carriage return '\r' followed immediately by a linefeed '\n' as the line separator.
+each and uses a carriage return `\r` followed immediately by a linefeed `\n` as the line separator.
No line separator is added to the end of the encoded output.
All line separators or other characters not found in the base64 alphabet table
are ignored in
decoding operation.
+----
+
## BoundaryMark Layer
- Name: boundaryMark
@@ -537,6 +546,9 @@ of any child element enclosed within the layer, or even the lengths of other lay
within the scope of this boundary mark layer are not considered and do not
disrupt the search
for the boundary mark string.
+
+----
+
## Byte-Swapping Layers
- Layer Names:
@@ -570,6 +582,8 @@ order 2 1 4 3 6 5 8 7 10 9.
If `requireLengthInWholeWords` is bound to "yes", then if the length is not a
multiple of the
word size a processing error occurs.
+----
+
## FixedLength Layer
- Name: fixedLength
@@ -588,6 +602,8 @@ word size a processing error occurs.
Suitable only for small sections of data, not large data streams or large
files.
The entire fixed length region of the data will be pulled into a byte buffer
in memory.
+----
+
## GZIP Layer
- Name: gzip
@@ -610,6 +626,8 @@ depending on the Java version used.
To avoid inconsistent behavior of test failures that expect a certain byte
value this layer
always writes a consistent header (header byte 9 of 255) regardless of the
Java version.
+----
+
## Line Folded Layers
- Layer Names:
@@ -624,7 +642,6 @@ always writes a consistent header (header byte 9 of 255) regardless of the Java
<xs:import namespace="urn:org.apache.daffodil.layers.lineFolded"
schemaLocation="/org/apache/daffodil/layers/xsd/lineFoldedLayer.dfdl.xsd"/>
```
-
### General Usage
There is a limitation on the compatibility of line folding of data
@@ -633,7 +650,7 @@ For example, line folding can interact badly with surrounding elements of `dfdl:
'pattern'` if the pattern is, for example `".*?\\r\\n(?!(?:\\t|\\ ))"` which
is anything up to
and including a CRLF not followed by a space or tab.
The problem is that line folding
-converts isolated \n or \r into \r\n, and if this just happens to be followed by a
+converts isolated `\n` or `\r` into `\r\n`, and if this just happens to be followed by a
non space/tab character this will have inserted an end-of-data in the middle
of the
data.
diff --git a/site/unsupported.md b/site/unsupported.md
index f85faf3..4d2db35 100644
--- a/site/unsupported.md
+++ b/site/unsupported.md
@@ -51,7 +51,7 @@ that there has been no intention to support as of this release.
# XML Schema Features
* fixed {% jira 117 %}
-* default {% jira 115 %} {% jira 1277 %}
+* default {% jira 115 %}
# Properties and Property Enumerations