This is an automated email from the ASF dual-hosted git repository.

cdutz pushed a commit to branch develop
in repository https://gitbox.apache.org/repos/asf/plc4x.git


The following commit(s) were added to refs/heads/develop by this push:
     new 099ca20d09 docs: Updated some documentation on the code-generation.
099ca20d09 is described below

commit 099ca20d0998974559bbc2a75bef3458cb7e56b5
Author: Christofer Dutz <cd...@apache.org>
AuthorDate: Fri Mar 22 22:22:50 2024 +0100

    docs: Updated some documentation on the code-generation.
---
 src/site/asciidoc/developers/code-gen/index.adoc   | 111 +++++----
 .../developers/code-gen/protocol/mspec.adoc        | 263 +++++++++++++++------
 2 files changed, 260 insertions(+), 114 deletions(-)

diff --git a/src/site/asciidoc/developers/code-gen/index.adoc 
b/src/site/asciidoc/developers/code-gen/index.adoc
index 92726df6fa..83019d265f 100644
--- a/src/site/asciidoc/developers/code-gen/index.adoc
+++ b/src/site/asciidoc/developers/code-gen/index.adoc
@@ -54,17 +54,17 @@ The `Types Base` module provides all the structures the 
`Protocol` modules outpu
 
 `Protocol Base` and `Language Base` hereby just provide the interfaces that 
reference these types and provide the API for the `plc4x-maven-plugin` to use.
 
-These modules are also maintained in a repository which is separate from the 
rest of the PLC4X code.
+These modules are also maintained in a 
link:https://github.com/apache/plc4x-build-tools/tree/develop/code-generation[repository]
 which is separate from the rest of the PLC4X code.
 
-This is due to some restrictions in the Maven build system. If you are 
interested in understanding the reasons - please read the chapter on `Problems 
with Maven` near the end of this page.
+This is generally only due to some restrictions in the Maven build system. If 
you are interested in understanding the reasons, please read the chapter on 
`Problems with Maven` near the end of this page.
 
-Concrete protocol spec parsers and templates that actually generate code are 
implemented in derived modules.
+Concrete 
link:https://github.com/apache/plc4x/tree/develop/code-generation/protocol-base-mspec[protocol
 spec parsers], 
link:https://github.com/apache/plc4x/tree/develop/code-generation/language-base-freemarker[code
 generators] as well as 
link:https://github.com/apache/plc4x/tree/develop/code-generation/language-java[templates]
 that actually generate code are implemented in derived modules all located 
under the 
link:https://github.com/apache/plc4x/tree/develop/code-generation[code-generat 
[...]
 
-We didn't want to tie ourselves to only one way to specify protocols and to 
generate code. Generally multiple types of formats for specifying drivers are 
thinkable and the same way multiple ways of generating code are possible. 
Currently however we only have one parser: `MSpec` and one generator: 
`Freemarker`.
+We didn't want to tie ourselves to only one way to specify protocols and to 
generate code. Generally multiple types of formats for specifying drivers are 
thinkable and the same way, multiple ways of generating code are possible. 
Currently, however, we only have one parser: `MSpec` and one generator: 
`Freemarker`.
 
 These add more layers to the hierarchy.
 
-So for example in case of generating a Siemens S7 Driver for Java this would 
look like this:
+So for example in case of generating a `Siemens S7` Driver for `Java` this 
would look like this:
 
 [ditaa,code-generation-intro-s7-java]
 ....
@@ -129,14 +129,17 @@ So in general it is possible to add new forms of 
providing protocol definitions
 For the formats of specifying a protocol we have tried out numerous tools and 
frameworks, however the results were never quite satisfying.
 
 Usually using them required a large amount of workarounds, which made the 
solution quite complicated.
+This is mainly because tools like Thrift, Avro, gRPC, ... are all made for 
transferring an object structure from A to B. They focus on keeping the 
structure of the object intact and don't offer ways to control the format used 
for transferring it.
 
-In the end only DFDL and the corresponding Apache project 
https://daffodil.apache.org[Apache Daffodil] seemed to provide what we were 
looking for.
+Existing industry standards, such as `ASN.1`, unfortunately mostly relied on 
large portions of text to describe part of the parsing or serializing logic, 
which made them pretty much useless for fully automated code generation.
+
+In the end only `DFDL` and the corresponding Apache project 
link:https://daffodil.apache.org[Apache Daffodil] seemed to provide what we 
were looking for.
 
 With this we were able to provide first driver versions fully specified in XML.
 
-The downside was, that the PLC4X community regarded this XML format as pretty 
complicated and when implementing an experimental code generator we quickly 
noticed that generating a nice object model would not be possible, due to the 
lack ability to model the inheritance of types in DFDL.
+The downside was that the PLC4X community regarded this XML format as pretty 
complicated, and when implementing an experimental code generator we quickly 
noticed that generating a nice object model would not be possible, due to the 
lack of an ability to model inheritance of types in a DFDL schema.
 
-In the end we came up with our own solution which we called `MSpec` and is 
described in the link:protocol/mspec.html[MSpec Format description].
+In the end we came up with our own format which we called `MSpec` and is 
described in the link:protocol/mspec.html[MSpec Format description].
 
 === Configuration
 
@@ -144,19 +147,28 @@ The `plc4x-maven-plugin` has a very limited set of 
configuration options.
 
 In general all you need to specify is the `protocolName` and the 
`languageName`.
 
-An additional option `outputFlavor` allows generating multiple versions of a 
driver for a given language. This can come in handy if we want to be able to 
generate `read-only` or `passive mode` driver variants.
+An additional option `outputFlavor` allows generating multiple versions of a 
driver for a given language.
+This can come in handy if we want to be able to generate `read-only` or 
`passive mode` driver variants.
+
+In order to be able to refactor and improve protocol specifications without 
having to update all drivers for a given protocol, we recently added a 
`protocolVersion` attribute that allows us to provide and use multiple 
versions of one protocol.
+So in case of updating the fictional `wombat-protocol`, we could add a 
`version 2` `mspec` for it, use version 2 in the Java driver and 
continue to use version 1 in all other languages.
+Once all drivers are updated, we could eliminate the version again.
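As a sketch (the exact element placement is an assumption based on the description above; the protocol name and version are fictional), the plugin configuration could then look like this:

```xml
<!-- Hypothetical plc4x-maven-plugin configuration using protocolVersion -->
<configuration>
  <protocolName>wombat</protocolName>
  <protocolVersion>2</protocolVersion>
  <languageName>java</languageName>
  <outputFlavor>read-write</outputFlavor>
</configuration>
```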
 
 Last but not least, we have a pretty generic `options` config option, which is 
a Map type.
 
-With options is it possible to pass generic options to the code-generation. So 
if a driver or language requires further customization, these options can be 
used.
+With options it is possible to pass generic options to the code-generation.
+So if a driver or language requires further customization, these options can 
be used.
+For a list of all supported options for a given language template, please 
refer to the corresponding language page.
 
 Currently, the `Java` module makes use of such an option for specifying the 
Java `package` the generated code uses.
-If no `package` option is provided, the default package 
`org.apache.plc4x.{language-name}.{protocol-name}.{output-flavor}` is used, but 
especially when generating custom drivers, which are not part of the Apache 
PLC4X project, different package names are better suited. So in these cases, 
the user can simply override the default package name.
+If no `package` option is provided, the default package 
`org.apache.plc4x.{language-name}.{protocol-name}.{output-flavor}` is used, but 
especially when generating custom drivers, which are not part of the Apache 
PLC4X project, different package names are better suited.
+So in these cases, the user can simply override the default package name.
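The derivation of the default package name can be sketched as follows (the concrete flavor value `readwrite` is an assumption for illustration; only the pattern itself is taken from the text above):

```java
// Sketch: how the default package name is derived from the three plugin
// configuration values, following the documented pattern
// org.apache.plc4x.{language-name}.{protocol-name}.{output-flavor}
public class Main {
    static String defaultPackage(String languageName, String protocolName, String outputFlavor) {
        return String.format("org.apache.plc4x.%s.%s.%s", languageName, protocolName, outputFlavor);
    }

    public static void main(String[] args) {
        // e.g. the java/s7 driver with an assumed "readwrite" flavor:
        System.out.println(defaultPackage("java", "s7", "readwrite"));
        // prints org.apache.plc4x.java.s7.readwrite
    }
}
```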
 
 There is also an additional parameter: `outputDir`, which defaults to 
`${project.build.directory}/generated-sources/plc4x/` and usually shouldn't 
require being changed in case of a `Java` project, but usually requires 
tweaking when generating code for other languages.
 
 Here's an example of a driver pom for building a `S7` driver for `java`:
 
+[subs=attributes+]
 ....
 <?xml version="1.0" encoding="UTF-8"?>
 <!--
@@ -185,7 +197,7 @@ Here's an example of a driver pom for building a `S7` 
driver for `java`:
   <parent>
     <groupId>org.apache.plc4x.plugins</groupId>
     <artifactId>plc4x-code-generation</artifactId>
-    <version>0.6.0-SNAPSHOT</version>
+    <version>{current-last-released-version}</version>
   </parent>
 
   <artifactId>test-java-s7-driver</artifactId>
@@ -217,13 +229,13 @@ Here's an example of a driver pom for building a `S7` 
driver for `java`:
     <dependency>
       <groupId>org.apache.plc4x.plugins</groupId>
       <artifactId>plc4x-code-generation-driver-base-java</artifactId>
-      <version>0.6.0-SNAPSHOT</version>
+      <version>{current-last-released-version}</version>
     </dependency>
 
     <dependency>
       <groupId>org.apache.plc4x.plugins</groupId>
       <artifactId>plc4x-code-generation-language-java</artifactId>
-      <version>0.6.0-SNAPSHOT</version>
+      <version>{current-last-released-version}</version>
       <!-- Scope is 'provided' as this way it's not shipped with the driver -->
       <scope>provided</scope>
     </dependency>
@@ -231,7 +243,7 @@ Here's an example of a driver pom for building a `S7` 
driver for `java`:
     <dependency>
       <groupId>org.apache.plc4x.plugins</groupId>
       <artifactId>plc4x-code-generation-protocol-s7</artifactId>
-      <version>0.6.0-SNAPSHOT</version>
+      <version>{current-last-released-version}</version>
       <!-- Scope is 'provided' as this way it's not shipped with the driver -->
       <scope>provided</scope>
     </dependency>
@@ -244,33 +256,42 @@ So the plugin configuration is pretty straight forward, 
all that is specified, i
 
 The dependency:
 
+[subs=attributes+]
+....
     <dependency>
       <groupId>org.apache.plc4x.plugins</groupId>
       <artifactId>plc4x-code-generation-driver-base-java</artifactId>
-      <version>0.6.0-SNAPSHOT</version>
+      <version>{current-last-released-version}</version>
     </dependency>
+....
 
 This one, for example, contains all classes the generated code relies on.
 
 The definitions of both the `s7` protocol and `java` language are provided by 
the two dependencies:
 
+[subs=attributes+]
+....
     <dependency>
       <groupId>org.apache.plc4x.plugins</groupId>
       <artifactId>plc4x-code-generation-language-java</artifactId>
-      <version>0.6.0-SNAPSHOT</version>
+      <version>{current-last-released-version}</version>
       <!-- Scope is 'provided' as this way it's not shipped with the driver -->
       <scope>provided</scope>
     </dependency>
+....
 
 and:
 
+[subs=attributes+]
+....
     <dependency>
       <groupId>org.apache.plc4x.plugins</groupId>
       <artifactId>plc4x-code-generation-protocol-s7</artifactId>
-      <version>0.6.0-SNAPSHOT</version>
+      <version>{current-last-released-version}</version>
       <!-- Scope is 'provided' as this way it's not shipped with the driver -->
       <scope>provided</scope>
     </dependency>
+....
 
 The reason for why the dependencies are added as code-dependencies and why the 
scope is set the way it is, is described in the <<Why are the protocol and 
language dependencies done so strangely?>> section.
 
@@ -282,15 +303,14 @@ The plugin uses the 
https://docs.oracle.com/javase/7/docs/api/java/util/ServiceL
 
 In order to provide a new protocol module, all that is required is to create 
a module containing a 
`META-INF/services/org.apache.plc4x.plugins.codegenerator.protocol.Protocol` 
file referencing an implementation of the 
`org.apache.plc4x.plugins.codegenerator.protocol.Protocol` interface.
 
-This interface is located in the 
`org.apache.plc4x.plugins:plc4x-code-generation-protocol-base` module and 
generally only defines two methods:
+This interface is located in the 
`org.apache.plc4x.plugins:plc4x-code-generation-protocol-base` module and 
generally only defines three methods:
 
 ....
 package org.apache.plc4x.plugins.codegenerator.protocol;
 
-import 
org.apache.plc4x.plugins.codegenerator.types.definitions.ComplexTypeDefinition;
 import 
org.apache.plc4x.plugins.codegenerator.types.exceptions.GenerationException;
 
-import java.util.Map;
+import java.util.Optional;
 
 public interface Protocol {
 
@@ -302,28 +322,35 @@ public interface Protocol {
     String getName();
 
     /**
-     * Returns a map of complex type definitions for which code has to be 
generated.
+     * Returns a map of type definitions for which code has to be generated.
      *
      * @return the Map of types that need to be generated.
      * @throws GenerationException if anything goes wrong parsing.
      */
-    Map<String, TypeDefinition> getTypeDefinitions() throws 
GenerationException;
+    TypeContext getTypeContext() throws GenerationException;
+
+
+    /**
+     * @return the protocolVersion, if applicable
+     */
+    default Optional<String> getVersion() {
+        return Optional.empty();
+    }
 
 }
 ....
 
-These implementations could use any form of way to generate the Map of 
`ComplexTypeDefinition`'s.
-They could even be hard coded.
+The `name` is used by the plugin to find the right protocol module, so 
the result of `getName()` needs to match the value provided in the maven 
config option `protocolName`.
 
-However, we have currently implemented utilities for universally providing 
input:
+As mentioned before, we support multiple versions of a protocol, so if 
`getVersion()` returns a non-empty value, it is used to select the protocol version.
 
-- link:protocol/mspec.html[MSpec Format] PLC4X proprietary format.
+The most important method for the actual code-generation, however, is 
`getTypeContext()`, which returns a `TypeContext` object that generally 
contains all parsed types of the given protocol.
 
 ==== Language Modules
 
-Analog to the <<Protocol Modules>> the Language modules are constructed 
equally.
+Analogous to the <<Protocol Modules>>, the Language modules are constructed 
very similarly.
 
-The `Language` interface is very simplistic too and is located in the 
`org.apache.plc4x.plugins:plc4x-code-generation-language-base` module and 
generally only defines two methods:
+The `LanguageOutput` interface is very simplistic too and is located in the 
`org.apache.plc4x.plugins:plc4x-code-generation-language-base` module and 
generally only defines four methods:
 
 ....
 package org.apache.plc4x.plugins.codegenerator.language;
@@ -353,7 +380,7 @@ public interface LanguageOutput {
      */
     Set<String> supportedOptions();
 
-    void generate(File outputDir, String languageName, String protocolName, 
String outputFlavor,
+    void generate(File outputDir, String version, String languageName, String 
protocolName, String outputFlavor,
         Map<String, TypeDefinition> types, Map<String, String> options) throws 
GenerationException;
 
 }
@@ -361,9 +388,11 @@ public interface LanguageOutput {
 
 The file for registering Language modules is located at: 
`META-INF/services/org.apache.plc4x.plugins.codegenerator.language.LanguageOutput`
 
-Same as with the protocol modules, the language modules could also be 
implemented in any thinkable way, however we have already implemented some 
helpers for using:
+The `name` is used by the plugin to find the language output module defined 
by the maven config option `languageName`.
+
+`supportedOutputFlavors` provides the list of possible flavors that can be 
referred to by the maven config option `outputFlavor`.
 
-- link:language/freemarker.html[Apache Freemarker Format] Generate output 
using https://freemarker.apache.org[Apache Freemarker] Project.
+`supportedOptions` provides a list of `options` that the current language 
module is able to use and which can be passed into the maven configuration 
using the `options` setting.
 
 === Problems with Maven
 
@@ -376,34 +405,34 @@ This is due to some restrictions in Maven, which result 
from the way Maven gener
 The main problem is that when starting a build, in the `validate`-phase, Maven 
goes through the configuration, downloads the plugins and configures these.
 This means that Maven also tries to download the dependencies of the plugins 
too.
 
-In case of using a Maven plugin in a project which also produces the maven 
plugin, this is guaranteed to fail - Especially during releases.
-While during normal development, Maven will probably just download the latest 
`SNAPSHOT` from our Maven repository and be happy with this and not complain 
that this version will be overwritten later on in the build.
+In case of using a Maven plugin in a project which also builds the maven 
plugin itself, this is guaranteed to fail - especially during releases.
+While during normal development, Maven will probably just download the latest 
`SNAPSHOT` from our Maven repository and will be happy with this and not 
complain even if this version will be overwritten later on in the build.
 It will just use the new version as soon as it has to.
 
 During releases however the release plugin changes the version to a release 
version and then spawns a build.
-In this case the build will fail because there is no Plugin with that version 
to download.
-In this case the only option would be to manually build and install the plugin 
in the release version and to re-start the release (Which is not a nice thing 
for the release manager).
+In this case the build will fail because there is no Plugin with that version 
to download from anywhere.
+In this case the only option would be to manually build and deploy the plugin 
in the release version and to re-start the release (which is not a nice thing 
for the release manager).
 
-For this reason we have stripped down the plugin and its dependencies to an 
absolute minimum and have released (or will release) that separately from the 
rest, hoping due to the minimality of the dependencies that we will not have to 
do it very often.
+For this reason we have stripped down the plugin and its dependencies to an 
absolute minimum and have released that separately from the rest, hoping that, 
due to the minimality of the dependencies, we will not have to do this very often.
 
 As soon as the tooling is released, the version is updated in the PLC4X build 
and the release version is used without any complications.
 
 ==== Why are the protocol and language dependencies done so strangely?
 
-It would certainly be a lot cleaner, if we provided the modules as plugin 
dependencies.
+It would certainly be a lot cleaner if we provided the dependencies to the 
protocol and language modules as plugin dependencies.
 
-However, as we mentioned in the previous sub-chapter, Maven tries to download 
and configure the plugins prior to running the build.
+However, as we mentioned in the previous subchapter, Maven tries to download 
and configure the plugins prior to running the build.
 So during a release the new versions of the modules wouldn't exist yet, which 
would cause the build to fail.
 
 We could release the protocol- and the language modules separately too, but we 
want the language and protocol modules to be part of the project, to not 
over-complicate things - especially during a release.
 
-So the Maven plugin is built in a way, that it uses the modules dependencies 
and creates its own Classloader to contain all of these modules at runtime.
+In order to keep the build and the release as simple as possible, we built the 
Maven plugin in a way, that it uses the modules dependencies and creates its 
own Classloader to contain all of these modules at runtime.
 
 This brings the benefit of being able to utilize Maven's capability of 
determining the build order and dynamically creating the modules build 
classpath.
 
 Adding a normal dependency however would make Maven deploy the artifacts with 
the rest of the modules.
 
-We don't want that as the modules are useless as soon as they have been used 
to generate the code.
+We don't want that, as both the protocol and the language modules are 
useless as soon as they have been used to generate the code.
 
 So we use a trick that is usually used in Web applications, for example:
 Here the vendor of a Servlet engine is expected to provide an implementation 
of the `Servlet API`.
diff --git a/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc 
b/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc
index 0c6050f10e..8d19b56480 100644
--- a/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc
+++ b/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc
@@ -20,28 +20,29 @@
 
 The `MSpec` format (Message Specification) was a result of a brainstorming 
session after evaluating a lot of other options.
 
-We simply sat down and started to write some imaginary format (`imaginary` was 
even the initial Name we used) and created parses for this afterwards and 
fine-tuned spec and parsers as part of the process of implementing first 
protocols and language templates.
+We simply sat down and started to write some imaginary format (`imaginary` was 
even the initial name we used, before settling on Machine-Readable SPEC = `mspec`).
+After we had an initial format that seemed to do the trick, we then started 
creating parsers for it and iteratively fine-tuned both spec and 
parsers as part of the process of implementing new protocols and language 
templates.
 
 It's a text-based format.
 
 At the root level of these specs are a set of `type`, `discriminatedType`, 
`dataIo` and `enum` blocks.
 
-`type` elements are objects who's content is independent of the input.
+`type` elements are objects whose content and structure are independent of the 
input.
 
-An example would be the `TPKTPacket` of the S7 format:
+An example would be the `TPKTPacket` of the `S7` format:
 
 ....
 [type TPKTPacket
-    [const    uint 8     protocolId 0x03]
-    [reserved uint 8     '0x00']
-    [implicit uint 16    len        'payload.lengthInBytes + 4']
-    [field    COTPPacket 'payload']
+    [const    uint 8                 protocolId 0x03]
+    [reserved uint 8                 '0x00']
+    [implicit uint 16                len       'payload.lengthInBytes + 4']
+    [simple   COTPPacket('len - 4') payload]
 ]
 ....
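The `implicit` field above is not stored in the parsed object; on serialization it is recomputed from its expression. As a sketch (plain Java, not generated code; the example payload length is arbitrary), `'payload.lengthInBytes + 4'` accounts for the COTP payload size plus the 4 header bytes (`protocolId`, the reserved byte, and the two bytes of `len` itself):

```java
// Sketch of the implicit 'len' expression of TPKTPacket:
// 'payload.lengthInBytes + 4'
public class Main {
    static int tpktLen(int payloadLengthInBytes) {
        return payloadLengthInBytes + 4; // payload plus 4-byte TPKT header
    }

    public static void main(String[] args) {
        System.out.println(tpktLen(18)); // prints 22
    }
}
```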
 
-A `discriminatedType` type, in contrast, is an object who's content and 
structure is influenced by the input.
+A `discriminatedType` type, in contrast, is an object whose content and 
structure are influenced by the input.
 
-Every discriminated type can contain an arbitrary number of `discriminator` 
fields and exactly one `typeSwitch` element.
+Every discriminated type can contain an arbitrary number of normal fields but 
must contain exactly one `typeSwitch` element.
 
 For example part of the spec for the S7 format looks like this:
 
@@ -51,47 +52,52 @@ For example part of the spec for the S7 format looks like 
this:
     [discriminator uint 8  messageType]
     [reserved      uint 16 '0x0000']
     [simple        uint 16 tpduReference]
-    [implicit      uint 16 parameterLength 'parameter.lengthInBytes']
-    [implicit      uint 16 payloadLength   'payload.lengthInBytes']
-    [typeSwitch 'messageType'
+    [implicit      uint 16 parameterLength 'parameter != null ? 
parameter.lengthInBytes : 0']
+    [implicit      uint 16 payloadLength   'payload != null ? 
payload.lengthInBytes : 0']
+    [typeSwitch messageType
         ['0x01' S7MessageRequest
         ]
-        ['0x03' S7MessageResponse
+        ['0x02' S7MessageResponse
             [simple uint 8 errorClass]
-            [simple uint 8 errorCode ]
+            [simple uint 8 errorCode]
+        ]
+        ['0x03' S7MessageResponseData
+            [simple uint 8 errorClass]
+            [simple uint 8 errorCode]
         ]
         ['0x07' S7MessageUserData
         ]
     ]
-    [simple S7Parameter('messageType')            parameter]
-    [simple S7Payload('messageType', 'parameter') payload  ]
+    [optional S7Parameter ('messageType')              parameter 
'parameterLength > 0']
+    [optional S7Payload   ('messageType', 'parameter') payload   
'payloadLength > 0'  ]
 ]
 ....
 
-A types start is declared by an opening square bracket `[` and ended with a 
closing one `]`.
-
-Also, to both provide a name as first argument.
+A type's start is declared by an opening square bracket `[` followed by the 
`type` or `discriminatedType` keyword, which is directly followed by a name. 
A type definition is ended with a closing square bracket `]`.
 
-Every type definition contains a list of fields that can have different types.
+Every type definition contains a list of so-called fields.
 
-The list of available types are:
+The list of available field types is:
 
-- abstract: used in the parent type declaration do declare a field that has to 
be defined with the identical type in all sub-types (reserved for 
`discriminatedType`).
+- abstract: used in the parent type declaration to declare a field that has to 
be defined with the identical type in all subtypes (reserved for 
`discriminatedType`).
 - array: array of simple or complex typed objects.
+- assert: generally similar to `const` fields, however they throw 
`AssertionExceptions` instead of hard `ParseExceptions`. They are used in 
combination with `optional` fields.
 - checksum: used for calculating and verifying checksum values.
 - const: expects a given value and causes a hard exception if the value 
doesn't match.
 - discriminator: special type of simple typed field which is used to determine 
the concrete type of object (reserved for `discriminatedType`).
 - enum: special form of field, used if an enum type's property is to be used 
instead of its primary value.
 - implicit: a field required for parsing, but usually defined through other 
data, so it's not stored in the object, but calculated on serialization.
-- assert: generally similar to `constant` fields, however do they throw 
`AssertionExceptions` instead of hard `ParseExceptions`. They are used in 
combination with optional fields.
 - manualArray: like an array field, however the logic for serializing, 
parsing, number of elements and size have to be provided manually.
 - manual: simple field, where the logic for parsing, serializing and size have 
to be provided manually.
 - optional: simple or complex typed object, that is only present if an 
optional condition expression evaluates to `true` and no `AssertionException` 
is thrown when parsing the referenced type.
 - padding: field used to add padding data to make datastructures aligned.
+- peek: field that tries to parse a given structure without actually consuming 
the bytes.
 - reserved: expects a given value, but only warns if the condition is not met.
 - simple: simple or complex typed object.
-- typeSwitch: not a real field, but indicates the existence of sub-types, 
which are declared inline (reserved for `discriminatedType`).
+- typeSwitch: not a real field, but indicates the existence of subtypes, which 
are declared inline (reserved for `discriminatedType`).
 - unknown: field used to declare parts of a message that still has to be 
defined. Generally used when reverse-engineering a protocol. Messages with 
`unknown` fields can only be parsed and not serialized.
+- validation: this field is not actually a real field, it's more a condition 
that is checked during parsing. If the check fails, it throws a validation 
exception, which is handled by upstream `optional` fields.
 - virtual: generates a field in the message, that is generally only used for 
simplification. It's not used for parsing or serializing.
 
 The full syntax and explanations of these types follow in the following 
chapters.
@@ -113,15 +119,20 @@ The base types available are currently:
 
 - *bit*: Simple boolean value or bit.
 - *byte*: Special value fixed to 8 bit, which defaults to either signed or 
unsigned depending on the programming language (Java it defaults to signed 
integer values and in C and Go it defaults to unsigned integers).
-- *uint*: The input is treated as unsigned integer value.
 - *int*: The input is treated as signed integer value.
+- *uint*: The input is treated as unsigned integer value.
 - *float*: The input is treated as floating point number.
 - *string*: The input is treated as string.
 
-All above types take a `size` value which provides how many `bits` should be 
read.
-All except the `bit` type, which is fixed to one single bit.
+Then for `dataIo` types we have some additional types:
+
+- *time*: The input is treated as a time representation.
+- *date*: The input is treated as a date representation.
+- *dateTime*: The input is treated as a date with time.
+
+All except the `bit` and `byte` types take a `size` value which specifies how 
many `bits` should be read.
+For the `bit` field this obviously defaults to 1, and for the `byte` field the 
size defaults to 8 bits.
 
-So reading an unsigned byte would be: `uint 8`.
+So reading an unsigned 8-bit integer would be: `uint 8`.
 
 There is currently one special type, reserved for string values, whose length 
is determined by an expression instead of a fixed number of bits. It is 
considered a variable length string:
 
@@ -129,7 +140,7 @@ There is currently one special type, reserved for string 
values, whose length is
 
 === Complex Types
 
-In contrast to simple types, complex type reference other complex types (Root 
elements of the spec document).
+In contrast to simple types, complex types reference other complex types (Root 
elements of the spec document).
 
 How the parser should interpret them is defined in the referenced types 
definition.
 
@@ -142,9 +153,11 @@ In the example above, for example the `S7Parameter` is 
defined in another part o
 An `array` field is exactly what you expect.
 It generates a field which is not a single-value element but an array or list 
of elements.
 
-    [array {simple-type} {size} '{name}' {'count', 'length', 'terminated'} 
'{expression}']
+    [array {bit|byte}           {name} {count|length|terminated} 
'{expression}']
 
-    [array {complex-type} '{name}' {'count', 'length', 'terminated'} 
'{expression}']
+    [array {simple-type} {size} {name} {count|length|terminated} 
'{expression}']
+
+    [array {complex-type}       {name} {count|length|terminated} 
'{expression}']
 
 Array types can be both simple and complex typed and have a name.
 An array field must specify the way its length is determined as well as an 
expression defining its length.
@@ -153,11 +166,32 @@ Possible values are:
 - `length`: In this case a given number of bytes are being read. So if an 
element has been parsed and there are still bytes left, another element is 
parsed.
 - `terminated`: In this case the parser will continue reading elements until 
it encounters a termination sequence.
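As a hypothetical illustration (not taken from any real protocol spec), a count-based array whose element count comes from a previously parsed field could look like this, following the syntax shown above:

```
[type ExampleList
    [simple uint 8 numItems]
    [array  uint 8 items count 'numItems']
]
```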
 
+==== assert Field
+
+An assert field is pretty much identical to a `const` field.
+The main difference, however, is how the case is handled if the parsed value 
does not match the expected value.
+
+     [assert         {bit|byte}            {name}          '{assert-value}']
+
+     [assert         {simple-type} {size}  {name}          '{assert-value}']
+
+While a `const` field would abort parsing entirely with an error, an `assert` 
field will also abort parsing, but the error will only bubble up the stack 
until the first `optional` field is found.
+
+In this case the parser will be rewound to the position before starting to 
parse the `optional` field and continue parsing with the next field, skipping 
the `optional` field.
+
+If there is no upstream `optional` field, then parsing of the message 
terminates with an error.
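+
+A hypothetical example that asserts a fixed protocol-identifier byte (the name and value are illustrative):
+
+    [assert uint 8 protocolId '0x32']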
+
+See also:
+- `validation` field: Similar to an `assert` field, however no parsing is done; instead a condition is simply checked.
+- `optional` field: `optional` fields are aware of the types of parser errors produced by `assert` and `validation` fields.
+
 ==== checksum Field
 
 A checksum field can only operate on simple types.
 
-    [checksum {simple-type} {size} '{name}' '{checksum-expression}']
+    [checksum {bit|byte}           {name} '{checksum-expression}']
+
+    [checksum {simple-type} {size} {name} '{checksum-expression}']
 
 When parsing, the given simple type is parsed and the result is compared to the value the `checksum-expression` provides.
 If they don't match an exception is thrown.
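+
+A sketch of how this could look, assuming a hypothetical `crc8` helper function is available in the expression scope:
+
+    [checksum uint 8 crc 'crc8(payload)']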
@@ -175,9 +209,11 @@ See also:
 
 A const field simply reads a given simple type and compares it to a given reference value.
 
-    [const {simple-type} {size} '{name}' {reference}]
+    [const {bit|byte}           {name} {reference}]
 
-When parsing it makes the parser throw an Exception if the parsed value does 
not match.
+    [const {simple-type} {size} {name} {reference}]
+
+When parsing, the parser throws an Exception if the parsed value does not match the expected one.
 
 When serializing it simply outputs the expected constant.
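+
+For example, a fixed start-of-frame marker (the name and value are illustrative):
+
+    [const uint 8 startByte 0x02]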
 
@@ -190,7 +226,10 @@ See also:
 
 Discriminator fields are only used in `discriminatedType`s.
 
-    [discriminator {simple-type} {size} '{name}']
+    [discriminator {simple-type} {size} {name}]
+
+They are used in cases where the value of a field determines the concrete type of a discriminated type.
+In this case we don't have to waste memory on storing the discriminator value, as it can be statically assigned to the type.
 
 When parsing, a discriminator field just results in a locally available variable.
 
@@ -204,7 +243,9 @@ See also:
 
 Implicit types are fields that get their value implicitly from the data they 
contain.
 
-    [implicit {simple-type} {size} '{name}' '{serialization-expression}']
+    [implicit {bit|byte}           {name} '{serialization-expression}']
+
+    [implicit {simple-type} {size} {name} '{serialization-expression}']
 
 When parsing an implicit type is available as a local variable and can be used 
by other expressions.
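+
+A classic use is a length field that is computed when serializing (the names are illustrative; `lengthInBytes` is assumed to be available as a built-in expression variable, as used elsewhere in PLC4X specs):
+
+    [implicit uint 16 messageLength 'lengthInBytes']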
 
@@ -216,30 +257,41 @@ This field doesn't keep any data in memory.
 
 ==== manualArray Field
 
-    [manualArray {simple-type} {size} '{name}' {'count', 'length', 'terminated'} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+    [manualArray {bit|byte}           {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
 
-    [manualArray {complex-type} '{name}' {'count', 'length', 'terminated'} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+    [manualArray {simple-type} {size} {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+
+    [manualArray {complex-type}       {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
 
 ==== manual Field
 
-    [manual {simple-type} {size} '{name}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+    [manual {bit|byte}           {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+
+    [manual {simple-type} {size} {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
 
-    [manual {complex-type} '{name}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+    [manual {complex-type}       {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
 
 ==== optional Field
 
 An optional field is a type of field that can also be `null`.
 
-    [optional {simple-type} {size} '{name}' '{optional-expression}']
+    [optional {bit|byte}           {name} ('{optional-expression}')?]
 
-    [optional {complex-type} '{name}' '{optional-expression}']
+    [optional {simple-type} {size} {name} ('{optional-expression}')?]
 
-When parsing the `optional-expression` is evaluated. If this results in`false` 
nothing is output, if it evaluates to `true` it is serialized as a `simple` 
field.
+    [optional {complex-type}       {name} ('{optional-expression}')?]
+
+The `optional-expression` attribute is optional. If it is provided, the `optional-expression` is evaluated.
+If this results in `false`, nothing is parsed; if it evaluates to `true`, the field is parsed.
+
+In any case, if an `assert` or `validation` field fails while parsing the content of an `optional` field, the parser is rewound to the position before it started parsing the `optional` field; the `optional` field is then skipped and the parser continues with the next field.
 
 When serializing, if the field is `null` nothing is output, if it is not 
`null` it is serialized normally.
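+
+For example, a sub-structure that is only present when a previously parsed flag is set (the names are illustrative):
+
+    [simple   bit            extendedHeaderPresent]
+    [optional ExtendedHeader extendedHeader 'extendedHeaderPresent']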
 
 See also:
 - simple field: In general `optional` fields are identical to `simple` fields except for the ability to be `null` or be skipped.
+- `assert`: Assert fields are similar to `const` fields, but can abort parsing of an `optional` field.
+- `validation`: If a `validation` field in any of the subtypes fails, this aborts parsing of the `optional` field.
 
 ==== padding Field
 
@@ -247,25 +299,35 @@ A padding field allows aligning of data blocks.
 It outputs additional padding data the number of times specified by the padding expression.
 Padding is added only when the result of the expression is bigger than zero.
 
-    [padding {simple-type} {size} '{pading-value}' '{padding-expression}']
+    [padding {bit|byte}            {name} '{padding-value}' '{times-padding}']
+
+    [padding {simple-type} {size}  {name} '{padding-value}' '{times-padding}']
 
-When parsing a `padding` field is just consumed without being made available 
as property or local variable if the `padding-expression` evaluates to value 
greater than zero.
-If it doesn't, it is just skipped.
+When a `padding` field is being parsed, the `times-padding` expression determines how often the `padding-value` should be read. It doesn't actually check whether the read values match the `padding-value`; it just ensures that the same number of bits is read. The read values are simply discarded.
+
+When serializing, the `times-padding` defines how often the `padding-value` 
should be written.
 
 This field doesn't keep any data in memory.
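+
+For example, padding whose length was read earlier in the message (the names are illustrative):
+
+    [simple  uint 8 padLength]
+    [padding uint 8 pad '0x00' 'padLength']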
 
+===== peek Field
+
+// TODO: Implement
+
 ==== reserved Field
 
 Reserved fields are very similar to `const` fields, however they don't throw 
exceptions, but instead log messages if the values don't match.
 
-The reason for this is that in general reserved fields have the given value 
until they start to be used.
+The reason for this is that in general reserved fields have the given value 
until they start being used.
 
 If the field starts to be used this shouldn't break existing applications, but 
it should raise a flag as it might make sense to update the drivers.
 
-    [reserved {simple-type} {size} '{name}' '{reference}']
+    [reserved {bit|byte}           {name} '{reference}']
+
+    [reserved {simple-type} {size} {name} '{reference}']
+
+When parsing, the value of a `reserved` field is parsed, compared to the reference value, and then discarded.
 
-When parsing the values is parsed and the result is compared to the reference 
value.
-If the values don't match, a log message is sent.
+If the values don't match, a log message is written.
 
 This field doesn't keep any data in memory.
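+
+A typical reserved byte that is expected to be zero (the name is illustrative):
+
+    [reserved uint 8 reserved1 '0x00']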
 
@@ -275,48 +337,42 @@ See also:
 ==== simple Field
 
 Simple fields are the most common types of fields.
-A `simple` field directly mapped to a normally typed field.
 
-    [simple {simple-type} {size} '{name}']
+A `simple` field is directly mapped to a normally typed field of a message type.
 
-    [simple {complex-type} '{name}']
-
-When parsing, the given type is parsed (can't be `null`) and saved in the 
corresponding model instance's property field.
+    [simple {bit|byte}           {name}]
 
-When serializing it is serialized normally.
+    [simple {simple-type} {size} {name}]
 
-==== virtual Field
-
-Virtual fields have no impact on the input or output.
-They simply result in creating artificial get-methods in the generated model 
classes.
+    [simple {complex-type}       {name}]
 
-    [virtual {simple-type} {size} '{name}' '{value-expression}']
-
-    [virtual {complex-type} '{name}' '{value-expression}']
+When parsing, the given type is parsed (can't be `null`) and saved in the 
corresponding model instance's property field.
 
-Instead of being bound to a property, the return value of a `virtual` property 
is created by evaluating the `value-expression`.
+When serializing, the field is serialized normally, either using a simple-type serializer or by delegating serialization to a complex type.
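+
+For example, a simple-typed and a complex-typed field (the names are illustrative):
+
+    [simple uint 16   tpduReference]
+    [simple S7Message payload]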
 
 ==== typeSwitch Field
 
+// TODO: Finish this ...
+
 These types of fields can only occur in discriminated types.
 
 A `discriminatedType` must contain *exactly one* `typeSwitch` field, as it 
defines the sub-types.
 
-    [typeSwitch '{arument-1}', '{arument-2}', ...
-        ['{argument-1-value-1}' {subtype-1-name}
+    [typeSwitch {field-or-attribute-1}(,{field-or-attribute-2}, ...)
+        ['{field-1-value-1}' {subtype-1-name}
             ... Fields ...
         ]
-        ['{vargument-1-value-2}', '{argument-2-value-1}' {subtype-2-name}
+        ['{field-1-value-2}', '{field-2-value-1}' {subtype-2-name}
             ... Fields ...
         ]
-        ['{vargument-1-value-3}', '{argument-2-value-2}' {subtype-2-name} [uint 8 'existing-attribute-1', uint 16 'existing-attribute-2']
+        ['{field-1-value-3}', '{field-2-value-2}' {subtype-2-name} [uint 8 'existing-attribute-1', uint 16 'existing-attribute-2']
             ... Fields ...
         ]
 
 A type switch element must contain a list of at least one argument expression.
 Only the last option can stay empty, which results in a default type.
 
-Each sub-type declares a comma-separated list of concrete values.
+Each subtype declares a comma-separated list of concrete values.
 
 It must contain at most as many elements as arguments are declared for the 
type switch.
 
@@ -326,18 +382,54 @@ If it matches and there are no more values, the type is 
found, if more values ar
 
 If no type is found, an exception is thrown.
 
-Inside each sub-type can declare fields using a subset of the types 
(`discriminator` and `typeSwitch` can't be used here)
+Each subtype can declare fields using a subset of the field types (`discriminator` and `typeSwitch` can't be used here).
 
-The third case in above code-snippet also passes a named attribute to the 
sub-type.
+The third case in the above code snippet also passes a named attribute to the subtype.
 The name must be identical to any argument or named field parsed before the 
switchType.
 These arguments are then available for expressions or passing on in the 
subtypes.
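+
+Putting this together, a hypothetical discriminated type could look like this (all type, field and value names are made up for illustration):
+
+....
+[discriminatedType Message
+    [discriminator uint 8 messageType]
+    [typeSwitch messageType
+        ['0x01' Request
+            [simple uint 16 requestId]
+        ]
+        ['0x02' Response
+            [simple uint 16 requestId]
+            [simple uint 8  statusCode]
+        ]
+    ]
+]
+....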
 
+// TODO: Wildcard names
+
 See also:
 - `discriminatedType`
 
+===== unknown Field
+
+// TODO: Finish this ...
+
+This type of field is mainly used when working on reverse-engineering a new 
protocol.
+It allows parsing any type of information, storing and using it and 
serializing it back.
+
+In general it is similar to a `simple` field, but it explicitly states that we don't yet quite know how to handle the content.
+
+===== validation Field
+
+As mentioned before, a `validation` field is not really a field; it is a check that is added to the type parser.
+
+// TODO: Finish this ...
+
+If the expression provided in the `validation` field fails, the parser aborts parsing and goes up the stack until it finds the first `optional` field.
+If it finds one, it rewinds the parser to the position just before it started parsing the `optional` field, then skips the `optional` field and continues with the next field.
+
+If there is no `optional` field up the stack, then parsing fails.
+
+
+==== virtual Field
+
+Virtual fields have no impact on the input or output.
+They simply result in creating artificial get-methods in the generated model 
classes.
+
+    [virtual {bit|byte}           {name} '{value-expression}']
+
+    [virtual {simple-type} {size} {name} '{value-expression}']
+
+    [virtual {complex-type}       {name} '{value-expression}']
+
+Instead of being bound to a property, the return value of a `virtual` property 
is created by evaluating the `value-expression`.
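+
+For example, a derived convenience value (the names are illustrative):
+
+    [simple  uint 16 valueRaw]
+    [virtual float 32 valueScaled 'valueRaw / 10.0']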
+
 ==== Parameters
 
-Some times it is necessary to pass along additional parameters.
+Sometimes it is necessary to pass along additional parameters.
 
 If a complex type requires parameters, these are declared in the header of 
that type.
 
@@ -361,8 +453,33 @@ If a complex type requires parameters, these are declared 
in the header of that
 ]
 ....
 
-Therefore wherever a complex type is referenced an additional list of 
parameters can be passed to the next type.
+Therefore, wherever a complex type is referenced, an additional list of parameters can be passed on to the next type.
 
 Here is an example of this in the above snippet:
 
     [field S7Payload   'payload'   ['messageType', 'parameter']]
+
+==== Serializer and Parser-Arguments
+
+Arguments influence the way the parser or serializer operates.
+
+Wherever a parser-argument is used, it should also be valid in all subtypes the parser processes.
+
+===== byteOrder
+
+A `byteOrder` argument can set or change the byte-order used by the parser.
+
+We currently support two variants:
+
+- BIG_ENDIAN
+- LITTLE_ENDIAN
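+
+For example, a single field could be parsed little-endian in an otherwise big-endian message (assuming the attribute syntax `byteOrder='...'`):
+
+    [simple uint 32 deviceId byteOrder='LITTLE_ENDIAN']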
+
+===== encoding
+
+Each simple type has a default encoding, which is fine for the vast majority of cases.
+
+Signed integers, for example, use two's-complement notation, floating-point values are encoded in IEEE 754 single or double precision, and strings are encoded as UTF-8 by default.
+
+However, in some cases an alternate encoding needs to be used. Especially when dealing with strings, different encodings such as ASCII, UTF-16 and many more can be used. Alternate encodings can also apply to numeric values: KNX, for example, uses a non-standard 16-bit floating-point encoding, and in the S7 drivers a special encoding is used to represent numeric values in hex format.
+
+An `encoding` attribute can be used to select a non-default encoding.
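+
+For example, a string field read with an alternate charset (assuming the attribute syntax `encoding='...'`):
+
+    [simple string 64 deviceName encoding='ASCII']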
\ No newline at end of file

