Re: [PR] Second Cut at new API for Layers - for review [daffodil]

via GitHub Wed, 28 Feb 2024 09:51:33 -0800


stevedlawrence commented on code in PR #1170:
URL: https://github.com/apache/daffodil/pull/1170#discussion_r1506236229



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/LayerTransformerFactory.scala:
##########
@@ -17,15 +17,49 @@
 
 package org.apache.daffodil.runtime1.layers
 
+import org.apache.daffodil.runtime1.layers.api.Layer
+import org.apache.daffodil.runtime1.layers.api.LayerCompileInfo
+import org.apache.daffodil.runtime1.layers.api.LayerException
+
 /**
  * Factory for a layer transformer.
  *
  * This is the serialized object which is saved as part of a processor.
  * It constructs the layer transformer at runtime when newInstance() is called.
  *
- * This must be implmented as part of implementation of a daffodil layer.
+ * This allows the layer instance itself to be stateful.
  */
-abstract class LayerTransformerFactory(val name: String) extends Serializable {
+class LayerTransformerFactory(val layerCompileInfo: LayerCompileInfo) extends 
Serializable {
+
+  private def layerTransformName = layerCompileInfo.layerName
+
+  /**
+   * Call at runtime to create a layer transformer object. This can be stateful
+   * and non-thread safe.
+   *
+   * Called by the LayeredSequenceParser or LayeredSequenceUnparser to 
allocate the
+   * layer transformer, and the result is used to carry out the layering 
mechanism.
+   *
+   * @param lr the layer runtime info which includes both static and runtime
+   *           state-based information for the parse or unparse
+   * @return the LayerTransformer properly initialized/constructed for this 
layer
+   */
+  def newInstance(lr: LayerRuntimeImpl): LayerTransformer = {
+    val optLayerInstance = LayerRegistry.findLayer(layerTransformName)
+    val spiLayerInstance: Layer = optLayerInstance.getOrElse {
+      lr.runtimeSchemaDefinitionError(
+        new LayerException(
+          s"Unable to load class for layerTransform '$layerTransformName'.",
+        ),
+      )
+    }
+    // The LayerRegistry is not going to reload the class every time we ask 
for it.
+    // Rather, it is going to load it once. So we have to make new instances
+    // ourselves. But Layers are required to have a zero-arg constructor, so
+    // we can just call it.
+    val newInstance = 
spiLayerInstance.getClass().getConstructor().newInstance()
+
+    new LayerTransformer(layerCompileInfo, layer = newInstance)
+  }

Review Comment:
   A thought that might potentially simplify the internals--is it possible to 
get rid of the `LayerTransformerFactory` and `LayerTransformer` classes?
   
   What if the `LayerRegistry` stores a single instance of each `Layer` like it 
does current, but that `Layer` instance is shared among all 
`LayerSequenceParser/Unparsers` that use it? This does mean `Layer`'s must be 
thread-safe and cannot store state, but maybe that's okay? Instead, the `Layer` 
just becomes a factory for the `Input/OutputStreams` that the wrap functions 
call, an those `Input/OutputStreams` do not need to be thread safe and can 
store state?
   
   THat would remove the need for the `LayerTransformerFactory`, and I think 
much of the `LayerTransformer` logic could just be moved into the individual 
layer parser/unparsers? The actual logic in the `LayerTransformer` is pretty 
small and really just a couple functions each for parse/unparse.
   
   I don't think it hurts to have layer transformers, but I'm not sure they 
provide much extra except a layer of indirection. I guess the main benefit is 
Layers can store state, but that state probably makes more sense to be stored 
in the encoder/decoder streams?



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala:
##########
@@ -60,21 +57,20 @@ abstract class SequenceTermBase(
 
   def separatorPosition: SeparatorPosition
 
-  def isLayered: Boolean
-
   def separatorParseEv: SeparatorParseEv
 
   def separatorUnparseEv: SeparatorUnparseEv
 
-  def layerLengthUnits: LayerLengthUnits
-
   def isOrdered: Boolean
 
   /**
    * Overridden in sequence group ref
    */
   def isHidden: Boolean = false
 
+  lazy val optionLayerTransform = findPropertyOption("layerTransform").toOption

Review Comment:
   Thoughts on changing the property name just `dfdlx:layer`? The limiting 
layers don't really transform anything so is potentially confusing?



##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/FixedLengthLayer.scala:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.layers.runtime1
+
+import java.io.ByteArrayInputStream
+import java.io.ByteArrayOutputStream
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.ByteBuffer
+
+import org.apache.daffodil.lib.exceptions.Assert
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.api.Layer
+import org.apache.daffodil.runtime1.layers.api.LayerCompileInfo
+import org.apache.daffodil.runtime1.layers.api.LayerRuntime
+import org.apache.daffodil.runtime1.layers.api.LayerVariable
+
+import org.apache.commons.io.IOUtils
+
+/**
+ * Suitable only for checksums computed over small sections of data, not large 
data streams or whole files.
+ *
+ * The entire region of data the checksum is being computed over, will be 
pulled into a byte buffer in memory.

Review Comment:
   Is this just for ease of implementation? I don't think the checksum layer 
has access to this byte array right? Might be worth adding a TODO to fix that, 
seems this layer could be implemented in a streaming manner so it doesn't need 
to copy everything to another buffer.
   
   Should maybe also remove mentions of "checksum" since this could really be 
used for anything (that is relatively small).



##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dfdlx.xsd:
##########
@@ -123,106 +123,10 @@
 
   <xs:attributeGroup name="ExtLayeringAGQualified">
     <xs:attribute form="qualified" name="layerTransform" 
type="dfdlx:LayerTransformType" />
-    <xs:attribute form="qualified" name="layerEncoding" 
type="dfdl:EncodingEnum_Or_DFDLExpression" />
-    <xs:attribute form="qualified" name="layerLengthKind" 
type="dfdlx:LayerLengthKindEnum" />
-    <xs:attribute form="qualified" name="layerLength" 
type="dfdl:DFDLNonNegativeInteger_Or_DFDLExpression" />
-    <xs:attribute form="qualified" name="layerLengthUnits" 
type="dfdlx:LayerLengthUnitsEnum" />
-    <xs:attribute form="qualified" name="layerBoundaryMark" 
type="dfdl:ListOfDFDLStringLiteral_Or_DFDLExpression" />
   </xs:attributeGroup>
 
-
   <xs:simpleType name="LayerTransformType">
-    <xs:union>
-      <xs:simpleType>
-        <xs:restriction base="dfdlx:LayerTransformEnum" />
-      </xs:simpleType>
-      <xs:simpleType>
-        <xs:restriction base="xs:NCName" /><!-- for dynamically loaded layers 
-->
-      </xs:simpleType>
-    </xs:union>
-  </xs:simpleType>
-
-  <xs:simpleType name="LayerTransformEnum">
-    <xs:restriction base="xs:token">
-      <xs:enumeration value="fourbyteswap">
-        <xs:annotation>
-          <xs:documentation>
-            Swap bytes in 32 bit words.
-          </xs:documentation>
-        </xs:annotation>
-      </xs:enumeration>
-      <xs:enumeration value="base64_MIME">
-        <xs:annotation>
-          <xs:documentation>
-            IETF RFC 2045
-
-            Max line length 76 characters.
-          </xs:documentation>
-        </xs:annotation>
-      </xs:enumeration>
-      <xs:enumeration value="lineFolded_IMF">
-        <xs:annotation>
-          <xs:documentation><![CDATA[
-             IETF RFC 2822 Internet Message Format (IMF)
-
-             Each line of characters MUST be no more than
-             998 characters, and SHOULD be no more than 78 characters, 
excluding
-             the CRLF.
-
-             Though structured field bodies are defined in such a way that
-             folding can take place between many of the lexical tokens (and 
even
-             within some of the lexical tokens), folding SHOULD be limited to
-             placing the CRLF at higher-level syntactic breaks.  For instance, 
if
-             a field body is defined as comma-separated values, it is 
recommended
-             that folding occur after the comma separating the structured 
items in
-             preference to other places where the field could be folded, even 
if
-             it is allowed elsewhere.
-
-             Unfolding is accomplished by simply removing any CRLF
-             that is immediately followed by WSP.
-             ]]>
-          </xs:documentation>
-        </xs:annotation>
-      </xs:enumeration>
-      <xs:enumeration value="lineFolded_iCalendar">
-        <xs:annotation>
-          <xs:documentation><![CDATA[
-             IETF RFC 5545 Internet Calendaring and Scheduling (iCalendar)
-
-             Lines of text SHOULD NOT be longer than 75 octets, excluding the 
line
-             break.  Long content lines SHOULD be split into a multiple line
-             representations using a line "folding" technique.  That is, a long
-             line can be split between any two characters by inserting a CRLF
-             immediately followed by a single linear white-space character 
(i.e.,
-             SPACE or HTAB).  Any sequence of CRLF followed immediately by a
-             single linear white-space character is ignored (i.e., removed) 
when
-             processing the content type.
-             ]]>
-          </xs:documentation>
-        </xs:annotation>
-      </xs:enumeration>
-      <xs:enumeration value="gzip">
-        <xs:annotation>
-          <xs:documentation>
-            GZIP per https://www.ietf.org/rfc/rfc1952.txt
-          </xs:documentation>
-        </xs:annotation>
-      </xs:enumeration>
-    </xs:restriction>
-  </xs:simpleType>
-
-  <xs:simpleType name="LayerLengthUnitsEnum">
-    <xs:restriction base="xs:string">
-      <xs:enumeration value="bytes" />
-    </xs:restriction>
-  </xs:simpleType>
-
-  <xs:simpleType name="LayerLengthKindEnum">
-    <xs:restriction base="xs:string">
-      <xs:enumeration value="explicit" />
-      <xs:enumeration value="boundaryMark" />
-      <xs:enumeration value="implicit" />
-    </xs:restriction>
+    <xs:restriction base="xs:NCName" />

Review Comment:
   Thinking that NCName does allows spaces, I wonder what if we did allow 
spaces like other DFDL properties, and it meant multiple layers. For example, 
instead of doing this:
   
   ```xml
   <sequence dfdlx:layer="fixedLength">
     <annotation><appinfo source="http://www.ogf.org/dfdl/";>
       <dfdl:newVariableInstance ref="fl:fixedLength" defaultValue="6"/>
     </appinfo></annotation>
     <xs:sequence dfdlx:layer="byteswap">
       <annotation><appinfo source="http://www.ogf.org/dfdl/";>
         <dfdl:newVariableInstance ref="bs:wordSize" defaultValue="2"/>
       </appinfo></annotation>
        ...
     </sequence>
   </sequence>
   ```
   
   you could do this:
   
   ```xml
   <sequence dfdlx:layers="fixedLength byteswap">
     <annotation><appinfo source="http://www.ogf.org/dfdl/";>
       <dfdl:newVariableInstance ref="fl:fixedLength" defaultValue="6"/>
       <dfdl:newVariableInstance ref="bs:wordSize" defaultValue="2"/>
     </appinfo></annotation>
      ...
   </sequence>
   ```
   
   It reduces verbosity in cases where multiple layers are needed. But maybe 
it's getting too fancy and adds extra complexity to the implementation?



##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dfdlx.xsd:
##########
@@ -123,106 +123,10 @@
 
   <xs:attributeGroup name="ExtLayeringAGQualified">
     <xs:attribute form="qualified" name="layerTransform" 
type="dfdlx:LayerTransformType" />

Review Comment:
   Thought the same thing in another comment, agreed, `dfdlx:layer` probably 
makes more sense.



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/LayerCompileInfo.java:
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api;
+
+import org.apache.daffodil.lib.exceptions.SchemaFileLocation;
+
+/**
+ * Provides contextual information for the layer's own static checking.
+ *
+ * This mostly is about getting access to DFDL variables. I
+ * It also allows reporting schema definition errors for any
+ * reason.
+ */
+public interface LayerCompileInfo {
+
+  String layerName();
+
+  SchemaFileLocation schemaFileLocation();
+
+  /**
+   * Obtains the LayerVariable object that can be used to access the variable 
at runtime.
+   *

Review Comment:
   Hmmm, one possible thought on how to remove these checks: what if a Layer 
defines an ordered list of input and output variables, and the parameters for 
the wrap function much match the types of those variables? We kindof do 
something similar with UDFs where we use reflection to check the type of UDFs. 
For example, a Layer might look something like this:
   
   ```java
   public final class MyLayer extends Layer {
   
     public MyLayer() {
       LavyerVariable[] inputVars = {
         new LayerVariable("urn:mylayer", "input1"),
         new LayerVariable("urn:mylayer", "input2"),
       };
       super("mylayer", inputVars);
     }
   
     public InputStream wrapLayerDecoder(InputStream jis, LayerRuntime lr, 
String input1, Int input2) {
       return new MyLayerInputStream(input1, input2);
     }
    
     public OutputStream wrapLayerEncoder(OutputStream jis, LayerRuntime lr, 
String input1, Int input2) {
       return new MyLayerOutputStream(input1, input2);
     }
   
   }
   ```
   
   So when Daffodil loads and registers layer,  it makes sure the schema 
defines the variables and uses reflection to make sure the types of those 
variables matches the additional arguments in the wrapLayerEncoder function. At 
runtime, the variables are evaluated by Daffodil and passed as those arguments 
in the order specified.
   
   The `LayerVariable` class could also have additional constructor parameters, 
like "optional", which might mean the variable isn't required to be defined in 
a schema and Daffodil will just pass `null` as the argument value to 
wrapLayerEncoder/Decoder. Or maybe there could be a "default" member, so if the 
variable doesn't exist or isn't set then that value is used.
   
   I'm not sure how output variables would work though? If we could get away 
with only allowing Layers to have only a single output variable, we could maybe 
do something like this:
   
   ```java
   public final class MyLayer extends Layer {
   
     public MyLayer() {
       LayerVariable output = new LayerVariable("urn:mylayer", "checksumVar");
       super(
         "mylayer",
         null, // no inputs
         output,
        );
     }
   
     static public int getResult(InputStream is) {
       MyLayerInputStream mis = (MyLayerInputStream)is;
       return mis.getChecksum();
     }
    
     ...
   }
   ```
   
   So at compile time we can get the output variable and look for the 
`getResult` function with  return value that matches the type of the variable. 
At runtime, the layer decoder InputStream will calculate the checksum and store 
it in some state (not in the `Layer`, but in the `InputStream` implementation). 
When the layer is done, we call the `getResult` function and pass in the 
inputstream that wrapLayerDecoder created so it can pull out that state. Or 
maybe we just require the wrapFunction to return a `LayerInputSream` or 
`LayerOutputStream`, which are just `Input/OutputStreams` with a `getResult` 
function that we call.
   
   I'm not sure how much I like that, and there's a lot of reflection going on, 
but it would allow Daffodil to handle checking of all variables, and from the 
layer API everything is just normal Java objects. Daffodil handles everything 
about getting and setting variables so Layer's never even have a View of the 
variables.
   
   This would also remove the need for the vars that store the 
`LayerVariable`'s set in the `check` function. It would be nice if those vars 
could be removed so `Layer`s could have no mutable state, and all the 
mutability happens in the `Input/OutputStreams`.



##########
daffodil-runtime1-layers/src/main/resources/org/apache/daffodil/layers/xsd/boundaryMarkLayer.dfdl.xsd:
##########
@@ -0,0 +1,38 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<schema
+  xmlns:xs="http://www.w3.org/2001/XMLSchema";
+  xmlns="http://www.w3.org/2001/XMLSchema";
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";
+  xmlns:bm="urn:org.apache.daffodil.layers.boundaryMark"
+  targetNamespace="urn:org.apache.daffodil.layers.boundaryMark">
+
+  <include 
schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+  <annotation>
+    <appinfo source="http://www.ogf.org/dfdl/";>
+
+      <dfdl:format ref="bm:GeneralFormat" />

Review Comment:
   Is a dfdl:format needed even though this is really just used to define 
variables?



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/layers/LayerCompiler.scala:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.core.layers
+
+import org.apache.daffodil.core.dsom.SequenceGroupTermBase
+import org.apache.daffodil.lib.exceptions.Assert
+import org.apache.daffodil.runtime1.layers.LayerRegistry
+import org.apache.daffodil.runtime1.layers.LayerTransformerFactory
+import org.apache.daffodil.runtime1.layers.api.Layer
+import org.apache.daffodil.runtime1.layers.api.LayerCompileInfo
+
+object LayerCompiler {
+
+  /**
+   * Compiles the layer doing all compile time checks, into a serializable 
runtime object
+   * which is a LayerTransformerFactory
+   *
+   * Digests the properties and definition of the layer and does compile-time 
checking
+   * for consistency and completeness of the definition.
+   *
+   * Chooses the right length-limiter class, and instantiates it.

Review Comment:
   Can this be removed? I think it's now up to schema authors to add length 
limiters if needed?



##########
daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.io.processors.charset.BitsCharsetDefinition:
##########
@@ -24,7 +24,6 @@ 
org.apache.daffodil.io.processors.charset.BitsCharset5BitPackedLSBFDefinition
 org.apache.daffodil.io.processors.charset.BitsCharset6BitDFI264DUI001Definition
 org.apache.daffodil.io.processors.charset.BitsCharset6BitDFI311DUI002Definition
 
org.apache.daffodil.io.processors.charset.BitsCharset6BitICAOAircraftIDDefinition
-org.apache.daffodil.io.processors.charset.BitsCharsetAISPayloadArmoringDefinition

Review Comment:
   Why was this charset removed?



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/schema/annotation/props/ByHandMixins.scala:
##########
@@ -394,8 +394,8 @@ object BinaryBooleanTrueRepType {
 
       if (i < 0)
         element.schemaDefinitionError(
-          "For property 'binaryBooleanFalseRep', value must be an empty string 
or a non-negative integer. Found: %d",
-          i,
+          "For property 'binaryBooleanFalseRep', value must be an empty string 
or a non-negative integer. Found: %s",
+          i.toString,

Review Comment:
   Is this because of the schemaDefinitionError function changing from Any* to 
AnyRef*? Does it not auto-box the int?



##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/ByteSwapTransformer.scala:
##########
@@ -22,53 +22,27 @@ import java.io.OutputStream
 import java.util.ArrayDeque
 import java.util.Deque
 
-import org.apache.daffodil.io.ExplicitLengthLimitingStream
 import org.apache.daffodil.lib.exceptions.Assert
-import org.apache.daffodil.lib.schema.annotation.props.gen.LayerLengthKind
-import org.apache.daffodil.runtime1.layers._
-import org.apache.daffodil.runtime1.processors.ParseOrUnparseState
-
-final class FourByteSwapLayerCompiler extends LayerCompiler("fourbyteswap") {
-
-  override def compileLayer(layerCompileInfo: LayerCompileInfo): 
LayerTransformerFactory = {
-
-    layerCompileInfo.optLayerLengthKind match {
-      case Some(LayerLengthKind.Explicit) => // ok
-      case None =>
-        layerCompileInfo.SDE(
-          "The property dfdlx:layerLengthKind must be defined and have value 
'explicit'.",
-        )
-      case Some(other) =>
-        layerCompileInfo.SDE(
-          "The property dfdlx:layerLengthKind must be 'explicit' but was 
'$other'.",
-        )
-    }
+import org.apache.daffodil.runtime1.layers.api.Layer
+import org.apache.daffodil.runtime1.layers.api.LayerRuntime
 
-    layerCompileInfo.SDEUnless(
-      layerCompileInfo.optLayerLengthOptConstantValue.isDefined,
-      "The property dfdlx:layerLength must be defined.",
-    )
+final class TwoByteSwapLayer extends ByteSwap("twobyteswap", 2)
+final class FourByteSwapLayer extends ByteSwap("fourbyteswap", 4)

Review Comment:
   Thoughts on just having a single "ByteSwap" layer and the word size becomes 
a variable? 



##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/FixedLengthLayer.scala:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.layers.runtime1
+
+import java.io.ByteArrayInputStream
+import java.io.ByteArrayOutputStream
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.ByteBuffer
+
+import org.apache.daffodil.lib.exceptions.Assert
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.api.Layer
+import org.apache.daffodil.runtime1.layers.api.LayerCompileInfo
+import org.apache.daffodil.runtime1.layers.api.LayerRuntime
+import org.apache.daffodil.runtime1.layers.api.LayerVariable
+
+import org.apache.commons.io.IOUtils
+
+/**
+ * Suitable only for checksums computed over small sections of data, not large 
data streams or whole files.
+ *
+ * The entire region of data the checksum is being computed over, will be 
pulled into a byte buffer in memory.
+ */
+final class FixedLengthLayer extends Layer("fixedLength") {
+
+  var layerLengthVar: LayerVariable = _
+
+  private def init(lci: LayerCompileInfo): Unit = {
+    if (layerLengthVar == null) {
+      layerLengthVar = 
lci.layerVariable("urn:org.apache.daffodil.layers.fixedLength", name())
+      if (layerLengthVar.dfdlPrimType != DFDLPrimType.UnsignedInt)
+        lci.schemaDefinitionError(
+          "Layer variable 'fixedLength' is not an 'xs:unsignedInt'.",
+        )
+    }
+  }
+
+  override def check(lci: LayerCompileInfo): Unit = init(lci)
+
+  override def wrapLayerDecoder(jis: InputStream, lr: LayerRuntime): 
InputStream = {
+    init(lr)
+    new FixedLengthInputStream(this, jis, lr)
+  }
+
+  override def wrapLayerEncoder(jos: OutputStream, lr: LayerRuntime): 
OutputStream = {
+    init(lr)
+    new FixedLengthOutputStream(this, jos, lr)
+  }
+}
+
+class FixedLengthInputStream(
+  layer: FixedLengthLayer,
+  jis: InputStream,
+  lr: LayerRuntime,
+) extends InputStream {
+
+  private val layerLength = {
+    val res = lr.getLong(layer.layerLengthVar)
+    Assert.invariant(res > 0)
+    res
+  }
+
+  private lazy val bais = {
+    val ba = new Array[Byte](layerLength.toInt)
+    val nRead = IOUtils.read(jis, ba)

Review Comment:
   This could read less than the layerLength it if hit EOF, do we need to 
handle that some way?



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/LayerRuntimeImpl.scala:
##########
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers
+
+import java.nio.charset.Charset
+
+import org.apache.daffodil.lib.exceptions.SchemaFileLocation
+import org.apache.daffodil.lib.util.Maybe
+import org.apache.daffodil.runtime1.infoset.DataValue
+import org.apache.daffodil.runtime1.layers.api.LayerRuntime
+import org.apache.daffodil.runtime1.layers.api.LayerVariable
+import org.apache.daffodil.runtime1.processors.ParseOrUnparseState
+import org.apache.daffodil.runtime1.processors.SequenceRuntimeData
+import org.apache.daffodil.runtime1.processors.VariableRuntimeData
+import org.apache.daffodil.runtime1.processors.parsers.PState
+import org.apache.daffodil.runtime1.processors.parsers.ParseError
+import org.apache.daffodil.runtime1.processors.unparsers.UState
+import org.apache.daffodil.runtime1.processors.unparsers.UnparseError
+
+class LayerRuntimeImpl(val state: ParseOrUnparseState, val srd: 
SequenceRuntimeData)
+  extends LayerRuntime {
+
+  override def processingError(cause: Throwable): Nothing =
+    processingError(Maybe(cause), Maybe.Nope)
+
+  override def processingError(msg: String): Nothing =
+    processingError(Maybe.Nope, Maybe(msg))
+
+  private def processingError(mCause: Maybe[Throwable], mMsg: Maybe[String]): 
Nothing = {
+    val diagnostic = state match {
+      case ps: PState =>
+        new ParseError(
+          rd = Maybe(srd.schemaFileLocation),
+          loc = Maybe(state.currentLocation),
+          causedBy = mCause,
+          kind = mMsg,
+        )
+      case us: UState =>
+        new UnparseError(
+          rd = Maybe(srd.schemaFileLocation),
+          loc = Maybe(state.currentLocation),
+          causedBy = mCause,
+          kind = mMsg,
+        )
+    }
+    throw diagnostic
+  }
+
+  override def runtimeSchemaDefinitionError(msg: String, args: AnyRef*): 
Nothing =
+    state.SDE(msg, args: _*)
+
+  override def runtimeSchemaDefinitionError(cause: Throwable): Nothing =
+    state.SDE(cause)
+
+  override def getString(variable: LayerVariable): String =
+    state
+      .getVariable(variable.asInstanceOf[VariableRuntimeData], srd)
+      .getString
+
+  override def setString(variable: LayerVariable, s: String): Unit =
+    state.setVariable(
+      variable.asInstanceOf[VariableRuntimeData],
+      DataValue.toDataValue(s),
+      srd,
+    )
+
+  override def getInt(variable: LayerVariable): Int =
+    state
+      .getVariable(variable.asInstanceOf[VariableRuntimeData], srd)
+      .getInt
+
+  override def setInt(variable: LayerVariable, v: Int): Unit =
+    state.setVariable(
+      variable.asInstanceOf[VariableRuntimeData],
+      DataValue.toDataValue(v),
+      srd,
+    )
+
+  override def schemaDefinitionError(msg: String, args: AnyRef*): Unit =
+    runtimeSchemaDefinitionError(msg, args)
+
+  override def getCharset(layerEncoding: String): Charset = 
Charset.forName(layerEncoding)

Review Comment:
   Should we remove this from LayerRuntime? Layers that need a charset can call 
`Charset.forName` themeselves?



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/runtime1/SequenceTermRuntime1Mixin.scala:
##########
@@ -84,6 +85,7 @@ trait ChoiceBranchImpliedSequenceRuntime1Mixin { self: 
ChoiceBranchImpliedSequen
       Maybe.Nope,
       Maybe.Nope,
       isHidden = false,
+      Maybe.toMaybe(optionLayerTransform),

Review Comment:
   Can this be `Myebe.Nope`? I imagine an implied sequence on a choice branch 
will never have a layer?



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/LayerTransformerFactory.scala:
##########
@@ -17,15 +17,49 @@
 
 package org.apache.daffodil.runtime1.layers
 
+import org.apache.daffodil.runtime1.layers.api.Layer
+import org.apache.daffodil.runtime1.layers.api.LayerCompileInfo
+import org.apache.daffodil.runtime1.layers.api.LayerException
+
 /**
  * Factory for a layer transformer.
  *
  * This is the serialized object which is saved as part of a processor.
  * It constructs the layer transformer at runtime when newInstance() is called.
  *
- * This must be implmented as part of implementation of a daffodil layer.
+ * This allows the layer instance itself to be stateful.
  */
-abstract class LayerTransformerFactory(val name: String) extends Serializable {
+class LayerTransformerFactory(val layerCompileInfo: LayerCompileInfo) extends 
Serializable {

Review Comment:
   Wondered if we could get rid of this in another comment. If we don't, I'd 
suggest we rename these transformer classes. I'm not sure they really do any 
transforming anymore, they really just a wrappers to allow Layers to store 
state in the `Layer` instance. Maybe `LayerWrapper` or something?



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/LayerRuntime.java:
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api;
+
+import java.nio.charset.Charset;
+
+/**
+ * Runtime information and stateful services available to the layer when
+ * encoding/decoding the layer data to/from the input/output stream.
+ *
+ * Provides the ability to cause runtime processing errors, which can cause 
backtracking
+ * when parsing, but are fatal when unparsing.
+ *
+ * Also provides the ability to cause runtime schema definition errors, which 
are always
+ * fatal.
+ *
+ * This object contains the processor state, but hidden behind an API so that 
only relevant
+ * aspects of the processor state are visible.
+ */
+public interface LayerRuntime extends LayerCompileInfo {
+
+  Charset getCharset(String layerEncoding);
+
+  void processingError(String msg);
+
+  void processingError(Throwable cause);
+
+  void runtimeSchemaDefinitionError(String msg, Object... args);
+
+  void runtimeSchemaDefinitionError(Throwable cause);
+
+  String getString(LayerVariable variable);
+
+  void setString(LayerVariable variable, String s);
+
+  int getInt(LayerVariable variable);
+
+  void setInt(LayerVariable variable, int v);
+
+  long getLong(LayerVariable variable);
+
+  void setLong(LayerVariable variable, long v);
+}

Review Comment:
   A conseequence of the above comment if variables are just turned into 
function arguments/return value, much of this goes away. We wouldn't need the 
LayerVariable stuff at all (and could support more than just Int/String/Long, 
we could support anything a UDF supports.
   
   I also wonder if we could get rid of the processing error an sde functions? 
Could layers just be expected to throw `LayerProcessorErrorException` or 
`LayerSchemaDefinitionErrorException`, and then the sequence layer 
parser/unparse could catch those? I guess maybe you lose context about where 
the error occurs? Since the exception would be caught by some random 
parser/unparse, and not the layer parser/unparse. I was thinking maybe the 
LayerRuntime could be completely removed, but I think not since you do need the 
layer context, I think. 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+/**
+ * This is the primary API class for writing layers.
+ * <p/>
+ * All layers are derived from this class, and must have no-args default 
constructors.
+ * <p/>
+ * Derived classes will be dynamically loaded by Java's SPI system.
+ * The names of concrete classes derived from Layer are listed in a 
resources/M.services file
+ * so that they can be found and dynamically loaded.
+ * <p/>
+ * The SPI creates an instance the class of which is used as a factory to 
create the
+ * instances actually used by the Daffodil runtime. Compilation of the static 
information about
+ * the layer occurs only once and is then shared by all runtime instances.
+ * <p/>
+ * Instances of derived layer classes can be stateful. They are private to 
threads, and each time a layer
+ * is encountered during parse/unparse, an instance is created for that 
situation.
+ * <p/>
+ * Layer instances should not share mutable state (such as via singleton 
objects)
+ * <p/>
+ * All the static information about the layer is provided in the arguments.
+ * <p/>
+ * The rest of the Layer class implements the
+ * layer decode/encode logic, which is done as part of deriving one's Layer 
class from the
+ * Layer base class.
+ * <p/>
+ * About variables: Layer logic may read and write variables. Variables being 
read are parameters to
+ * the layer algorithm. Variables being written are outputs (such as 
checksums) from the layer algorithm.
+ * Variables being written must be undefined, since variables in DFDL are 
single-assignment.
+ * Variables being read must be defined before being read by the layer, and 
this is true for both
+ * parsing and unparsing. When unparsing, variables being read cannot be 
forward-referencing to parts
+ * of the DFDL infoset that have not yet been unparsed.
+ * <p/>
+ */
+public abstract class Layer {
+
+  protected final String layerName;
+
+  public Layer(String layerName) {
+    this.layerName = layerName;
+  }
+
+  public final String name() {
+    return this.layerName; // name() method with empty args is required by SPI 
loader
+  }
+
+  /**
+   * Called exactly once when the schema is compiled to do extra checking that 
the layer is being used properly.
+   * The thrown exception becomes a SchemaDefinitionError at schema compile 
time.
+   *
+   * Example checks are:
+   * - layerEncoding is constant and is a single-byte charset
+   * - layerLength, if constant, is within a maximum value range
+   * - layerBoundaryMark string, if constant, is not too long and contains 
only allowed characters.
+   * These things can be required to be constant by this check, or it can 
check their values for legality
+   * if they happen to be constant. Since these are runtime-valued properties 
(can be expressions), then if the
+   * layer allowed that, they must also be checked at runtime.
+   *
+   * You don't have to check that the variables are defined and declared in 
matching manner, that happens automatically.
+   *
+   * @throws LayerException on any error.
+   *
+   */
+  public void check(LayerCompileInfo layerPropertyInfo) throws LayerException 
{ /* nothing */ }
+
+
+  public InputStream wrapLayerDecoder(InputStream jis, LayerRuntime lr) throws 
IOException { return null; }
+
+  public OutputStream wrapLayerEncoder(OutputStream jos, LayerRuntime lr) 
throws IOException { return null; }

Review Comment:
   Thoughts on naming these both `wrap` with either `InputStream` or 
`OutputStream` parameter types? Limiting layers don't really encode/decode so 
it's maybe confusing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Second Cut at new API for Layers - for review [daffodil]

Reply via email to