mbeckerle commented on a change in pull request #643:
URL: https://github.com/apache/daffodil/pull/643#discussion_r716732801
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -200,6 +214,54 @@ abstract class LayerTransformer()
// println("DIS hex dump = " + str)
// ???
// }
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the parsing of a layer starts.
Review comment:
A facade makes sense. As I write more of these layers, however, I've
concluded that a much higher-level pattern can also be imposed, which may
make exposing the PState/UState objects unnecessary. E.g., we need the QName
of a variable to assign, and the algorithm that takes a byte array and
returns an int. Those really are the only needed parameters for
checksums/CRC/parity.
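To make that concrete, the higher-level pattern might be sketched roughly as follows. The trait and all names here are hypothetical illustrations of the idea, not Daffodil's actual API:

```scala
// Hypothetical sketch of a higher-level checksum-layer pattern: a layer is
// fully described by the variable to assign and a bytes-to-int algorithm.
// None of these names are Daffodil's actual API.
trait ChecksumLayerSpec {
  /** QName (shown as a plain string for simplicity) of the DFDL variable to assign. */
  def resultVariableQName: String

  /** The checksum algorithm: takes the layer's bytes, returns an int. */
  def compute(bytes: Array[Byte]): Int
}

/** Example: a trivial XOR-parity "checksum" expressed in this pattern. */
object XorParityLayer extends ChecksumLayerSpec {
  override def resultVariableQName: String = "ex:parity"
  override def compute(bytes: Array[Byte]): Int =
    bytes.foldLeft(0)((acc, b) => acc ^ (b & 0xff))
}
```

With such a facade, the framework (not the plugin) would own reading the layer bytes and assigning the result variable.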
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -101,15 +102,28 @@ object LayerTransformerFactory {
register(GZIPTransformerFactory)
register(IMFLineFoldedTransformerFactory)
register(ICalendarLineFoldedTransformerFactory)
+
+ /**
+ * Arguably the above could be bundled with Daffodil.
+ *
+ * The transformers below really should be plugins defined
+ * outside Daffodil, per DAFFODIL-1927.
+ */
register(AISPayloadArmoringTransformerFactory)
register(FourByteSwapTransformerFactory)
+ register(IPv4Checksum.Factories.Part1)
+ register(IPv4Checksum.Factories.Part2)
+
}
/**
* Shared functionality of all LayerTransformers.
+ *
+ * A layer transformer is serialized as part of the processor definition;
+ * hence, they must be stateless as we allow the processor definition to be
+ * shared across multiple threads.
Review comment:
Will fix. This comment reflects a misunderstanding I had. You are
correct: the factory is serialized; the transformer itself is created at
runtime.
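The corrected lifecycle could be sketched like this (hypothetical names, not Daffodil's actual classes): the factory is the serializable, stateless object stored with the processor; each runtime use creates a fresh transformer, which is therefore free to hold state:

```scala
// Illustrative sketch only: the factory is serialized with the processor
// definition and shared across threads, so it must be stateless. The
// transformer it creates is per-use, so per-use mutable state is safe.
trait Transformer {
  var bytesSeen: Long = 0 // per-use state; never shared across threads
}

trait TransformerFactory extends Serializable {
  def newTransformer(): Transformer // fresh instance for each parse/unparse
}

object ExampleFactory extends TransformerFactory {
  override def newTransformer(): Transformer = new Transformer {}
}
```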
##########
File path:
daffodil-test/src/test/scala/org/apache/daffodil/layers/CheckDigit.scala
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.layers
+
+import org.apache.commons.io.IOUtils
+import org.apache.daffodil.schema.annotation.props.gen.LayerLengthKind
+import org.apache.daffodil.schema.annotation.props.gen.LayerLengthUnits
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.processors.LayerLengthEv
+import org.apache.daffodil.processors.LayerBoundaryMarkEv
+import org.apache.daffodil.processors.LayerCharsetEv
+import org.apache.daffodil.processors.parsers.PState
+import org.apache.daffodil.processors.unparsers.UState
+
+import java.io._
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.io.DataInputStream
+import org.apache.daffodil.processors.CharsetEvBase
+import org.apache.daffodil.processors.SequenceRuntimeData
+import org.apache.daffodil.processors.SuspendableOperation
+import org.apache.daffodil.processors.charset.BitsCharsetJava
+import org.apache.daffodil.util.ByteBufferOutputStream
+import org.apache.daffodil.xml.NS
+import org.apache.daffodil.xml.RefQName
+
+import java.nio.ByteBuffer
+import java.nio.charset.Charset
+
+
+final class CheckDigitExplicit(
+ srd: SequenceRuntimeData,
+ charset: Charset,
+ layerLengthEv: LayerLengthEv)
+extends LayerTransformer() {
+  Assert.invariant(charset.newDecoder().maxCharsPerByte() == 1) // is an SBCS charset
+
+ private val checkDigitVRD = {
+ val varNamespace = NS("urn:org.apache.daffodil.layers.checkDigit")
+    val varQName = RefQName(Some("cd"), "checkDigit", varNamespace).toGlobalQName
+ val vrd = srd.variableMap.getVariableRuntimeData(varQName).getOrElse {
+ srd.SDE("Variable '%s' is not defined.", varQName.toExtendedSyntax)
+ }
+    srd.schemaDefinitionUnless(vrd.primType == PrimType.UnsignedInt,
+      "Variable '%s' is not of type 'xs:unsignedInt'.", varQName.toExtendedSyntax)
+ vrd
+ }
+
+ private var limitingOutputStream : ByteBufferOutputStream = _
+ private var originalOutputStream: OutputStream = _
+
+ protected def wrapLayerDecoder(jis: InputStream) = jis
+
+ override def wrapLimitingStream(jis: java.io.InputStream, state: PState) = {
+ val layerLengthInBytes: Int = layerLengthEv.evaluate(state).toInt
+ val ba = new Array[Byte](layerLengthInBytes)
+ try
+ IOUtils.readFully(jis, ba)
+ catch {
+ case eof: EOFException =>
+ throw DataInputStream.NotEnoughDataException(ba.length * 8)
+ }
+ // this stream will be used by the parse when it parses the
+ // schema contents of the layer. We have already consumed the bytes
+ // from the original input stream, and saved those in the byte array
+ // to enable the computation of the checkDigit.
+ val layerStream = new ByteArrayInputStream(ba)
+    val str = new String(ba, charset) // always does replacement on decode errors
+ val checkDigit = computeCheckDigit(str)
+    state.setVariable(checkDigitVRD, checkDigit, srd) // assign to result variable
+ layerStream
Review comment:
If the checksum were over a large thing, like a whole file, then one
would want to use a layered stream that does the checksumming as part of the
parser reading the data stream. That would avoid the need to buffer the data.
That's what, for example, the gzip and line-folding layers do.
In this case these are IP packets, which are by definition small, so just
grabbing the data into a buffer seemed simplest.
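For reference, the streaming alternative could look roughly like this sketch (hypothetical, not Daffodil code): a `FilterInputStream` that folds every byte it passes through into a running checksum, so the parser computes the checksum as a side effect of reading, with no whole-layer buffering:

```scala
import java.io.{FilterInputStream, InputStream}

// Sketch of a checksumming pass-through stream. The "checksum" here is a
// simple running byte sum, standing in for whatever real algorithm is used.
final class ChecksummingInputStream(in: InputStream) extends FilterInputStream(in) {
  private var sum: Long = 0L

  def checksum: Long = sum

  override def read(): Int = {
    val b = super.read()
    if (b >= 0) sum += b // fold each byte into the running sum
    b
  }

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val n = super.read(buf, off, len)
    var i = 0
    while (i < n) { sum += (buf(off + i) & 0xff); i += 1 }
    n
  }
}
```

For small, fixed-size data like an IP header, the buffer-and-compute approach in the diff above stays simpler than this.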
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala
##########
@@ -74,6 +76,21 @@ abstract class UState(
extends ParseOrUnparseState(vbox, diagnosticsArg, dataProcArg, tunable)
with Cursor[InfosetAccessor] with ThrowsSDE with SavesErrorsAndWarnings {
+  final def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE) =
+ vbox.vmap.setVariable(vrd, newValue, referringContext, this)
+
+  /**
+   * For unparsing, this throws a RetryableException in the case where the
+   * variable cannot (yet) be read.
+   *
+   * @param vrd Identifies the variable to read.
+   * @param referringContext Where to place blame if there is an error.
+   * @return The data value of the variable, or throws exceptions if there is no value.
+   */
+  final def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive =
Review comment:
I will actually look at how expressions that read variables do it. They
are obviously not calling this getter, but perhaps they could, which would
solve the test-coverage issue.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -200,6 +214,54 @@ abstract class LayerTransformer()
// println("DIS hex dump = " + str)
// ???
// }
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the parsing of a layer starts.
+ *
+ * Examples might be setting variable values.
+ *
+ * At the time this is called, the state already contains the layered I/O stream.
+ * @param s The parser state, which may be modified.
+ */
+ def startLayerForParse(s: PState): Unit
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions after the parsing of a layer ends.
+ *
+ * Examples might be setting variable values.
+ *
+ * At the time this is called, the state still contains the layered I/O stream.
+ * @param s The parser state, which may be modified.
+ */
+ def endLayerForParse(s: PState): Unit
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the unparsing of a layer starts.
+ *
+ * Examples might be setting variable values.
+ *
+ * Note that if variables without values are read by this method, then this
+ * must create suspended calculations that clone the necessary state, which
+ * are evaluated later.
+ *
+ * At the time this is called, the state still contains the layered I/O stream.
+ * @param s The unparser state, which may be modified.
+ */
+ def startLayerForUnparse(s: UState): Unit
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions after the unparsing of a layer ends.
+ *
+ * Note that if variables without values are read by this method, then this
+ * must create suspended calculations that clone the necessary state, which
+ * are evaluated later.
Review comment:
This is a very good point. I think the plugin API really needs to
abstract this stuff away, so that a framework that creates suspensions is
being used and drives the plug-in methods.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -327,11 +328,17 @@ final class PState private (
this.infoset = newParent
}
-  def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: VariableRuntimeData): Unit = {
+  def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE): Unit = {
variableMap.setVariable(vrd, newValue, referringContext, this)
changedVariablesStack.top += vrd.globalQName
}
+  def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive = {
+ val res = variableMap.readVariable(vrd, referringContext, this)
+ changedVariablesStack.top += vrd.globalQName
Review comment:
This was needed in the first version of IPv4, which had two transforms
where the first one assigned a variable that had to be read (gotten) by the
second.
When I simplified IPv4 to a single transform, this was no longer needed. And
check-digits doesn't need it either.
Neither one reads any variable; they just write them.
But there's nothing stopping a layer transform from wanting to read a
variable, e.g., to get a parameter to the algorithm, such as a mode flag for
which compression algorithm or which variant of line-folding to use, etc.
So I think we just need an example that exercises this call.
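Such a parameter-reading layer might look roughly like this sketch. The `VariableReader` stub stands in for the state's `getVariable`; all names here are illustrative, not Daffodil's actual API:

```scala
// Hypothetical sketch: a layer transform reading a variable as an algorithm
// parameter (a mode flag selecting the line-folding variant).
trait VariableReader {
  def getVariable(qname: String): String // stand-in for PState/UState.getVariable
}

final class FoldingLayer(vars: VariableReader) {
  // Read the mode flag once, at layer setup time, and select the variant.
  private val mode: String = vars.getVariable("ex:foldMode")

  def foldLineLength: Int = mode match {
    case "imf"       => 78 // RFC 5322 recommends lines of at most 78 characters
    case "icalendar" => 75 // RFC 5545 folds content lines at 75 octets
    case other       => throw new IllegalArgumentException(s"unknown mode: $other")
  }
}
```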
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/EvLayering.scala
##########
@@ -70,16 +70,27 @@ final class LayerTransformerEv(
layerTransformEv: LayerTransformEv,
maybeLayerCharsetEv: Maybe[LayerCharsetEv],
maybeLayerLengthKind: Maybe[LayerLengthKind],
- maybeLayerLengthInBytesEv: Maybe[LayerLengthInBytesEv],
+ maybeLayerLengthEv: Maybe[LayerLengthEv],
maybeLayerLengthUnits: Maybe[LayerLengthUnits],
maybeLayerBoundaryMarkEv: Maybe[LayerBoundaryMarkEv],
- ci: DPathCompileInfo)
- extends Evaluatable[LayerTransformer](ci)
+ srd: SequenceRuntimeData)
+ extends Evaluatable[LayerTransformer](srd.dpathCompileInfo)
with NoCacheEvaluatable[LayerTransformer] {
+  /**
+   * A layer transformer is never a constant. It always must be evaluated
+   * to get a new one.
+   *
+   * This prevents it being cached/stored as part of the state of the data
+   * processor, where it would be shared across threads.
+   *
+   * Otherwise layer transformers couldn't be stateful. But they often need
+   * to be stateful.
+   */
+  override protected def isNeverConstant = true
Review comment:
Good point. I'll arrange for NoCacheEvaluatable to have that behavior,
update its scaladoc accordingly, and remove this flag.
##########
File path:
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/DFDL_part2_attributes.xsd
##########
@@ -1069,18 +1037,5 @@
<xsd:attributeGroup ref="dfdlx:SimpleTypeValueCalcAGQualified" />
<xsd:attributeGroup ref="dfdlx:ObjectKindAGQualified" />
</xsd:attributeGroup>
-
- <xsd:attributeGroup name="LayeringAGQualified">
- <xsd:attribute form="qualified" name="layerTransform"
- type="dfdl:LayerTransformType_Or_DFDLExpression" />
- <xsd:attribute form="qualified" name="layerEncoding"
- type="dfdl:EncodingEnum_Or_DFDLExpression" />
-    <xsd:attribute form="qualified" name="layerLengthKind" type="dfdl:LayerLengthKindEnum" />
- <xsd:attribute form="qualified" name="layerLength"
- type="dfdl:DFDLNonNegativeInteger_Or_DFDLExpression" />
-    <xsd:attribute form="qualified" name="layerLengthUnits" type="dfdl:LayerLengthUnitsEnum" />
- <xsd:attribute form="qualified" name="layerBoundaryMark"
- type="dfdl:ListOfDFDLStringLiteral_Or_DFDLExpression" />
- </xsd:attributeGroup>
Review comment:
Probably better to just make a specific JIRA ticket for this kind of a
change.
Created https://issues.apache.org/jira/browse/DAFFODIL-2564
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -173,37 +187,33 @@ abstract class LayerTransformer()
//
}
- // These were very useful for debugging. Note that they stop with a ???
- // error. That's because dumping streams changes their state.
- //
- // Keeping these around commented out, for now. While this feature is still
- // new and may need further debugging.
- //
- // def dumpLayer(is: InputStream) {
-  //    val str = Stream.continually(is.read).takeWhile(_ != -1).map(_.toChar).mkString
- // println("dump length " + str.length + " = " + str)
- // ???
- // }
- //
- // def hexDumpLayer(is: InputStream) {
- // val str = hexify(is)
- // println("hex dump = " + str)
- // ???
- // }
- //
- // def hexify(is: InputStream): String =
-  //    Stream.continually(is.read).takeWhile(_ != -1).map(x => "%02x".format(x.toInt)).mkString(" ")
- //
- // def hexDumpLayer(is: InputSourceDataInputStream, finfo: FormatInfo) {
- // val jis = new JavaIOInputStream(is, finfo)
- // val str = hexify(jis)
- // println("DIS hex dump = " + str)
- // ???
- // }
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the parsing of a layer starts.
+ *
+ * Examples might be setting variable values.
+ *
+ * At the time this is called, the state already contains the layered I/O stream.
+ * @param s The parser state, which may be modified.
+ */
+ def startLayerForParse(s: PState): Unit = () // nothing by default
Review comment:
I think it just works out that you only need a start function for the parse,
to capture the layer data before regular parsing parses it. You don't have to
compute the checksum at that point, but I see no reason not to.
For unparsing, it's the opposite: you need to first unparse and capture the
outbound data, and only after that can you recompute the checksum.
I originally created start/end calls for each of parse/unparse, then
discovered that I didn't need end for parse, nor start for unparse.
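That asymmetry can be sketched as follows (hypothetical trait, not the actual Daffodil API): a checksum-style layer only overrides the parse-start and unparse-end hooks, and the other two remain no-op defaults:

```scala
// Illustrative sketch of the hook asymmetry: for checksums, work happens at
// parse start (inbound bytes are available) and at unparse end (outbound
// bytes have been produced). The other two hooks default to no-ops.
trait LayerHooks {
  def startLayerForParse(): Unit = ()   // compute over captured inbound bytes
  def endLayerForParse(): Unit = ()     // unused for checksum-style layers
  def startLayerForUnparse(): Unit = () // unused: outbound data not yet written
  def endLayerForUnparse(): Unit = ()   // recompute over the captured output
}

final class ChecksumHooks extends LayerHooks {
  var log: List[String] = Nil
  override def startLayerForParse(): Unit = { log = log :+ "parse:start" }
  override def endLayerForUnparse(): Unit = { log = log :+ "unparse:end" }
}
```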
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]