mbeckerle commented on a change in pull request #643:
URL: https://github.com/apache/daffodil/pull/643#discussion_r716732801
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -200,6 +214,54 @@ abstract class LayerTransformer()
// println("DIS hex dump = " + str)
// ???
// }
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the parsing of a layer starts.
Review comment:
A facade makes sense. As I write more of these layers, however, I've
concluded that a much higher-level pattern can also be imposed, which may
make exposing the PState/UState objects unnecessary. E.g., we need the QName
of a variable to assign, and the algorithm that takes a byte array and
returns an int. Those really are the only needed parameters for
checksums/CRC/parity.
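To make that concrete, the higher-level pattern might be sketched roughly as follows. The trait and all names here are hypothetical illustrations of the idea, not Daffodil's actual API:

```scala
// Hypothetical sketch of a higher-level checksum-layer pattern: a layer is
// fully described by the variable to assign and a bytes-to-int algorithm.
// None of these names are Daffodil's actual API.
trait ChecksumLayerSpec {
  /** QName (shown as a plain string for simplicity) of the DFDL variable to assign. */
  def resultVariableQName: String

  /** The checksum algorithm: takes the layer's bytes, returns an int. */
  def compute(bytes: Array[Byte]): Int
}

/** Example: a trivial XOR-parity "checksum" expressed in this pattern. */
object XorParityLayer extends ChecksumLayerSpec {
  override def resultVariableQName: String = "ex:parity"
  override def compute(bytes: Array[Byte]): Int =
    bytes.foldLeft(0)((acc, b) => acc ^ (b & 0xff))
}
```

With such a facade, the framework (not the plugin) would own reading the layer bytes and assigning the result variable.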
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -101,15 +102,28 @@ object LayerTransformerFactory {
register(GZIPTransformerFactory)
register(IMFLineFoldedTransformerFactory)
register(ICalendarLineFoldedTransformerFactory)
+
+ /**
+ * Arguably the above could be bundled with Daffodil.
+ *
+ * The transformers below really should be plugins defined
+ * outside Daffodil, per DAFFODIL-1927.
+ */
register(AISPayloadArmoringTransformerFactory)
register(FourByteSwapTransformerFactory)
+ register(IPv4Checksum.Factories.Part1)
+ register(IPv4Checksum.Factories.Part2)
+
}
/**
* Shared functionality of all LayerTransformers.
+ *
+ * A layer transformer is serialized as part of the processor definition;
+ * hence, they must be stateless as we allow the processor definition to be
+ * shared across multiple threads.
Review comment:
Will fix. This comment reflects a misunderstanding I had. You are
correct: the factory is serialized; the transformer itself is created at
runtime.
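The corrected lifecycle could be sketched like this (hypothetical names, not Daffodil's actual classes): the factory is the serializable, stateless object stored with the processor; each runtime use creates a fresh transformer, which is therefore free to hold state:

```scala
// Illustrative sketch only: the factory is serialized with the processor
// definition and shared across threads, so it must be stateless. The
// transformer it creates is per-use, so per-use mutable state is safe.
trait Transformer {
  var bytesSeen: Long = 0 // per-use state; never shared across threads
}

trait TransformerFactory extends Serializable {
  def newTransformer(): Transformer // fresh instance for each parse/unparse
}

object ExampleFactory extends TransformerFactory {
  override def newTransformer(): Transformer = new Transformer {}
}
```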
##########
File path:
daffodil-test/src/test/scala/org/apache/daffodil/layers/CheckDigit.scala
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.layers
+
+import org.apache.commons.io.IOUtils
+import org.apache.daffodil.schema.annotation.props.gen.LayerLengthKind
+import org.apache.daffodil.schema.annotation.props.gen.LayerLengthUnits
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.processors.LayerLengthEv
+import org.apache.daffodil.processors.LayerBoundaryMarkEv
+import org.apache.daffodil.processors.LayerCharsetEv
+import org.apache.daffodil.processors.parsers.PState
+import org.apache.daffodil.processors.unparsers.UState
+
+import java.io._
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.io.DataInputStream
+import org.apache.daffodil.processors.CharsetEvBase
+import org.apache.daffodil.processors.SequenceRuntimeData
+import org.apache.daffodil.processors.SuspendableOperation
+import org.apache.daffodil.processors.charset.BitsCharsetJava
+import org.apache.daffodil.util.ByteBufferOutputStream
+import org.apache.daffodil.xml.NS
+import org.apache.daffodil.xml.RefQName
+
+import java.nio.ByteBuffer
+import java.nio.charset.Charset
+
+
+final class CheckDigitExplicit(
+ srd: SequenceRuntimeData,
+ charset: Charset,
+ layerLengthEv: LayerLengthEv)
+extends LayerTransformer() {
+  Assert.invariant(charset.newDecoder().maxCharsPerByte() == 1) // is an SBCS charset
+
+ private val checkDigitVRD = {
+ val varNamespace = NS("urn:org.apache.daffodil.layers.checkDigit")
+    val varQName = RefQName(Some("cd"), "checkDigit", varNamespace).toGlobalQName
+ val vrd = srd.variableMap.getVariableRuntimeData(varQName).getOrElse {
+ srd.SDE("Variable '%s' is not defined.", varQName.toExtendedSyntax)
+ }
+    srd.schemaDefinitionUnless(vrd.primType == PrimType.UnsignedInt,
+      "Variable '%s' is not of type 'xs:unsignedInt'.", varQName.toExtendedSyntax)
+ vrd
+ }
+
+ private var limitingOutputStream : ByteBufferOutputStream = _
+ private var originalOutputStream: OutputStream = _
+
+ protected def wrapLayerDecoder(jis: InputStream) = jis
+
+ override def wrapLimitingStream(jis: java.io.InputStream, state: PState) = {
+ val layerLengthInBytes: Int = layerLengthEv.evaluate(state).toInt
+ val ba = new Array[Byte](layerLengthInBytes)
+ try
+ IOUtils.readFully(jis, ba)
+ catch {
+ case eof: EOFException =>
+ throw DataInputStream.NotEnoughDataException(ba.length * 8)
+ }
+ // this stream will be used by the parse when it parses the
+ // schema contents of the layer. We have already consumed the bytes
+ // from the original input stream, and saved those in the byte array
+ // to enable the computation of the checkDigit.
+ val layerStream = new ByteArrayInputStream(ba)
+    val str = new String(ba, charset) // always does replacement on decode errors
+ val checkDigit = computeCheckDigit(str)
+    state.setVariable(checkDigitVRD, checkDigit, srd) // assign to result variable
+ layerStream
Review comment:
If the checksum were over a large thing, like a whole file, then one
would want to use a layered stream that does the checksumming as part of the
parser reading the data stream. That would avoid the need to buffer the data.
That's what, for example, the gzip and line-folding layers do.
In this case these are IP packets, which are by definition small, so just
grabbing the data into a buffer seemed simplest.
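For reference, the streaming alternative could look roughly like this sketch (hypothetical, not Daffodil code): a `FilterInputStream` that folds every byte it passes through into a running checksum, so the parser computes the checksum as a side effect of reading, with no whole-layer buffering:

```scala
import java.io.{FilterInputStream, InputStream}

// Sketch of a checksumming pass-through stream. The "checksum" here is a
// simple running byte sum, standing in for whatever real algorithm is used.
final class ChecksummingInputStream(in: InputStream) extends FilterInputStream(in) {
  private var sum: Long = 0L

  def checksum: Long = sum

  override def read(): Int = {
    val b = super.read()
    if (b >= 0) sum += b // fold each byte into the running sum
    b
  }

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val n = super.read(buf, off, len)
    var i = 0
    while (i < n) { sum += (buf(off + i) & 0xff); i += 1 }
    n
  }
}
```

For small, fixed-size data like an IP header, the buffer-and-compute approach in the diff above stays simpler than this.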
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala
##########
@@ -74,6 +76,21 @@ abstract class UState(
extends ParseOrUnparseState(vbox, diagnosticsArg, dataProcArg, tunable)
with Cursor[InfosetAccessor] with ThrowsSDE with SavesErrorsAndWarnings {
+  final def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE) =
+ vbox.vmap.setVariable(vrd, newValue, referringContext, this)
+
+  /**
+   * For unparsing, this throws a RetryableException in the case where the
+   * variable cannot (yet) be read.
+   *
+   * @param vrd Identifies the variable to read.
+   * @param referringContext Where to place blame if there is an error.
+   * @return The data value of the variable, or throws exceptions if there is no value.
+   */
+  final def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive =
Review comment:
I will actually look at how expressions that read variables do it. They
are obviously not calling this getter, but perhaps they could, which would
solve the test-coverage issue.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -200,6 +214,54 @@ abstract class LayerTransformer()
// println("DIS hex dump = " + str)
// ???
// }
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the parsing of a layer starts.
+ *
+ * Examples might be setting variable values.
+ *
+ * At the time this is called, the state already contains the layered I/O stream.
+ * @param s The parser state, which may be modified.
+ */
+ def startLayerForParse(s: PState): Unit
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions after the parsing of a layer ends.
+ *
+ * Examples might be setting variable values.
+ *
+ * At the time this is called, the state still contains the layered I/O stream.
+ * @param s The parser state, which may be modified.
+ */
+ def endLayerForParse(s: PState): Unit
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the unparsing of a layer starts.
+ *
+ * Examples might be setting variable values.
+ *
+ * Note that if variables without values are read by this method, then this
+ * must create suspended calculations that clone the necessary state, which
+ * are evaluated later.
+ *
+ * At the time this is called, the state still contains the layered I/O stream.
+ * @param s The unparser state, which may be modified.
+ */
+ def startLayerForUnparse(s: UState): Unit
+
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions after the unparsing of a layer ends.
+ *
+ * Note that if variables without values are read by this method, then this
+ * must create suspended calculations that clone the necessary state, which
+ * are evaluated later.
Review comment:
This is a very good point. I think the plugin API really needs to
abstract this stuff away, so that a framework that creates suspensions is
being used and drives the plug-in methods.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -327,11 +328,17 @@ final class PState private (
this.infoset = newParent
}
-  def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: VariableRuntimeData): Unit = {
+  def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE): Unit = {
variableMap.setVariable(vrd, newValue, referringContext, this)
changedVariablesStack.top += vrd.globalQName
}
+  def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive = {
+ val res = variableMap.readVariable(vrd, referringContext, this)
+ changedVariablesStack.top += vrd.globalQName
Review comment:
This was needed in the first version of IPv4, which had two transforms
where the first one assigned a variable that had to be read (gotten) by the
second.
When I simplified IPv4 to a single transform, this was no longer needed. And
check-digits doesn't need it either.
Neither one reads any variable; they just write them.
But there's nothing stopping a layer transform from wanting to read a
variable, e.g., to get a parameter to the algorithm, such as a mode flag for
which compression algorithm or which variant of line-folding to use, etc.
So I think we just need an example that exercises this call.
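Such a parameter-reading layer might look roughly like this sketch. The `VariableReader` stub stands in for the state's `getVariable`; all names here are illustrative, not Daffodil's actual API:

```scala
// Hypothetical sketch: a layer transform reading a variable as an algorithm
// parameter (a mode flag selecting the line-folding variant).
trait VariableReader {
  def getVariable(qname: String): String // stand-in for PState/UState.getVariable
}

final class FoldingLayer(vars: VariableReader) {
  // Read the mode flag once, at layer setup time, and select the variant.
  private val mode: String = vars.getVariable("ex:foldMode")

  def foldLineLength: Int = mode match {
    case "imf"       => 78 // RFC 5322 recommends lines of at most 78 characters
    case "icalendar" => 75 // RFC 5545 folds content lines at 75 octets
    case other       => throw new IllegalArgumentException(s"unknown mode: $other")
  }
}
```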
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/EvLayering.scala
##########
@@ -70,16 +70,27 @@ final class LayerTransformerEv(
layerTransformEv: LayerTransformEv,
maybeLayerCharsetEv: Maybe[LayerCharsetEv],
maybeLayerLengthKind: Maybe[LayerLengthKind],
- maybeLayerLengthInBytesEv: Maybe[LayerLengthInBytesEv],
+ maybeLayerLengthEv: Maybe[LayerLengthEv],
maybeLayerLengthUnits: Maybe[LayerLengthUnits],
maybeLayerBoundaryMarkEv: Maybe[LayerBoundaryMarkEv],
- ci: DPathCompileInfo)
- extends Evaluatable[LayerTransformer](ci)
+ srd: SequenceRuntimeData)
+ extends Evaluatable[LayerTransformer](srd.dpathCompileInfo)
with NoCacheEvaluatable[LayerTransformer] {
+  /**
+   * A layer transformer is never a constant. It always must be evaluated
+   * to get a new one.
+   *
+   * This prevents it being cached/stored as part of the state of the data
+   * processor, where it would be shared across threads.
+   *
+   * Otherwise layer transformers couldn't be stateful. But they often need
+   * to be stateful.
+   */
+  override protected def isNeverConstant = true
Review comment:
Good point. I'll arrange for NoCacheEvaluatable to have that behavior,
update its scaladoc accordingly, and remove this flag.
##########
File path:
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/DFDL_part2_attributes.xsd
##########
@@ -1069,18 +1037,5 @@
<xsd:attributeGroup ref="dfdlx:SimpleTypeValueCalcAGQualified" />
<xsd:attributeGroup ref="dfdlx:ObjectKindAGQualified" />
</xsd:attributeGroup>
-
- <xsd:attributeGroup name="LayeringAGQualified">
- <xsd:attribute form="qualified" name="layerTransform"
- type="dfdl:LayerTransformType_Or_DFDLExpression" />
- <xsd:attribute form="qualified" name="layerEncoding"
- type="dfdl:EncodingEnum_Or_DFDLExpression" />
-    <xsd:attribute form="qualified" name="layerLengthKind" type="dfdl:LayerLengthKindEnum" />
- <xsd:attribute form="qualified" name="layerLength"
- type="dfdl:DFDLNonNegativeInteger_Or_DFDLExpression" />
-    <xsd:attribute form="qualified" name="layerLengthUnits" type="dfdl:LayerLengthUnitsEnum" />
- <xsd:attribute form="qualified" name="layerBoundaryMark"
- type="dfdl:ListOfDFDLStringLiteral_Or_DFDLExpression" />
- </xsd:attributeGroup>
Review comment:
Probably better to just make a specific JIRA ticket for this kind of a
change.
Created https://issues.apache.org/jira/browse/DAFFODIL-2564
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -173,37 +187,33 @@ abstract class LayerTransformer()
//
}
- // These were very useful for debugging. Note that they stop with a ???
- // error. That's because dumping streams changes their state.
- //
- // Keeping these around commented out, for now. While this feature is still
- // new and may need further debugging.
- //
- // def dumpLayer(is: InputStream) {
-  //    val str = Stream.continually(is.read).takeWhile(_ != -1).map(_.toChar).mkString
- // println("dump length " + str.length + " = " + str)
- // ???
- // }
- //
- // def hexDumpLayer(is: InputStream) {
- // val str = hexify(is)
- // println("hex dump = " + str)
- // ???
- // }
- //
- // def hexify(is: InputStream): String =
-  //    Stream.continually(is.read).takeWhile(_ != -1).map(x => "%02x".format(x.toInt)).mkString(" ")
- //
- // def hexDumpLayer(is: InputSourceDataInputStream, finfo: FormatInfo) {
- // val jis = new JavaIOInputStream(is, finfo)
- // val str = hexify(jis)
- // println("DIS hex dump = " + str)
- // ???
- // }
+ /**
+ * Enables the layer transformer to take necessary state-manipulating
+ * actions before the parsing of a layer starts.
+ *
+ * Examples might be setting variable values.
+ *
+ * At the time this is called, the state already contains the layered I/O stream.
+ * @param s The parser state, which may be modified.
+ */
+ def startLayerForParse(s: PState): Unit = () // nothing by default
Review comment:
I think it just works out that you only need a start function for the parse,
to capture the layer data before regular parsing parses it. You don't have to
compute the checksum at that point, but I see no reason not to.
For unparsing, it's the opposite: you need to first unparse and capture the
outbound data, and only after that can you recompute the checksum.
I originally created start/end calls for each of parse/unparse, then
discovered that I didn't need end for parse, nor start for unparse.
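That asymmetry can be sketched as follows (hypothetical trait, not the actual Daffodil API): a checksum-style layer only overrides the parse-start and unparse-end hooks, and the other two remain no-op defaults:

```scala
// Illustrative sketch of the hook asymmetry: for checksums, work happens at
// parse start (inbound bytes are available) and at unparse end (outbound
// bytes have been produced). The other two hooks default to no-ops.
trait LayerHooks {
  def startLayerForParse(): Unit = ()   // compute over captured inbound bytes
  def endLayerForParse(): Unit = ()     // unused for checksum-style layers
  def startLayerForUnparse(): Unit = () // unused: outbound data not yet written
  def endLayerForUnparse(): Unit = ()   // recompute over the captured output
}

final class ChecksumHooks extends LayerHooks {
  var log: List[String] = Nil
  override def startLayerForParse(): Unit = { log = log :+ "parse:start" }
  override def endLayerForUnparse(): Unit = { log = log :+ "unparse:end" }
}
```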
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]