stevedlawrence commented on code in PR #1191:
URL: https://github.com/apache/daffodil/pull/1191#discussion_r1547931916
##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/FixedLengthLayer.scala:
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.layers.runtime1
+
+import java.io.ByteArrayInputStream
+import java.io.ByteArrayOutputStream
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.ByteBuffer
+
+import org.apache.daffodil.lib.exceptions.Assert
+import org.apache.daffodil.runtime1.layers.LayerNotEnoughDataException
+import org.apache.daffodil.runtime1.layers.api.Layer
+
+import org.apache.commons.io.IOUtils
+
+/**
+ * Suitable only for small sections of data, not large data streams or whole files.
+ * See the maxFixedLength value defined herein for the maximum.
+ *
+ * The entire fixed length region of the data will be pulled into a byte buffer in memory.
+ *
+ * TODO: Someday, enhance to make this streaming.
+ *
+ * One DFDL Variable is a parameter
+ *   - fixedLength - an unsignedInt giving the fixed length of this layer.
+ *     This length is enforced on both parsing and unparsing the layer.
+ * There are no output/result DFDL variables from this layer.
+ */
+final class FixedLengthLayer
+  extends Layer("fixedLength", "urn:org.apache.daffodil.layers.fixedLength") {
+
+  private var fixedLength: Int = -1
+
+  private def maxFixedLength = Short.MaxValue
+
+  /**
+   * Captures the fixedLength DFDL variable value and saves it.
+   *
+   * Also validates whether it is in acceptable range. Because the fixedLength variable
+   * can be populated from an expression that consumes input data, it's possible for the
+   * fixedLength to be out of range as a result of speculative parsing down
+   * an incorrect path. Hence, it is a processing error, not a schema definition error
+   * if the fixedLength variable value is out of range.
+   *
+   * @param fixedLength matches the name of the DFDL variable which will supply this value.
+   */
+  private[layers] def setLayerVariableParameters(fixedLength: Long): Unit = {
+    this.fixedLength = fixedLength.toInt
+    Assert.invariant(fixedLength >= 0) // variable is unsignedInt, so this can't be negative
+    if (fixedLength > maxFixedLength)
+      processingError(
+        s"fixedLength value of $fixedLength is above the maximum of $maxFixedLength.",
+      )
+  }
+
+  override def wrapLayerInput(jis: InputStream): InputStream = {
+    new FixedLengthInputStream(jis)

Review Comment:
Note, I just came across Apache Commons IO `BoundedInputStream`, which does basically the same thing as this but streams the data. So it fixes the "make this streaming" TODO, at least for the input part. I think it would mean the `nRead < fixedLength` check would need to be done at close, though, maybe something like this:
```scala
new BoundedInputStream(jis, fixedLength) {
  override def close(): Unit = {
    if (getCount() < fixedLength) {
      processingError(new LayerNotEnoughDataException(fixedLength, getCount()))
    }
    super.close()
  }
}
```
Note that this changes the PE to occur at the end of the layer instead of at the beginning. Could that affect backtracking? Which raises a question to me: if a layer throws a PE, how does backtracking work?
Does it backtrack to the very beginning of the layer? Or is the PE just handled by whatever is currently parsing the layer data? That would mean a parser could potentially ignore a layer PE by backtracking a little and continuing to read. It feels like if a layer creates a PE, we want to backtrack all the way back to before that layer started, since a layer PE implies the data wasn't valid layer data. Is that how it works?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
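[Editor's note] The close-time length check proposed in the review can be illustrated with a self-contained sketch that needs no Commons IO or Daffodil dependencies. `LengthCheckingInputStream` is a hypothetical stand-in for `BoundedInputStream`, and a plain `IOException` stands in for Daffodil's `processingError`; none of these names come from the PR itself:

```scala
import java.io.{ByteArrayInputStream, FilterInputStream, IOException, InputStream}

// Hypothetical stand-in for Commons IO BoundedInputStream with the reviewer's
// close-time check: bytes are streamed (never buffered whole), reads stop at
// the bound, and a short layer is only detected once the consumer closes.
class LengthCheckingInputStream(in: InputStream, fixedLength: Long)
    extends FilterInputStream(in) {
  private var count: Long = 0L

  override def read(): Int = {
    if (count >= fixedLength) -1 // enforce the upper bound
    else {
      val b = super.read()
      if (b >= 0) count += 1
      b
    }
  }

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val remaining = fixedLength - count
    if (remaining <= 0) -1
    else {
      val n = super.read(buf, off, math.min(len.toLong, remaining).toInt)
      if (n > 0) count += n
      n
    }
  }

  override def close(): Unit = {
    // The length check moves to close(): only here do we know the total consumed,
    // which is what shifts the PE from the start of the layer to its end.
    if (count < fixedLength)
      throw new IOException(s"layer needed $fixedLength bytes but got only $count")
    super.close()
  }
}

// Usage: a 4-byte bound over 6 bytes of data reads exactly 4 bytes.
val s = new LengthCheckingInputStream(new ByteArrayInputStream("abcdef".getBytes("UTF-8")), 4)
val got = new String(s.readAllBytes(), "UTF-8") // "abcd"
s.close() // fine: all 4 bytes were consumed
```

Because nothing is buffered, this keeps the streaming benefit the reviewer notes, at the cost of reporting a too-short layer only when the stream is closed.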
