Re: [PR] 4th (and final) cut of New Layer API & Improved Implementation [daffodil]

via GitHub Tue, 19 Mar 2024 14:11:02 -0700


mbeckerle commented on code in PR #1187:
URL: https://github.com/apache/daffodil/pull/1187#discussion_r1531093450



##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/BoundaryMarkLayer.scala:
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.layers.runtime1
+
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+
+import org.apache.daffodil.io.BoundaryMarkInsertingJavaOutputStream
+import org.apache.daffodil.io.BoundaryMarkLimitingInputStream
+import org.apache.daffodil.runtime1.layers.api.Layer
+
+/**
+ * A layer which isolates text data by way of a boundary mark string.
+ * This is like a terminating delimiter of a DFDL element or sequence/choice 
group,
+ * except that there is no escaping mechanism, so the data cannot in any way
+ * contain the boundary mark string, and elements within the layer can have
+ * any dfdl:lengthKind, but this does not affect the search for the boundary 
mark.
+ *
+ * For example, when using an element with dfdl:lengthKind="delimited" and a 
dfdl:terminator="END",
+ * if that element contains a child element with dfdl:lengthKind="explicit", 
then the
+ * search for the "END" terminator is suspended for the length of the child 
element, and
+ * that search resumes after the child element's length has been parsed.
+ *
+ * In contrast to this, if a boundary mark layer is used with the boundaryMark 
variable
+ * bound to "END", then the data stream is decoded as characters in the 
charset encoding
+ * given by the layerEncoding variable, and the layer continues until the 
"END" is found.
+ * The dfdl:lengthKind of any child element enclosed within the layer, or even 
the lengths
+ * of other layers found within the scope of this boundary mark layer are not 
considered and do not
+ * disrupt the search for the boundary mark string.
+ *
+ * This layer does not populate any DFDL variables with results.
+ *
+ * This layer defines two DFDL variables which provide required parameters.
+ */
+final class BoundaryMarkLayer
+  extends Layer("boundaryMark", "urn:org.apache.daffodil.layers.boundaryMark") 
{
+
+  private var boundaryMark: String = _
+  private var layerEncoding: String = _
+
+  private val maxBoundaryMarkLength: Int = Short.MaxValue
+
+  /**
+   * @param boundaryMark  a string which is the boundary marker. Searching for 
this string is
+   *                      done without any notion of escaping. When parsing 
the data in the
+   *                      layer is up to but not including that of the 
boundary mark string,
+   *                      which is removed from the data-stream on parsing, 
and inserted into
+   *                      the data stream on unparsing.
+   * @param layerEncoding a string which is the name of the character set 
encoding used to
+   *                      decode characters during the search for the boundary 
mark, and used
+   *                      to encode characters when inserting the boundary 
mark when unparsing.
+   */
+  private[layers] def setLayerVariableParameters(
+    boundaryMark: String,
+    layerEncoding: String,
+  ): Unit = {
+    this.boundaryMark = boundaryMark
+    if (boundaryMark.isEmpty)
+      processingError("The boundaryMark variable value may not be empty 
string.")
+    if (boundaryMark.length > maxBoundaryMarkLength)
+      processingError(
+        s"The boundaryMark string length may not be greater than the limit: 
$maxBoundaryMarkLength",
+      )

Review Comment:
   Yes. If it's about a variable that's a parameter to the layer, and there is 
something wrong that variable's value, then it should be a PE. Because 
variables can all come from expressions, which can all come from the data, 
which is subject to being garbage due to speculative parsing. 
   
   This means there is not a lot of play for RSDE in layers. Users can choose 
to call the RSDE method from anywhere in their layer, but doing so means you 
cannot use the layer in speculative parsing situations. 



##########
daffodil-runtime1-unparser/src/main/scala/org/apache/daffodil/unparsers/runtime1/LayeredSequenceUnparser.scala:
##########
@@ -67,36 +68,52 @@ class LayeredSequenceUnparser(
     // create a new DOS where unparsers following this layer will unparse
     val layerFollowingDOS = layerUnderlyingDOS.addBuffered()
 
-    // New layerDOS is where the layer will unparse into. Ultimately anything 
written
-    // to layerDOS ends up, post transform, in layerUnderlyingDOS
-    val layerDOS = layerTransformer.addLayer(layerUnderlyingDOS, state, 
formatInfoPre)
-
-    // unparse the layer body into layerDOS
-    state.dataOutputStream = layerDOS
-    super.unparse(state)
-    layerTransformer.endLayerForUnparse(state)
-
-    // now we're done unparsing the layer, so finalize the last DOS in the
-    // chain. Note that there might be suspensions so some parts of the
-    // layerDOS chain may not be finished. When those suspensions are all
-    // finished, the layerDOS content will be written to the
-    // layerUnderlyingDOS, which will subsequently be finished. Like above, it
-    // is important to pass an immutable UState into the setFinished function
-    // because that state is stored in the DOS (as finishedFormatInfo) and its
-    // use may be delayed to after the actual UState has been changed. The
-    // UState has almost certainly changed since the last cloneForSuspension
-    // call, so we need a new clone to finish this DOS.
-    val layerDOSLast = layerDOS.lastInChain
-    val formatInfoPost = 
state.asInstanceOf[UStateMain].cloneForSuspension(layerDOSLast)
-    layerDOSLast.setFinished(formatInfoPost)
-
-    // clean up resources - note however, that due to suspensions, the whole
-    // layer stack is potentially still needed, so not clear what can be
-    // cleaned up at this point.
-    layerTransformer.removeLayer(layerDOS, state)
-
-    // reset the state so subsequent unparsers write to the following DOS
-    state.dataOutputStream = layerFollowingDOS
+    try {
+      val layerDriver = LayerDriver(formatInfoPre, ctxt.layerRuntimeData)
+
+      // New layerDOS is where the layer will unparse into. Ultimately 
anything written
+      // to layerDOS ends up, post transform, in layerUnderlyingDOS
+      val layerDOS = layerDriver.addOutputLayer(layerUnderlyingDOS)
+
+      // unparse the layer body into layerDOS
+      state.dataOutputStream = layerDOS
+      super.unparse(state)
+      // now we're done unparsing the layer recursively.
+      // While doing that unparsing, the data output stream may have been 
split, so the
+      // DOS in the state may no longer be the layerDOS.
+      //
+      // However, it is when whatever DOS is in the state at this point, that, 
when that
+      // DOS is consolidated and written out, that is when the layer is 
finished
+      // and the wrap-up of the layer (such as writing output variables) can 
occur.
+      //
+      val endOfLayerUnparseDOS = state.dataOutputStream
+      val formatInfoPost =
+        state.asInstanceOf[UStateMain].cloneForSuspension(endOfLayerUnparseDOS)
+
+      // setFinished on this end-of-layer-unparse data-output-stream  ensures
+      // that the layerDOS gets close() called on it.
+      endOfLayerUnparseDOS.setFinished(formatInfoPost)
+
+      // clean up resources - note however, that due to suspensions, the whole
+      // layer stack is potentially still needed, so
+      // nothing can be cleaned up at this point.
+    } catch {
+      case u: UnsuppressableException =>
+        throw u
+      case lre: LayerRuntimeException =>
+        throw lre
+      case re: RuntimeException =>
+        throw new LayerRuntimeException(re)
+      case pe: UnparseError =>
+        throw pe
+      case sde: RuntimeSchemaDefinitionError =>
+        throw sde
+      case e: Exception =>
+        state.toss(state.toProcessingError(new LayerUnexpectedException(e)))

Review Comment:
   I don't think so. A layer can fail in many ways that are hard to anticipate 
due to data problems. Data problems can always be due to speculative parsing 
down the wrong branch. 
   
   Users can choose to catch errors and make things RSDE if they want, but if 
we impose that most Exceptions are fatal then users will just have to wrap lots 
of try/catches around their code to anticipate and convert all the ordinary 
"things that can go wrong due to bad data" and convert them into processing 
errors. 
   
   I've taken one consistent position here which I think enables good 
composition properties for schemas that use layers. Exceptions become PE. 
RuntimeExceptions are fatal, as are Error and Daffodil inteernal errors which 
are UnsuppressableException/Abort. 
   
   Sometimes layer authors will want to catch specific RuntimeExceptions and 
convert them to PE. Our built in layers do this for code that converts encoding 
names to charsets. If that fails, it's a PE, though the charset findByName 
method throws RuntimeException. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] 4th (and final) cut of New Layer API & Improved Implementation [daffodil]

Reply via email to