stevedlawrence commented on code in PR #1165:
URL: https://github.com/apache/daffodil/pull/1165#discussion_r1502639291

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.scala:
##########
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api
+
+import java.io.IOException
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+import java.util.Optional
+
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.LayerException
+
+// TODO: Convert this whole file to Java ??
+
+/**
+ * Descriptor of the DFDL variables used by the layer.
+ *
+ * The names and types must match the dfdl:defineVariable definitions used in
+ * an included/imported DFDL component schema which provides the layer definition for use
+ * by other schemas.
+ *
+ * The first such variable is sometimes distinguished as the variable written to
+ * as the single unique value of the layer, such as when the layer computes a
+ * checksum. This is, however, just a convention used by various checksum layers
+ * and classes/traits that support writing checksum layers.
+ *
+ * @param prefix preferred namespace prefix for the namespace of the variables
+ * @param namespace namespace of the variables
+ * @param variables list of pairs, each is the name (an NCName, that is without any
+ * namespace prefix) and type of a variable
+ */
+final case class LayerVariables(
+  prefix: String,
+  namespace: String,
+  variables: java.util.List[(String, DFDLPrimType)],
+)
+
+/**
+ * This is the primary API class for writing layers.
+ *
+ * All layers are derived from this class, and must have no-args default constructors.
+ *
+ * Derived classes will be dynamically loaded by Java's SPI system.
+ * The names of concrete classes derived from Layer are listed in a resources/M.services file

Review Comment:
Is this supposed to be `resources/META-INF/services` or is this a short name for it?

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.scala:
##########
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api
+
+import java.io.IOException
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+import java.util.Optional
+
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.LayerException
+
+// TODO: Convert this whole file to Java ??
+
+/**
+ * Descriptor of the DFDL variables used by the layer.
+ *
+ * The names and types must match the dfdl:defineVariable definitions used in
+ * an included/imported DFDL component schema which provides the layer definition for use
+ * by other schemas.
+ *
+ * The first such variable is sometimes distinguished as the variable written to
+ * as the single unique value of the layer, such as when the layer computes a
+ * checksum. This is, however, just a convention used by various checksum layers
+ * and classes/traits that support writing checksum layers.
+ *
+ * @param prefix preferred namespace prefix for the namespace of the variables
+ * @param namespace namespace of the variables
+ * @param variables list of pairs, each is the name (an NCName, that is without any
+ * namespace prefix) and type of a variable
+ */
+final case class LayerVariables(
+  prefix: String,
+  namespace: String,
+  variables: java.util.List[(String, DFDLPrimType)],

Review Comment:
What if a layer wants variables from different namespaces? For example, maybe a layer wants to use the `dfdl:byteOrder` variable and a variable in its own namespace? Or do we want the restriction that a layer can only use variables in its own namespace, and if a user wants to pass in something like dfdl:byteOrder they would need to add a setVariable that sets its own variable to the value of $dfdl:byteOrder?

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/JLayerLengthUnits.java:
##########

Review Comment:
Seems reasonable, the simpler the better. The below enum also has "characters", I assume that isn't needed either?

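Returning to the namespace question on `LayerVariables` above: one possible shape, purely as a sketch, would be for each variable to carry its own QName instead of the descriptor holding a single prefix/namespace pair. None of this is in the PR. The class names are hypothetical, `PrimTypeSketch` stands in for `DFDLPrimType`, and `urn:example:myLayer` with its `checksum` variable is a made-up namespace used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.namespace.QName;

/** Hypothetical stand-in for Daffodil's DFDLPrimType, reduced to two cases. */
enum PrimTypeSketch { STRING, UNSIGNED_INT }

/** Hypothetical: a single layer variable that carries its own namespace via a QName. */
final class LayerVariableSketch {
  final QName name;
  final PrimTypeSketch type;

  LayerVariableSketch(QName name, PrimTypeSketch type) {
    this.name = name;
    this.type = type;
  }
}

/** Hypothetical alternative to LayerVariables(prefix, namespace, variables). */
final class LayerVariablesSketch {
  final List<LayerVariableSketch> variables;

  LayerVariablesSketch(List<LayerVariableSketch> variables) {
    this.variables = new ArrayList<>(variables);
  }

  /** Distinct namespace URIs used by the variables, in declaration order. */
  Set<String> namespaces() {
    Set<String> result = new LinkedHashSet<>();
    for (LayerVariableSketch v : variables) {
      result.add(v.name.getNamespaceURI());
    }
    return result;
  }

  /** One predefined DFDL variable plus one variable in the layer's own (made-up) namespace. */
  static LayerVariablesSketch example() {
    return new LayerVariablesSketch(Arrays.asList(
      new LayerVariableSketch(
        new QName("http://www.ogf.org/dfdl/dfdl-1.0/", "byteOrder", "dfdl"),
        PrimTypeSketch.STRING),
      new LayerVariableSketch(
        new QName("urn:example:myLayer", "checksum", "ml"),
        PrimTypeSketch.UNSIGNED_INT)));
  }
}
```

The alternative mentioned in the comment, restricting a layer to its own namespace and copying `$dfdl:byteOrder` in with a setVariable, keeps the descriptor as it is and moves that wiring into the schema.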
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/JLayerLengthKind.java:
##########
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api;
+
+public enum JLayerLengthKind {

Review Comment:
Agreed, I think we should be able to drop the 'J' without a problem? Or are there possible naming conflicts?

##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/LineFoldedTransformer.scala:
##########
@@ -75,157 +73,62 @@ import org.apache.daffodil.runtime1.processors.ParseOrUnparseState
  * For MIME, the maximum line length is 76.
  */
-sealed abstract class LineFoldedLayerCompiler(mode: LineFoldMode)
-  extends LayerCompiler(mode.transformName) {
-
-  override def compileLayer(
-    layerCompileInfo: LayerCompileInfo,
-  ): LineFoldedTransformerFactory = {
-
-    layerCompileInfo.optLayerLengthKind match {
-      case Some(LayerLengthKind.BoundaryMark) =>
-        layerCompileInfo.SDEUnless(
-          layerCompileInfo.optLayerBoundaryMarkOptConstantValue.isDefined,
-          "Property dfdlx:layerBoundaryMark was not defined.",
-        )
-      case Some(LayerLengthKind.Implicit) => // ok
-      case Some(other) =>
-        layerCompileInfo.SDE(
-          s"Property dfdlx:layerLengthKind can only be 'implicit' or 'boundaryMark', but was '$other'.",
-        )
-      case None =>
-        layerCompileInfo.SDE(
-          s"Property dfdlx:layerLengthKind must be 'implicit' or 'boundaryMark'.",
-        )
+sealed abstract class LineFoldedLayerBase(mode: LineFoldMode)
+  extends Layer(
+    layerName = mode.dfdlName,
+    supportedLayerLengthKinds =
+      Seq(JLayerLengthKind.BoundaryMark, JLayerLengthKind.Implicit).asJava,
+    supportedLayerLengthUnits = Seq().asJava,
+    isRequiredLayerEncoding = false,
+    optLayerVariables = Optional.empty(),
+  ) {
+
+  /**
+   * The layer limiter needs to be different for boundaryMark and implicit cases.
+   * @return a LayerLimiter or Optional.empty to indicate the standard layer limiter implementation should be used.
+   */
+  override def layerLimiter(layerPropertyInfo: LayerPropertyInfo): Optional[LayerLimiter] =
+    layerPropertyInfo.layerLengthKind() match {
+      case JLayerLengthKind.Implicit =>
+        Optional.empty() // use the default implicit layer length implementation.
+      case JLayerLengthKind.BoundaryMark =>
+        Optional.of(new LineFoldedLayerBoundaryMarkLimiter(this))
+      case JLayerLengthKind.Explicit =>
+        Assert.invariantFailed("layerLengthKind 'explicit' not allowed.")
     }

Review Comment:
If we removed the idea of limiting vs encode/decode layers, and it's all just a layer with the same API, implicit layer length kind would look like

```xml
<sequence dfdlx:layer="lineFolded_IMF">
  ...
</sequence>
```

And the boundary mark layer would look like this:

```xml
<sequence dfdlx:layer="lineFolded_boundaryMark_limit">
  <sequence dfdlx:layer="lineFolded_IMF">
    ...
  </sequence>
</sequence>
```

So schemas become a bit more complicated, but maybe with a simpler API? This layerLimiter stuff goes away.

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.scala:
##########
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api
+
+import java.io.IOException
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+import java.util.Optional
+
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.LayerException
+
+// TODO: Convert this whole file to Java ??
+
+/**
+ * Descriptor of the DFDL variables used by the layer.
+ *
+ * The names and types must match the dfdl:defineVariable definitions used in
+ * an included/imported DFDL component schema which provides the layer definition for use
+ * by other schemas.
+ *
+ * The first such variable is sometimes distinguished as the variable written to
+ * as the single unique value of the layer, such as when the layer computes a
+ * checksum. This is, however, just a convention used by various checksum layers
+ * and classes/traits that support writing checksum layers.
+ *
+ * @param prefix preferred namespace prefix for the namespace of the variables
+ * @param namespace namespace of the variables
+ * @param variables list of pairs, each is the name (an NCName, that is without any
+ * namespace prefix) and type of a variable
+ */
+final case class LayerVariables(
+  prefix: String,
+  namespace: String,
+  variables: java.util.List[(String, DFDLPrimType)],
+)
+
+/**
+ * This is the primary API class for writing layers.
+ *
+ * All layers are derived from this class, and must have no-args default constructors.
+ *
+ * Derived classes will be dynamically loaded by Java's SPI system.
+ * The names of concrete classes derived from Layer are listed in a resources/M.services file
+ * so that they can be found and dynamically loaded.
+ *
+ * The SPI creates an instance the class of which is used as a factory to create the
+ * instances actually used by the Daffodil runtime. Compilation of the static information about
+ * the layer occurs only once and is then shared by all runtime instances.
+ *
+ * Instances of derived layer classes can be stateful. They are private to threads, and each time a layer
+ * is encountered during parse/unparse, an instance is created for that situation.
+ *
+ * All the static information about the layer is provided in the arguments.
+ *
+ * The rest of the Layer class implements the
+ * layer decode/encode logic, which is done as part of deriving one's Layer class from the
+ * Layer base class.
+ *
+ * About variables: Layer logic may read and write variables. Variables being read are parameters to
+ * the layer algorithm. Variables being written are outputs (such as checksums) from the layer algorithm.
+ * Variables being written must be undefined, since variables in DFDL are single-assignment.
+ * Variables being read must be defined before being read by the layer, and this is true for both
+ * parsing and unparsing. When unparsing, variables being read cannot be forward-referencing to parts
+ * of the DFDL infoset that have not yet been unparsed.
+ *
+ * @param layerName the name that will appear in the DFDL schema to identify the layer
+ * @param supportedLayerLengthKinds list of the layer length kinds the layer supports
+ * @param supportedLayerLengthUnits list of the layer length units the layer supports
+ * @param isRequiredLayerEncoding true if the layer is textual and so needs the layerEncoding property
+ * @param optLayerVariables a LayerVariables structure describing the variables the layer accesses
+ */
+abstract class Layer(
+  val layerName: String,
+  val supportedLayerLengthKinds: java.util.List[JLayerLengthKind],
+  val supportedLayerLengthUnits: java.util.List[JLayerLengthUnits],
+  val isRequiredLayerEncoding: Boolean,

Review Comment:
Is layer encoding needed anymore? It looks like CheckDigit and AIS are the only ones that set this to true. I don't see AIS using the layerCharset. CheckDigit layer uses it, but maybe we can get away with it just always being ascii?

Or alternatively, maybe this is just another variable? For example, checkDigit already has a `checkDigitParams` variable, maybe an additional variable could be used to specify encoding?

##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/GZipLayer.java:
##########
@@ -15,77 +15,39 @@
  * limitations under the License.
  */
-package org.apache.daffodil.layers.runtime1
-
-import org.apache.daffodil.io.ExplicitLengthLimitingStream
-import org.apache.daffodil.lib.schema.annotation.props.gen.LayerLengthKind
-import org.apache.daffodil.runtime1.layers._
-import org.apache.daffodil.runtime1.processors.ParseOrUnparseState
-
-final class GZIPLayerCompiler extends LayerCompiler("gzip") {
-
-  override def compileLayer(layerCompileInfo: LayerCompileInfo): GZIPTransformerFactory = {
-
-    layerCompileInfo.SDEUnless(
-      layerCompileInfo.optLayerLengthKind.isEmpty ||
-        (layerCompileInfo.optLayerLengthKind.get eq LayerLengthKind.Explicit),
-      "Only dfdlx:layerLengthKind 'explicit' is supported, but '%s' was specified",
-      layerCompileInfo.optLayerLengthKind.get.toString,
-    )
-
-    val xformer = new GZIPTransformerFactory(name)
-    xformer
-  }
-}
-
-final class GZIPTransformerFactory(name: String) extends LayerTransformerFactory(name) {
-
-  override def newInstance(layerRuntimeInfo: LayerRuntimeInfo) = {
-    val xformer = new GZIPTransformer(name, layerRuntimeInfo)
-    xformer
+package org.apache.daffodil.layers.runtime1;
+
+import org.apache.daffodil.runtime1.layers.api.JLayerLengthKind;
+import org.apache.daffodil.runtime1.layers.api.JLayerLengthUnits;
+import org.apache.daffodil.runtime1.layers.api.Layer;
+import org.apache.daffodil.runtime1.layers.api.LayerRuntime;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Arrays;
+import java.util.Optional;
+
+public class GZipLayer extends Layer {
+
+  public GZipLayer() {
+    super(
+      "gzip",
+      Arrays.asList(JLayerLengthKind.Explicit, JLayerLengthKind.Implicit),
+      Arrays.asList(JLayerLengthUnits.Bytes),
+      false,
+      Optional.empty()
+    );
   }
-}
-
-class GZIPTransformer(name: String, layerRuntimeInfo: LayerRuntimeInfo)
-  extends LayerTransformer(name, layerRuntimeInfo) {

-  override def wrapLayerDecoder(jis: java.io.InputStream) = {
-    val s = new java.util.zip.GZIPInputStream(jis)
-    s
+  @Override
+  public InputStream wrapLayerDecoder(InputStream jis, LayerRuntime lrd) throws IOException {
+    return new java.util.zip.GZIPInputStream(jis);
   }

-  override def wrapLimitingStream(state: ParseOrUnparseState, jis: java.io.InputStream) = {
-    val layerLengthInBytes = layerRuntimeInfo.optLayerLength(state).get
-    val s = new ExplicitLengthLimitingStream(jis, layerLengthInBytes)
-    s
-  }
-
-  override protected def wrapLayerEncoder(jos: java.io.OutputStream): java.io.OutputStream = {
-    val s = GZIPFixedOutputStream(jos)
-    s
-  }
-
-  override protected def wrapLimitingStream(
-    state: ParseOrUnparseState,
-    jis: java.io.OutputStream,
-  ) = {
-    jis // just return jis. The way the length will be used/stored is by way of
-    // taking the content length of the enclosing element. That will measure the
-    // length relative to the "ultimate" data output stream.
-  }
-}
-
-object GZIPFixedOutputStream {
-
-  private val fixIsNeeded = !scala.util.Properties.isJavaAtLeast("16")
-
-  /**
-   * Create a GZIPOutputStream that, if necessary, proxies writes through an
-   * OutputStream that fixes inconsistencies between Java versions
-   */
-  def apply(os: java.io.OutputStream) = {
-    val fixedOS = if (fixIsNeeded) new GZIPFixedOutputStream(os) else os
-    new java.util.zip.GZIPOutputStream(fixedOS)
+  @Override
+  public OutputStream wrapLayerEncoder(OutputStream jos, LayerRuntime lrd) throws IOException {
+    return GZIPFixedOutputStream.apply(jos);

Review Comment:
I feel like the concept of `apply` is relatively rare in Java. Since this might be used as a reference for implementing layers in Java, should we remove it to avoid any confusion and just have the layer do the logic and create the GZIPFixedOutputStream or not depending on Java version?

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.scala:
##########
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api
+
+import java.io.IOException
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+import java.util.Optional
+
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.LayerException
+
+// TODO: Convert this whole file to Java ??
+
+/**
+ * Descriptor of the DFDL variables used by the layer.
+ *
+ * The names and types must match the dfdl:defineVariable definitions used in
+ * an included/imported DFDL component schema which provides the layer definition for use
+ * by other schemas.
+ *
+ * The first such variable is sometimes distinguished as the variable written to
+ * as the single unique value of the layer, such as when the layer computes a
+ * checksum. This is, however, just a convention used by various checksum layers
+ * and classes/traits that support writing checksum layers.
+ *
+ * @param prefix preferred namespace prefix for the namespace of the variables
+ * @param namespace namespace of the variables
+ * @param variables list of pairs, each is the name (an NCName, that is without any
+ * namespace prefix) and type of a variable
+ */
+final case class LayerVariables(
+  prefix: String,
+  namespace: String,
+  variables: java.util.List[(String, DFDLPrimType)],
+)
+
+/**
+ * This is the primary API class for writing layers.
+ *
+ * All layers are derived from this class, and must have no-args default constructors.
+ *
+ * Derived classes will be dynamically loaded by Java's SPI system.
+ * The names of concrete classes derived from Layer are listed in a resources/M.services file
+ * so that they can be found and dynamically loaded.
+ *
+ * The SPI creates an instance the class of which is used as a factory to create the
+ * instances actually used by the Daffodil runtime. Compilation of the static information about
+ * the layer occurs only once and is then shared by all runtime instances.
+ *
+ * Instances of derived layer classes can be stateful. They are private to threads, and each time a layer
+ * is encountered during parse/unparse, an instance is created for that situation.

Review Comment:
Is it worth adding a note that instances of layers should not share any mutable state? For example, I could imagine an instance having some mutable singleton state object that would work fine for a single thread but would break with multiple threads.

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.scala:
##########
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api
+
+import java.io.IOException
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+import java.util.Optional
+
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.LayerException
+
+// TODO: Convert this whole file to Java ??

Review Comment:
Agreed, it would ensure this works well with Java.

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/LayerException.scala:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers
+
+abstract class LayerException(msg: String, cause: Throwable) extends Exception(msg, cause) {
+  def this(msg: String) = this(msg, null)
+  def this(cause: Throwable) = this(null, cause)
+}
+
+class LayerCompilerException(msg: String, cause: Throwable) extends LayerException(msg, cause) {

Review Comment:
I think I lean towards the former--we have exceptions that authors are expected to throw and Daffodil handles in a specific way. Any other exceptions are unexpected and should be considered fatal/an error.

I think this is what our UDF API does. UDFs are expected to only throw a UserDefinedFunctionProcessingError. If they throw any other exceptions, they are converted to a UserDefinedFunctionFatalErrorException, which bubbles all the way up. In our CLI, we catch this and output information so users can fix their UDF.

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.scala:
##########
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api
+
+import java.io.IOException
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+import java.util.Optional
+
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.LayerException
+
+// TODO: Convert this whole file to Java ??
+
+/**
+ * Descriptor of the DFDL variables used by the layer.
+ *
+ * The names and types must match the dfdl:defineVariable definitions used in
+ * an included/imported DFDL component schema which provides the layer definition for use
+ * by other schemas.
+ *
+ * The first such variable is sometimes distinguished as the variable written to
+ * as the single unique value of the layer, such as when the layer computes a
+ * checksum. This is, however, just a convention used by various checksum layers
+ * and classes/traits that support writing checksum layers.
+ *
+ * @param prefix preferred namespace prefix for the namespace of the variables
+ * @param namespace namespace of the variables
+ * @param variables list of pairs, each is the name (an NCName, that is without any
+ * namespace prefix) and type of a variable
+ */
+final case class LayerVariables(
+  prefix: String,
+  namespace: String,
+  variables: java.util.List[(String, DFDLPrimType)],
+)
+
+/**
+ * This is the primary API class for writing layers.
+ *
+ * All layers are derived from this class, and must have no-args default constructors.
+ *
+ * Derived classes will be dynamically loaded by Java's SPI system.
+ * The names of concrete classes derived from Layer are listed in a resources/M.services file
+ * so that they can be found and dynamically loaded.
+ *
+ * The SPI creates an instance the class of which is used as a factory to create the
+ * instances actually used by the Daffodil runtime. Compilation of the static information about
+ * the layer occurs only once and is then shared by all runtime instances.
+ *
+ * Instances of derived layer classes can be stateful. They are private to threads, and each time a layer
+ * is encountered during parse/unparse, an instance is created for that situation.
+ *
+ * All the static information about the layer is provided in the arguments.
+ *
+ * The rest of the Layer class implements the
+ * layer decode/encode logic, which is done as part of deriving one's Layer class from the
+ * Layer base class.
+ *
+ * About variables: Layer logic may read and write variables. Variables being read are parameters to
+ * the layer algorithm. Variables being written are outputs (such as checksums) from the layer algorithm.
+ * Variables being written must be undefined, since variables in DFDL are single-assignment.
+ * Variables being read must be defined before being read by the layer, and this is true for both
+ * parsing and unparsing. When unparsing, variables being read cannot be forward-referencing to parts
+ * of the DFDL infoset that have not yet been unparsed.
+ *
+ * @param layerName the name that will appear in the DFDL schema to identify the layer
+ * @param supportedLayerLengthKinds list of the layer length kinds the layer supports
+ * @param supportedLayerLengthUnits list of the layer length units the layer supports
+ * @param isRequiredLayerEncoding true if the layer is textual and so needs the layerEncoding property
+ * @param optLayerVariables a LayerVariables structure describing the variables the layer accesses
+ */
+abstract class Layer(
+  val layerName: String,
+  val supportedLayerLengthKinds: java.util.List[JLayerLengthKind],
+  val supportedLayerLengthUnits: java.util.List[JLayerLengthUnits],
+  val isRequiredLayerEncoding: Boolean,
+  val optLayerVariables: Optional[LayerVariables],
+) {
+
+  final def name(): String =
+    layerName // name() method with empty args is required by SPI loader
+
+  /**
+   * Called exactly once when the schema is compiled to do extra checking that the layer is being used properly.
+   * The thrown exception becomes a SchemaDefinitionError at schema compile time.
+   *
+   * Example checks are:
+   * - layerEncoding is constant and is a single-byte charset
+   * - layerLength, if constant, is within a maximum value range
+   * - layerBoundaryMark string, if constant, is not too long and contains only allowed characters.
+   * These things can be required to be constant by this check, or it can check their values for legality
+   * if they happen to be constant. Since these are runtime-valued properties (can be expressions), then if the
+   * layer allowed that, they must also be checked at runtime.

Review Comment:
In thinking more about how things might work if limiting and transform layers were all the same thing, we need to consider how layerLength, layerBoundaryMark, etc. settings are provided to layers. I guess these would have to be variables? For example, a boundary mark might look like:

```xml
<sequence>
  <annotation>
    <appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:newVariableInstance ref="bm:boundaryMark" defaultValue="\r\n" />
    </appinfo>
  </annotation>
  <sequence dfdlx:layer="boundaryMark">
    <sequence dfdlx:layer="lineFolded_AIS" >
      ...
    </sequence>
  </sequence>
  ...
</sequence>
```

It's a bit more verbose, but it's much more general. I guess the idea is there are no other `dfdlx:layer*` properties, there is only `dfdlx:layer`. Layers can be either transform or limiting (or both), and all configuration of those layers must be done via variables.

##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/layers/api/Layer.scala:
##########
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.runtime1.layers.api
+
+import java.io.IOException
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.charset.Charset
+import java.util.Optional
+
+import org.apache.daffodil.runtime1.api.DFDLPrimType
+import org.apache.daffodil.runtime1.layers.LayerException
+
+// TODO: Convert this whole file to Java ??
+
+/**
+ * Descriptor of the DFDL variables used by the layer.
+ *
+ * The names and types must match the dfdl:defineVariable definitions used in
+ * an included/imported DFDL component schema which provides the layer definition for use
+ * by other schemas.
+ *
+ * The first such variable is sometimes distinguished as the variable written to
+ * as the single unique value of the layer, such as when the layer computes a
+ * checksum. This is, however, just a convention used by various checksum layers
+ * and classes/traits that support writing checksum layers.
+ *
+ * @param prefix preferred namespace prefix for the namespace of the variables
+ * @param namespace namespace of the variables
+ * @param variables list of pairs, each is the name (an NCName, that is without any
+ * namespace prefix) and type of a variable
+ */
+final case class LayerVariables(
+  prefix: String,
+  namespace: String,
+  variables: java.util.List[(String, DFDLPrimType)],
+)
+
+/**
+ * This is the primary API class for writing layers.
+ *
+ * All layers are derived from this class, and must have no-args default constructors.
+ *
+ * Derived classes will be dynamically loaded by Java's SPI system.
+ * The names of concrete classes derived from Layer are listed in a resources/M.services file
+ * so that they can be found and dynamically loaded.
+ *
+ * The SPI creates an instance the class of which is used as a factory to create the
+ * instances actually used by the Daffodil runtime. Compilation of the static information about
+ * the layer occurs only once and is then shared by all runtime instances.
+ *
+ * Instances of derived layer classes can be stateful. They are private to threads, and each time a layer
+ * is encountered during parse/unparse, an instance is created for that situation.
+ *
+ * All the static information about the layer is provided in the arguments.
+ *
+ * The rest of the Layer class implements the
+ * layer decode/encode logic, which is done as part of deriving one's Layer class from the
+ * Layer base class.
+ *
+ * About variables: Layer logic may read and write variables. Variables being read are parameters to
+ * the layer algorithm. Variables being written are outputs (such as checksums) from the layer algorithm.
+ * Variables being written must be undefined, since variables in DFDL are single-assignment.
+ * Variables being read must be defined before being read by the layer, and this is true for both
+ * parsing and unparsing. When unparsing, variables being read cannot be forward-referencing to parts
+ * of the DFDL infoset that have not yet been unparsed.
+ *
+ * @param layerName the name that will appear in the DFDL schema to identify the layer
+ * @param supportedLayerLengthKinds list of the layer length kinds the layer supports
+ * @param supportedLayerLengthUnits list of the layer length units the layer supports
+ * @param isRequiredLayerEncoding true if the layer is textual and so needs the layerEncoding property
+ * @param optLayerVariables a LayerVariables structure describing the variables the layer accesses
+ */
+abstract class Layer(
+  val layerName: String,
+  val supportedLayerLengthKinds: java.util.List[JLayerLengthKind],
+  val supportedLayerLengthUnits: java.util.List[JLayerLengthUnits],
+  val isRequiredLayerEncoding: Boolean,
+  val optLayerVariables: Optional[LayerVariables],
+) {
+
+  final def name(): String =
+    layerName // name() method with empty args is required by SPI loader
+
+  /**
+   * Called exactly once when the schema is compiled to do extra checking that the layer is being used properly.
+   * The thrown exception becomes a SchemaDefinitionError at schema compile time.
+   *
+   * Example checks are:
+   * - layerEncoding is constant and is a single-byte charset
+   * - layerLength, if constant, is within a maximum value range

Review Comment:
Another thought: can `layerLengthKind="explicit"` and `layerLength` be removed, and just use a parent element with `dfdl:length`? For example, what is the difference between these two:

```xml
<sequence dfdlx:layer="foo" dfdlx:layerLengthKind="explicit" dfdlx:layerLength="100">
  ...
</sequence>
```

```xml
<element name="foo" dfdl:lengthKind="explicit" dfdl:length="100">
  <complexType>
    <sequence dfdlx:layer="foo">
      ...
    </sequence>
  </complexType>
</element>
```

The latter is more verbose, but I think it achieves the same thing? The idea being that layers don't have to support a specific kind of layer length. A layer just always reads until it hits the end of data, which might be EOF, a length provided by a parent element, or a restriction imposed by a surrounding layer that limits the length.
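
To make the "reads until end of data" idea concrete, here is a rough sketch using plain `java.io` streams. None of this is Daffodil API: `BoundedInputStream`, `UpperCaseLayerStream`, and `LayerEndOfDataSketch` are hypothetical names invented for illustration. The bounded stream stands in for whatever limits the data from outside (a parent element's explicit length or a surrounding limiting layer), and the transform stream has no notion of length at all; it only reacts to the underlying `read` returning -1.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, FilterInputStream, InputStream}
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a limit imposed from outside the layer, e.g. by a
// parent element with an explicit length. This is NOT a Daffodil class.
final class BoundedInputStream(underlying: InputStream, limit: Int)
  extends FilterInputStream(underlying) {

  private var remaining = limit

  override def read(): Int =
    if (remaining <= 0) -1 // limit reached: report end of data
    else {
      val b = super.read()
      if (b >= 0) remaining -= 1
      b
    }

  override def read(buf: Array[Byte], off: Int, len: Int): Int =
    if (len == 0) 0
    else if (remaining <= 0) -1
    else {
      val n = super.read(buf, off, math.min(len, remaining))
      if (n > 0) remaining -= n
      n
    }
}

// Hypothetical transform layer: upper-cases ASCII letters. It has no length
// configuration; it simply passes along whatever end of data it is given.
final class UpperCaseLayerStream(underlying: InputStream)
  extends FilterInputStream(underlying) {

  private def up(b: Int): Int = if (b >= 'a' && b <= 'z') b - 32 else b

  override def read(): Int = up(super.read())

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val n = super.read(buf, off, len)
    var i = off
    while (i < off + n) { // no iterations when n is -1
      buf(i) = up(buf(i)).toByte
      i += 1
    }
    n
  }
}

object LayerEndOfDataSketch {
  // Drains a stream until read() reports end of data (-1).
  def decodeAll(in: InputStream): String = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](4)
    var n = in.read(buf)
    while (n != -1) {
      out.write(buf, 0, n)
      n = in.read(buf)
    }
    new String(out.toByteArray, StandardCharsets.US_ASCII)
  }

  def main(args: Array[String]): Unit = {
    val data = new ByteArrayInputStream(
      "hello world, more data".getBytes(StandardCharsets.US_ASCII)
    )
    // The outer limit (11 bytes) plays the role of the parent element's length.
    val layer = new UpperCaseLayerStream(new BoundedInputStream(data, 11))
    println(decodeAll(layer)) // prints "HELLO WORLD"
  }
}
```

The transform stream here is unchanged whether the end of data comes from EOF or from the 11-byte bound, which is the property that would let the explicit layer length settings go away.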
