mbeckerle commented on code in PR #1191:
URL: https://github.com/apache/daffodil/pull/1191#discussion_r1582441056
##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/FixedLengthLayer.scala:
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.layers.runtime1
+
+import java.io.ByteArrayInputStream
+import java.io.ByteArrayOutputStream
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.ByteBuffer
+
+import org.apache.daffodil.lib.exceptions.Assert
+import org.apache.daffodil.runtime1.layers.LayerNotEnoughDataException
+import org.apache.daffodil.runtime1.layers.api.Layer
+
+import org.apache.commons.io.IOUtils
+
+/**
+ * Suitable only for small sections of data, not large data streams or whole files.
+ * See the maxFixedLength value defined herein for the maximum.
+ *
+ * The entire fixed length region of the data will be pulled into a byte buffer in memory.
+ *
+ * TODO: Someday, enhance to make this streaming.
+ *
+ * One DFDL Variable is a parameter
+ *   - fixedLength - an unsignedInt giving the fixed length of this layer.
+ *     This length is enforced on both parsing and unparsing the layer.
+ * There are no output/result DFDL variables from this layer.
+ */
+final class FixedLengthLayer
+  extends Layer("fixedLength", "urn:org.apache.daffodil.layers.fixedLength") {
+
+  private var fixedLength: Int = -1
+
+  private def maxFixedLength = Short.MaxValue
+
+  /**
+   * Captures the fixedLength DFDL variable value and saves it.
+   *
+   * Also validates whether it is in acceptable range. Because the fixedLength variable
+   * can be populated from an expression that consumes input data, it's possible for the
+   * fixedLength to be out of range as a result of speculative parsing down
+   * an incorrect path. Hence, it is a processing error, not a schema definition error
+   * if the fixedLength variable value is out of range.
+   *
+   * @param fixedLength matches the name of the DFDL variable which will supply this value.
+   */
+  private[layers] def setLayerVariableParameters(fixedLength: Long): Unit = {
+    this.fixedLength = fixedLength.toInt
+    Assert.invariant(fixedLength >= 0) // variable is unsignedInt, so this can't be negative
+    if (fixedLength > maxFixedLength)
+      processingError(
+        s"fixedLength value of $fixedLength is above the maximum of $maxFixedLength.",
+      )
+  }
+
+  override def wrapLayerInput(jis: InputStream): InputStream = {
+    new FixedLengthInputStream(jis)

Review Comment:
I decided to leave this as is, with a TODO that says it could be made streaming, but not to change it until/unless there is demand for the feature. Basically, it would require more new testing and debugging than I have time for.

As for the backtracking behavior, keep in mind that the start of the layer is quite possibly NOT a point of uncertainty. There is no separate notion of going back to the start of the layer. If there happens to be a point of uncertainty within the layer right at its start, then backtracking can reset the position of the PState to 0, and that will back up in the layer to position 0 of those bytes. But this is exactly the same thing as for regular non-layer data streams.
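
To make the "reset to position 0" point concrete, here is a rough sketch using only `java.io`. It is not Daffodil's actual mechanism (which goes through the PState and its own input-source marks); `LayerBacktrackSketch` and `readThenBacktrack` are hypothetical names, and `mark`/`reset` on a `ByteArrayInputStream` merely stands in for a point of uncertainty at the first byte of the buffered layer data.

```scala
import java.io.ByteArrayInputStream

// Hypothetical illustration only; not part of the Daffodil API.
object LayerBacktrackSketch {

  /**
   * Marks position 0 of the buffered layer bytes, reads `n` bytes as a
   * speculative parse would, then resets to the mark and reads one byte again.
   * Returns (first byte read, byte read after the reset).
   */
  def readThenBacktrack(layerBytes: Array[Byte], n: Int): (Int, Int) = {
    val in = new ByteArrayInputStream(layerBytes)
    in.mark(layerBytes.length) // stand-in for a point of uncertainty at layer position 0
    val first = in.read()
    var i = 1
    while (i < n) { in.read(); i += 1 } // the speculative branch consumes more data
    in.reset() // backtrack: back to position 0 of the buffered bytes
    (first, in.read())
  }
}
```

Because the whole fixed-length region is already in memory, going back to position 0 is just a position change, the same as it would be for any other finite buffered input.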
Backtracking inside a layer is no different from backtracking inside a regular data stream that is associated with finite input such as a file. The layer being too short and causing an error is the same as a file being too short. The layer throwing some random exception because the data appears to be corrupted is the same as, for example, trying to parse a text number and hitting non-digit characters. The data can always look like noise because the parser is speculating down the wrong branch, and that's as true for layer decoders as it is for number parsers.

If there is no point of uncertainty until somewhere *before* the layer start, then backtracking can reset to before the layer even existed. Keep in mind that the layer being used at all is part of what is being reset by backtracking, so backtracking to before the layer removes the layer, discarding the objects (which are eventually reclaimed by garbage collection).
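
As a rough illustration of "the layer being too short is the same as a file being too short", the sketch below reads exactly `n` bytes from any `InputStream` and reports a failure value when the data runs out. It is an assumption-laden stand-in, not the PR's `FixedLengthInputStream`: `FixedLengthSketch` and `readExactly` are hypothetical names, and the `Left` result only plays the role of the processing error that would let the parser backtrack.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical illustration only; not the FixedLengthInputStream from the PR.
object FixedLengthSketch {

  /**
   * Reads exactly `n` bytes into memory. Returns Left with a message when the
   * underlying stream ends before `n` bytes are available.
   */
  def readExactly(in: InputStream, n: Int): Either[String, Array[Byte]] = {
    val buf = new Array[Byte](n)
    var filled = 0
    var eof = false
    while (filled < n && !eof) {
      val got = in.read(buf, filled, n - filled)
      if (got < 0) eof = true else filled += got
    }
    if (filled == n) Right(buf)
    else Left(s"needed $n bytes but only $filled were available")
  }
}
```

For example, `FixedLengthSketch.readExactly(new ByteArrayInputStream(Array[Byte](1, 2, 3)), 4)` yields a `Left`, which a caller could treat like any other failed speculative branch.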
