stevedlawrence commented on code in PR #1191:
URL: https://github.com/apache/daffodil/pull/1191#discussion_r1547931916
##########
daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1/FixedLengthLayer.scala:
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.layers.runtime1
+
+import java.io.ByteArrayInputStream
+import java.io.ByteArrayOutputStream
+import java.io.InputStream
+import java.io.OutputStream
+import java.nio.ByteBuffer
+
+import org.apache.daffodil.lib.exceptions.Assert
+import org.apache.daffodil.runtime1.layers.LayerNotEnoughDataException
+import org.apache.daffodil.runtime1.layers.api.Layer
+
+import org.apache.commons.io.IOUtils
+
+/**
+ * Suitable only for small sections of data, not large data streams or whole files.
+ * See the maxFixedLength value defined herein for the maximum.
+ *
+ * The entire fixed length region of the data will be pulled into a byte buffer in memory.
+ *
+ * TODO: Someday, enhance to make this streaming.
+ *
+ * One DFDL Variable is a parameter
+ *   - fixedLength - an unsignedInt giving the fixed length of this layer.
+ *     This length is enforced on both parsing and unparsing the layer.
+ * There are no output/result DFDL variables from this layer.
+ */
+final class FixedLengthLayer
+  extends Layer("fixedLength", "urn:org.apache.daffodil.layers.fixedLength") {
+
+  private var fixedLength: Int = -1
+
+  private def maxFixedLength = Short.MaxValue
+
+  /**
+   * Captures the fixedLength DFDL variable value and saves it.
+   *
+   * Also validates whether it is in acceptable range. Because the fixedLength variable
+   * can be populated from an expression that consumes input data, it's possible for the
+   * fixedLength to be out of range as a result of speculative parsing down
+   * an incorrect path. Hence, it is a processing error, not a schema definition error
+   * if the fixedLength variable value is out of range.
+   *
+   * @param fixedLength matches the name of the DFDL variable which will supply this value.
+   */
+  private[layers] def setLayerVariableParameters(fixedLength: Long): Unit = {
+    this.fixedLength = fixedLength.toInt
+    Assert.invariant(fixedLength >= 0) // variable is unsignedInt, so this can't be negative
+    if (fixedLength > maxFixedLength)
+      processingError(
+        s"fixedLength value of $fixedLength is above the maximum of $maxFixedLength.",
+      )
+  }
+
+  override def wrapLayerInput(jis: InputStream): InputStream = {
+    new FixedLengthInputStream(jis)

Review Comment:
Note, I just came across Apache Commons IO `BoundedInputStream`, which does basically the same thing as this but streams the data. So it fixes the "make this streaming" TODO, at least for the input part. I think it would mean the `nRead < fixedLength` check would need to be done at close, though, maybe something like this:
```scala
new BoundedInputStream(jis, fixedLength) {
  override def close(): Unit = {
    if (getCount() < fixedLength) {
      processingError(new LayerNotEnoughDataException(fixedLength, getCount()))
    }
    super.close()
  }
}
```
Note that this changes the PE to occur at the end of the layer instead of at the beginning. Could that affect backtracking? Which raises a question to me: if a layer throws a PE, how does backtracking work?
Does it backtrack to the very beginning of the layer? Or is the PE just handled by whatever is currently parsing the layer data? That would mean a parser could potentially ignore a layer PE by backtracking a little and continuing to read. It feels like if a layer creates a PE, we want to backtrack all the way back to before that layer started, since a layer PE implies the data wasn't valid layer data. Is that how it works?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
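[Editor's note] The close-time length check proposed in the review can be illustrated with a self-contained sketch that needs no Commons IO or Daffodil dependencies. `LengthCheckingInputStream` is a hypothetical stand-in for `BoundedInputStream`, and a plain `IOException` stands in for Daffodil's `processingError`; none of these names come from the PR itself:

```scala
import java.io.{ByteArrayInputStream, FilterInputStream, IOException, InputStream}

// Hypothetical stand-in for Commons IO BoundedInputStream with the reviewer's
// close-time check: bytes are streamed (never buffered whole), reads stop at
// the bound, and a short layer is only detected once the consumer closes.
class LengthCheckingInputStream(in: InputStream, fixedLength: Long)
    extends FilterInputStream(in) {
  private var count: Long = 0L

  override def read(): Int = {
    if (count >= fixedLength) -1 // enforce the upper bound
    else {
      val b = super.read()
      if (b >= 0) count += 1
      b
    }
  }

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val remaining = fixedLength - count
    if (remaining <= 0) -1
    else {
      val n = super.read(buf, off, math.min(len.toLong, remaining).toInt)
      if (n > 0) count += n
      n
    }
  }

  override def close(): Unit = {
    // The length check moves to close(): only here do we know the total consumed,
    // which is what shifts the PE from the start of the layer to its end.
    if (count < fixedLength)
      throw new IOException(s"layer needed $fixedLength bytes but got only $count")
    super.close()
  }
}

// Usage: a 4-byte bound over 6 bytes of data reads exactly 4 bytes.
val s = new LengthCheckingInputStream(new ByteArrayInputStream("abcdef".getBytes("UTF-8")), 4)
val got = new String(s.readAllBytes(), "UTF-8") // "abcd"
s.close() // fine: all 4 bytes were consumed
```

Because nothing is buffered, this keeps the streaming benefit the reviewer notes, at the cost of reporting a too-short layer only when the stream is closed.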
