stevedlawrence commented on a change in pull request #539:
URL: https://github.com/apache/daffodil/pull/539#discussion_r620276245
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSource.scala
##########
@@ -255,18 +292,27 @@ class BucketingInputSource(
*/
private def fillBucketsToIndex(goalBucketIndex: Long, bytesNeededInBucket: Long): Boolean = {
var lastBucketIndex = buckets.length - 1
- var hasMoreData = true
+
var needsMoreData = goalBucketIndex > lastBucketIndex || (goalBucketIndex == lastBucketIndex && bytesNeededInBucket > bytesFilledInLastBucket)
while (needsMoreData && hasMoreData) {
// Try to fill the rest of this bucket, regardless of how many bytes are
// actually needed
val emptyBytesInLastBucket = bucketSize - bytesFilledInLastBucket
+ Assert.invariant(emptyBytesInLastBucket >= 0)
Review comment:
Are there any errors if we change this assert to > instead of >=? I
don't think we should ever be calling read with a len of zero bytes.
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSource.scala
##########
@@ -255,18 +292,27 @@ class BucketingInputSource(
*/
private def fillBucketsToIndex(goalBucketIndex: Long, bytesNeededInBucket: Long): Boolean = {
var lastBucketIndex = buckets.length - 1
- var hasMoreData = true
+
var needsMoreData = goalBucketIndex > lastBucketIndex || (goalBucketIndex == lastBucketIndex && bytesNeededInBucket > bytesFilledInLastBucket)
while (needsMoreData && hasMoreData) {
// Try to fill the rest of this bucket, regardless of how many bytes are
// actually needed
val emptyBytesInLastBucket = bucketSize - bytesFilledInLastBucket
+ Assert.invariant(emptyBytesInLastBucket >= 0)
// Try to read enough bytes to fill the rest of this bucket. Note that
// the .read() function could hit EOF (returns -1) or could return
- // anywhere from zero to emptyBytesInLastBucket bytes
- val bytesRead = inputStream.read(buckets(lastBucketIndex).bytes, bytesFilledInLastBucket, emptyBytesInLastBucket)
+ // anywhere from 0 to emptyBytesInLastBucket bytes
+ val bytesRead =
+ inputStream.read(
+ buckets(lastBucketIndex).bytes,
+ bytesFilledInLastBucket,
+ emptyBytesInLastBucket)
+
+ // check for bad inputStream behavior. It's not our fault!
+ if (bytesRead == 0 && emptyBytesInLastBucket > 0)
Review comment:
If we can change the above assert to > 0, this can just become
``bytesRead == 0``
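A minimal sketch (illustrative only, not Daffodil's actual code; `fillOnce` is a made-up name) of why the guard reduces to ``bytesRead == 0`` once the assert guarantees a nonzero-length request:

```scala
// Sketch: if we assert the request length is strictly positive before
// calling read, then a return of 0 can only mean the stream misbehaved,
// so the extra `emptyBytesInLastBucket > 0` test becomes redundant.
def fillOnce(in: java.io.InputStream, bucket: Array[Byte], filled: Int): Int = {
  val empty = bucket.length - filled
  assert(empty > 0, "never call read with a zero-length request")
  // read returns -1 at EOF, or 1 to `empty` bytes for a well-behaved stream
  val bytesRead = in.read(bucket, filled, empty)
  if (bytesRead == 0) // no need to also test `empty > 0` here
    throw new java.io.IOException("input stream misbehaved: read returned 0")
  bytesRead
}
```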
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSourceDataInputStream.scala
##########
@@ -117,6 +118,57 @@ final class InputSourceDataInputStream private (val inputSource: InputSource)
@inline
override final def bitLimit0b: MaybeULong = cst.bitLimit0b
+ /**
+ * Tells us if the underlying input source has detected end-of-data
+ * (the read(...) call returned -1).
+ *
+ * But this does NOT tell us we are positioned at the end, only whether
+ * in the course of reading, we encountered the end of data. If we
+ * backtracked we could have seen the end of data, but backed up in
+ * the data to an earlier position.
+ */
+ def hasReachedEndOfData: Boolean = inputSource.hasReachedEndOfData
+
+ /**
+ * Determine if we're positioned at the end of data without
+ * doing any additional blocking operation such as reading more
+ * data to test if there is any.
+ *
+ * This depends on the underlying inputSource keeping track of
+ * whether it has previously hit the end of data or not.
+ *
+ * @return
+ */
+ final def isAtEnd(): Boolean = {
+ if (bitLimit0b.isDefined) {
+ bitPos0b == bitLimit0b.get
+ } else {
+ // There is no bit limit, so whether we're at the end
+ // requires that we've hit the end-of-data on a read
+ // operation, AND that we've not backtracked since then,
+ // meaning we're positioned at the end of the last byte
+ // read.
+ hasReachedEndOfData && {
+ //
+ // No real input source can end in the middle of a byte.
+ // The only way that can happen is with the bitLimit having been
+ // set, and we've already dealt with that.
+ //
+ // So we just have to know that we're positioned at the
+ // end of the data.
+ //
+ val inputStreamBytePos0b = inputSource.position()
+ val isAtFinalByte = bytePos0b == inputStreamBytePos0b
Review comment:
I thought bytePos0b and inputSource.position() were basically always
kept in sync? When we change bytePos0b, don't we also change the InputSource
byte position, even when we backtrack? Seems like this is always going to be
true, even if we've backtracked. Does the InputSource need an isAtEndOfData,
since only it really knows where the last byte in the InputSource is?
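A hedged sketch of what such an isAtEndOfData might look like (hypothetical class and names, not the Daffodil InputSource): only the source itself knows both whether EOD was ever observed and whether the current position sits at the final byte, so the check stays correct across backtracking.

```scala
// Hypothetical toy InputSource: tracks position plus whether a read has
// ever hit EOD. isAtEndOfData is true only when EOD has actually been
// observed AND the current position is past the last byte.
class SketchInputSource(data: Array[Byte]) {
  private var pos: Long = 0
  private var sawEod: Boolean = false

  def position(): Long = pos
  def setPosition(p: Long): Unit = { pos = p } // models backtracking

  def read(): Int =
    if (pos >= data.length) { sawEod = true; -1 }
    else { val b = data(pos.toInt) & 0xff; pos += 1; b }

  def hasReachedEndOfData: Boolean = sawEod

  // Correct even after backtracking: position must equal the data length.
  def isAtEndOfData: Boolean = sawEod && pos == data.length
}
```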
##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/LayerTransformer.scala
##########
@@ -216,7 +216,7 @@ class JavaIOInputStream(s: InputSourceDataInputStream, finfo: FormatInfo)
}
}
- override def available(): Int = 1
+ override def available(): Int = 0
Review comment:
Does nothing rely on this value? Surprised this didn't cause anything to
fail.
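For context on why a constant 1 is risky (a toy illustration, not Daffodil code): callers commonly treat `available()` as a "can I read without blocking?" hint, and the java.io default for exhausted streams is 0.

```scala
import java.io.InputStream

// Illustrative caller pattern: drain while the stream claims bytes are
// available without blocking. If available() unconditionally returned 1,
// this loop would only be saved from spinning past EOF by the -1 guard;
// returning 0 (the java.io convention) terminates it cleanly.
def drainNonBlocking(in: InputStream): Int = {
  var count = 0
  while (in.available() > 0) {
    if (in.read() == -1) return count // guard against a lying available()
    count += 1
  }
  count
}
```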
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSource.scala
##########
@@ -255,18 +292,27 @@ class BucketingInputSource(
*/
private def fillBucketsToIndex(goalBucketIndex: Long, bytesNeededInBucket: Long): Boolean = {
var lastBucketIndex = buckets.length - 1
- var hasMoreData = true
+
var needsMoreData = goalBucketIndex > lastBucketIndex || (goalBucketIndex == lastBucketIndex && bytesNeededInBucket > bytesFilledInLastBucket)
while (needsMoreData && hasMoreData) {
// Try to fill the rest of this bucket, regardless of how many bytes are
// actually needed
val emptyBytesInLastBucket = bucketSize - bytesFilledInLastBucket
+ Assert.invariant(emptyBytesInLastBucket >= 0)
// Try to read enough bytes to fill the rest of this bucket. Note that
// the .read() function could hit EOF (returns -1) or could return
- // anywhere from zero to emptyBytesInLastBucket bytes
- val bytesRead = inputStream.read(buckets(lastBucketIndex).bytes, bytesFilledInLastBucket, emptyBytesInLastBucket)
+ // anywhere from 0 to emptyBytesInLastBucket bytes
Review comment:
Should we update this comment to say 1 to emptyBytesInLastBucket bytes?
``read()`` should never actually return 0.
##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -490,7 +490,11 @@ class DataLocation private[japi] (dl: SDataLocation) {
override def toString() = dl.toString
/**
- * Determine if this data location is at the end of the input data
+ * Determine if this data location is known to be at the end of the data stream.
+ *
+ * Note that this can be false even when all bits have been consumed. This
+ * occurs if no subsequent read of the input data has yet caused the input
+ * stream to detect end-of-data/EOF.
Review comment:
This is an API change. I can imagine cases where people relied on this
behavior for checking left-over data, and I feel like it's very easy for binary
formats to never have this set to true. Previously, unless you were using a
streaming thing, we always knew for sure if we hit EOD.
Maybe we need a tunable to switch between the "read one byte to figure out
EOD" vs "must have hit EOD before, and we aren't actually sure if we are at EOD
yet"? We can default to the old behavior, but people streaming or doing TCP
stuff can switch to the new behavior, with the understanding that isAtEnd might
not be accurate?
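A rough sketch of the proposed tunable (all names are made up for illustration; the real isDefinedForLength and EOD bookkeeping live in the actual stream classes):

```scala
// Hypothetical policy switch between the two end-of-data semantics.
sealed trait EodPolicy
case object ProbeOneByte extends EodPolicy    // old: may read ahead (can block)
case object RequirePriorEod extends EodPolicy // new: never reads, may under-report

def isAtEnd(
    policy: EodPolicy,
    isDefinedForLength: Long => Boolean, // probes the stream; may block
    hasReachedEndOfData: Boolean,        // some prior read returned -1
    atFinalBytePosition: Boolean): Boolean =
  policy match {
    case ProbeOneByte    => !isDefinedForLength(1)
    case RequirePriorEod => hasReachedEndOfData && atFinalBytePosition
  }
```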
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSourceDataInputStream.scala
##########
@@ -117,6 +118,57 @@ final class InputSourceDataInputStream private (val inputSource: InputSource)
@inline
override final def bitLimit0b: MaybeULong = cst.bitLimit0b
+ /**
+ * Tells us if the underlying input source has detected end-of-data
+ * (the read(...) call returned -1).
+ *
+ * But this does NOT tell us we are positioned at the end, only whether
+ * in the course of reading, we encountered the end of data. If we
+ * backtracked we could have seen the end of data, but backed up in
+ * the data to an earlier position.
+ */
+ def hasReachedEndOfData: Boolean = inputSource.hasReachedEndOfData
+
+ /**
+ * Determine if we're positioned at the end of data without
+ * doing any additional blocking operation such as reading more
+ * data to test if there is any.
+ *
+ * This depends on the underlying inputSource keeping track of
+ * whether it has previously hit the end of data or not.
+ *
+ * @return
+ */
+ final def isAtEnd(): Boolean = {
+ if (bitLimit0b.isDefined) {
+ bitPos0b == bitLimit0b.get
Review comment:
I'm not sure it's safe to use bitLimit here? I believe we change
``bitLimit0b`` when complex types have specified lengths to ensure children do
not parse beyond those specified lengths. If we have reached the end of the
complex type specified length, we shouldn't be considered isAtEnd, but I think
this will do that?
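The concern can be shown with made-up numbers (purely illustrative; the real bitLimit0b management is in the parser state):

```scala
// Comparing position against the *current* limit, which may have been
// narrowed for a specified-length complex type.
def limitBasedIsAtEnd(bitPos0b: Long, bitLimit0b: Long): Boolean =
  bitPos0b == bitLimit0b

// Illustrative numbers: 128 bits of total data, limit narrowed to 64 bits
// for a specified-length complex type, and we just finished parsing it.
val totalDataBits = 128L
val narrowedLimit = 64L
val pos = 64L
// limitBasedIsAtEnd(pos, narrowedLimit) is true, yet 64 bits of data remain;
// only comparing against totalDataBits reflects the true end of data.
```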
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSourceDataInputStream.scala
##########
@@ -117,6 +118,57 @@ final class InputSourceDataInputStream private (val inputSource: InputSource)
@inline
override final def bitLimit0b: MaybeULong = cst.bitLimit0b
+ /**
+ * Tells us if the underlying input source has detected end-of-data
+ * (the read(...) call returned -1).
+ *
+ * But this does NOT tell us we are positioned at the end, only whether
+ * in the course of reading, we encountered the end of data. If we
+ * backtracked we could have seen the end of data, but backed up in
+ * the data to an earlier position.
+ */
+ def hasReachedEndOfData: Boolean = inputSource.hasReachedEndOfData
+
+ /**
+ * Determine if we're positioned at the end of data without
+ * doing any additional blocking operation such as reading more
+ * data to test if there is any.
+ *
+ * This depends on the underlying inputSource keeping track of
+ * whether it has previously hit the end of data or not.
+ *
+ * @return
Review comment:
Missing scaladoc for ``@return``
##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/io/InputSourceDataInputStream.scala
##########
@@ -43,5 +43,13 @@ class InputSourceDataInputStream private[japi] (private[japi] val dis: SInputSo
/**
* Create an InputSourceDataInputStream from a byte array
*/
- def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr))
+ def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr))
+
+ /**
+ * Used to see if there is more data or not after a parse operation.
+ * Whether more data is a left-over-data error, or just more data to
+ * parse in a subsequent parse call depends on the intention of the
+ * code calling the parse operation.
+ */
Review comment:
I think it's worth adding a comment saying that this will block until
either at least n bytes are read or EOD is hit.
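A sketch of the blocking behavior being described (hypothetical standalone function, not the actual API; the real method would presumably buffer what it reads rather than consume it):

```scala
import java.io.InputStream

// The check cannot answer "is there more data?" without reading, so it
// blocks until either n bytes have been seen or EOD (-1) is hit.
def hasAtLeastNBytes(in: InputStream, n: Long): Boolean = {
  var seen = 0L
  while (seen < n) {
    val b = in.read() // blocks until a byte is available or EOD
    if (b == -1) return false
    seen += 1
  }
  true
}
```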
##########
File path: daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/DaffodilTDMLDFDLProcessor.scala
##########
@@ -298,6 +298,13 @@ class DaffodilTDMLDFDLProcessor private (private var dp: DataProcessor) extends
def doParseWithBothApis(dpInputStream: java.io.InputStream, saxInputStream: java.io.InputStream, lengthLimitInBits: Long): TDMLParseResult = {
+ //
+ // TDML Tests MUST have a length limit. Otherwise they cannot determine if
+ // there is left-over-data or not without doing more reading from the input stream
+ // so as to be sure to hit end-of-data.
+ //
+ Assert.usage(lengthLimitInBits >= 0)
Review comment:
Can't TDML tests use the isDefinedForLength(1) thing? Then we don't
necessarily have to rely on lengthLimit?
##########
File path: daffodil-io/src/test/scala/org/apache/daffodil/layers/TestBase64.scala
##########
@@ -66,7 +66,7 @@
W1vdXMgcXVvdGVzIG9yIHNvbmcgbHlyaWNzIG9yIGFueXRoaW5nIGxpa2UgdGhhdCBpbnRyb2R1Y
pairs.foreach {
case (exp, act) => {
if (exp != act) {
- println("differ at character %s (0-based). Expected '%s' got '%s'.".format(i, exp, act))
+ // println("differ at character %s (0-based). Expected '%s' got '%s'.".format(i, exp, act))
Review comment:
Should we just delete these printlns?
##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -288,7 +285,8 @@ final class PState private (
}
def currentLocation: DataLocation = {
- val isAtEnd = !dataInputStream.isDefinedForLength(1)
+ // val isAtEnd = !dataInputStream.isDefinedForLength(1)
+ val isAtEnd = dataInputStream.isAtEnd()
Review comment:
Agreed, but need to consider backwards compatibility.
##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section12/lengthKind/ExplicitTests.tdml
##########
@@ -598,7 +598,7 @@
<tdml:documentPart type="text"><![CDATA[000118Ridgewood Circle
Rochester NY123]]></tdml:documentPart>
</tdml:document>
<tdml:errors>
- <tdml:error>6467715096</tdml:error>
+ <tdml:error>6467715464</tdml:error>
Review comment:
Any idea why this number changed?
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSourceDataInputStream.scala
##########
@@ -117,6 +118,57 @@ final class InputSourceDataInputStream private (val inputSource: InputSource)
@inline
override final def bitLimit0b: MaybeULong = cst.bitLimit0b
+ /**
+ * Tells us if the underlying input source has detected end-of-data
+ * (the read(...) call returned -1).
+ *
+ * But this does NOT tell us we are positioned at the end, only whether
+ * in the course of reading, we encountered the end of data. If we
+ * backtracked we could have seen the end of data, but backed up in
+ * the data to an earlier position.
+ */
+ def hasReachedEndOfData: Boolean = inputSource.hasReachedEndOfData
+
+ /**
+ * Determine if we're positioned at the end of data without
+ * doing any additional blocking operation such as reading more
+ * data to test if there is any.
+ *
+ * This depends on the underlying inputSource keeping track of
+ * whether it has previously hit the end of data or not.
+ *
+ * @return
+ */
+ final def isAtEnd(): Boolean = {
+ if (bitLimit0b.isDefined) {
+ bitPos0b == bitLimit0b.get
+ } else {
+ // There is no bit limit, so whether we're at the end
+ // requires that we've hit the end-of-data on a read
+ // operation, AND that we've not backtracked since then,
+ // meaning we're positioned at the end of the last byte
+ // read.
+ hasReachedEndOfData && {
Review comment:
I think this is now a non-backwards-compatible change. isAtEnd can
only be true if we've gotten an EOD. And I imagine in a lot of binary formats
we'll never try that extra read to get the EOD because we don't ever need to
scan.
##########
File path: daffodil-sapi/src/main/scala/org/apache/daffodil/sapi/io/InputSourceDataInputStream.scala
##########
@@ -43,5 +43,14 @@ class InputSourceDataInputStream private[sapi] (private[sapi] val dis: SInputSo
/**
* Create an InputSourceDataInputStream from a byte array
*/
- def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr))
+ def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr))
+
+
+ /**
+ * Used to see if there is more data or not after a parse operation.
+ * Whether more data is a left-over-data error, or just more data to
+ * parse in a subsequent parse call depends on the intention of the
+ * code calling the parse operation.
+ */
Review comment:
Same comment here as in the JAPI, might want to mention the blocking
aspect of this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]