stevedlawrence commented on a change in pull request #539:
URL: https://github.com/apache/daffodil/pull/539#discussion_r621136379
##########
File path:
daffodil-io/src/main/scala/org/apache/daffodil/io/InputSourceDataInputStream.scala
##########
@@ -507,7 +540,10 @@ final class InputSourceDataInputStream private (val
inputSource: InputSource)
/**
* Determines whether the input stream has this much more data.
*
- * Does not advance the position
+ * Does not advance the position.
+ *
+ * This operation will block until either n bytes are read or end-of-data
Review comment:
Technically it's n **bits** are read, but we do convert to bytes under
the hood since input streams are byte base. We might also want to say
"available" rather than read since it might not need to read if the bytes have
already been cached? Maybe that's all too in the details though?
##########
File path:
daffodil-japi/src/main/java/org/apache/daffodil/japi/package-info.java
##########
@@ -151,11 +151,11 @@
* InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
* JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
* boolean keepParsing = true;
- * while (keepParsing) {
+ * while (keepParsing && is.isDefinedForLength(1)) {
Review comment:
Thoughts on onsidering a differenent public API function that gives a
little less power to the user? This functions let's the user prefetch anymount
they want, e.g. `is.isDefinedForLength(500000)``. Which means we'll block until
500000 bits are cached. Any concerns with letting a user do that?
An alternative is maybe something like ``is.hasData()`` which could
essentially be a public alias for ``is.isDefinedForLength(1)``, but limits what
a user could do. We could always make isDefinedForLength public in the future
if users did need it, but it's harder to take it away wihtout going through a
deprecation process.
##########
File path:
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -863,7 +849,6 @@ public void testJavaAPI17() throws IOException,
InvalidUsageException, ClassNotF
assertTrue(res.isError());
assertFalse(res.isProcessingError());
assertTrue(res.isValidationError());
- assertTrue(res.location().isAtEnd());
Review comment:
Is it worth adding ``res.location().bitPos0b == lengthOfData``
comparison? We sometimes referene this file for examples of Java API usage, so
showing how to test that we consumed all the data might be useful.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataLoc.scala
##########
@@ -33,9 +33,20 @@ import org.apache.daffodil.util.MaybeULong
import org.apache.daffodil.api.DataLocation
import org.apache.daffodil.processors.parsers.PState
-class DataLoc(val bitPos1b: Long, bitLimit1b: MaybeULong, val isAtEnd:
Boolean, eitherStream: Either[DataOutputStream, DataInputStream],
+class DataLoc(
+ val bitPos1b: Long,
+ bitLimit1b: MaybeULong,
+ eitherStream: Either[DataOutputStream, DataInputStream],
val maybeERD: Maybe[ElementRuntimeData]) extends DataLocation {
+ @deprecated("Use bitPos1b to compare with expected position (possibly
bitLimit1b).", "3.1.0")
+ override def isAtEnd =
+ eitherStream match {
+ case Right(isdis: InputSourceDataInputStream) => isdis.isAtEnd
Review comment:
I"m not sure this is worth worrying about, but this potentially has
different behavior if you do something like this:
```scala
val res1 = dp.parse(isdis, ...)
val loc1 = res.location()
val res2 = dp.parse(isdis, ...)
val loc2 = res.location()
println(loc1.isAtEnd)
println(loc2.isAtEnd)
```
So if our InputSourceDataInputStream hit an EOD at the end of the second
parse (i.e. our stream had two messages), then both ``loc1.isAtEnd`` and
``loc2.isAtEnd`` would be true. But only loc2.isAtEnd should be true. So future
parse calls can change the isAtEnd state of previous ParseResults. Maybe this
isn't a big deal since I doubt anyone will check isAtEnd of a previous parse
location, but maybe it's worth considering? We at least probably want a
backwards compatability note somewhere about this if we do keep this.
Could maybe do some logic where DataLoc also captures the bitPos of the
InputSourceDataInputStream when it's created. If the data location has changed,
then isAtEnd returns false, otherwise it returns isids.isAtEnd? Seems
complicated for such a small edge case.
Another alternative would maybe be to have a tunable that complete disables
use of isAtEnd in the DataLoc. Depending on the value of that tunable, DataLoc
either calls isdis.isAtEnd and stores it (and blocking on TCP connections), or
it just always returns false. Maybe call it something like
"requireAccurateIsAtEnd" or something. People using TCP that aren't going to
use isAtEnd can set this to false. And we default it to true for perfect
backwards compatibility. Again, might be extra complication for a small edge
case, so not sure it's worth it.
Just want to through out some ideas an see if it sparks any thoughts.
Agree that we definitely need to deprecate isAtEnd and its use though.
##########
File path:
daffodil-japi/src/main/scala/org/apache/daffodil/japi/io/InputSourceDataInputStream.scala
##########
@@ -43,5 +43,16 @@ class InputSourceDataInputStream private[japi] (private
[japi] val dis: SInputSo
/**
* Create an InputSourceDataInputStream from a byte array
*/
- def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr))
+ def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr))
+
+ /**
+ * Used to see if there is more data or not after a parse operation.
+ * Whether more data is a left-over-data error, or just more data to
+ * parse in a subsequent parse call depends on the intention of the
+ * code calling the parse operation.
+ *
+ * This operation will block until either n bytes are read or end-of-data
Review comment:
See other comment. Should be bits and maybe read vs available?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]