[GitHub] [daffodil] stevedlawrence commented on a change in pull request #539: Fix TCP/network input stream hangs on isAtEnd

GitBox Tue, 27 Apr 2021 05:22:41 -0700


stevedlawrence commented on a change in pull request #539:
URL: https://github.com/apache/daffodil/pull/539#discussion_r621136379




##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/io/InputSourceDataInputStream.scala
##########
@@ -507,7 +540,10 @@ final class InputSourceDataInputStream private (val 
inputSource: InputSource)
   /**
    * Determines whether the input stream has this much more data.
    *
-   * Does not advance the position
+   * Does not advance the position.
+   *
+   * This operation will block until either n bytes are read or end-of-data

Review comment:
       Technically it's n **bits** are read, but we do convert to bytes under 
the hood since input streams are byte base. We might also want to say 
"available" rather than read since it might not need to read if the bytes have 
already been cached? Maybe that's all too in the details though?

##########
File path: 
daffodil-japi/src/main/java/org/apache/daffodil/japi/package-info.java
##########
@@ -151,11 +151,11 @@
  * InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
  * JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
  * boolean keepParsing = true;
- * while (keepParsing) {
+ * while (keepParsing && is.isDefinedForLength(1)) {

Review comment:
       Thoughts on onsidering a differenent public API function that gives a 
little less power to the user? This functions let's the user prefetch anymount 
they want, e.g. `is.isDefinedForLength(500000)``. Which means we'll block until 
500000 bits are cached. Any concerns with letting a user do that?
   
   An alternative is maybe something like ``is.hasData()`` which could 
essentially be a public alias for ``is.isDefinedForLength(1)``, but limits what 
a user could do. We could always make isDefinedForLength public in the future 
if users did need it, but it's harder to take it away wihtout going through a 
deprecation process.

##########
File path: 
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -863,7 +849,6 @@ public void testJavaAPI17() throws IOException, 
InvalidUsageException, ClassNotF
         assertTrue(res.isError());
         assertFalse(res.isProcessingError());
         assertTrue(res.isValidationError());
-        assertTrue(res.location().isAtEnd());

Review comment:
       Is it worth adding ``res.location().bitPos0b == lengthOfData`` 
comparison? We sometimes referene this file for examples of Java API usage, so 
showing how to test that we consumed all the data might be useful.

##########
File path: 
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataLoc.scala
##########
@@ -33,9 +33,20 @@ import org.apache.daffodil.util.MaybeULong
 import org.apache.daffodil.api.DataLocation
 import org.apache.daffodil.processors.parsers.PState
 
-class DataLoc(val bitPos1b: Long, bitLimit1b: MaybeULong, val isAtEnd: 
Boolean, eitherStream: Either[DataOutputStream, DataInputStream],
+class DataLoc(
+  val bitPos1b: Long,
+  bitLimit1b: MaybeULong,
+  eitherStream: Either[DataOutputStream, DataInputStream],
   val maybeERD: Maybe[ElementRuntimeData]) extends DataLocation {
 
+  @deprecated("Use bitPos1b to compare with expected position (possibly 
bitLimit1b).", "3.1.0")
+  override def isAtEnd =
+    eitherStream match {
+      case Right(isdis: InputSourceDataInputStream) => isdis.isAtEnd

Review comment:
       I"m not sure this is worth worrying about, but this potentially has 
different behavior if you do something like this:
   ```scala
   val res1 = dp.parse(isdis, ...)
   val loc1 = res.location()
   val res2 = dp.parse(isdis, ...)
   val loc2 = res.location()
   println(loc1.isAtEnd)
   println(loc2.isAtEnd)
   ```
   So if our InputSourceDataInputStream hit an EOD at the end of the second 
parse (i.e. our stream had two messages), then both ``loc1.isAtEnd`` and 
``loc2.isAtEnd`` would be true. But only loc2.isAtEnd should be true. So future 
parse calls can change the isAtEnd state of previous ParseResults. Maybe this 
isn't a big deal since I doubt anyone will check isAtEnd of a previous parse 
location, but maybe it's worth considering? We at least probably want a 
backwards compatability note somewhere about this if we do keep this.
   
   Could maybe do some logic where DataLoc also captures the bitPos of the 
InputSourceDataInputStream when it's created. If the data location has changed, 
then isAtEnd returns false, otherwise it returns isids.isAtEnd? Seems 
complicated for such a small edge case.
   
   Another alternative would maybe be to have a tunable that complete disables 
use of isAtEnd in the DataLoc. Depending on the value of that tunable, DataLoc 
either calls isdis.isAtEnd and stores it (and blocking on TCP connections), or 
it just always returns false. Maybe call it something like 
"requireAccurateIsAtEnd" or something. People using TCP that aren't going to 
use isAtEnd can set this to false. And we default it to true for perfect 
backwards compatibility. Again, might be extra complication for a small edge 
case, so not sure it's worth it.
   
   Just want to through out some ideas an see if it sparks any thoughts.
   
   Agree that we definitely need to deprecate isAtEnd and its use though.

##########
File path: 
daffodil-japi/src/main/scala/org/apache/daffodil/japi/io/InputSourceDataInputStream.scala
##########
@@ -43,5 +43,16 @@ class InputSourceDataInputStream private[japi] (private 
[japi] val dis: SInputSo
   /**
    * Create an InputSourceDataInputStream from a byte array
    */
-  def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr)) 
+  def this(arr: Array[Byte]) = this(SInputSourceDataInputStream(arr))
+
+  /**
+   * Used to see if there is more data or not after a parse operation.
+   * Whether more data is a left-over-data error, or just more data to
+   * parse in a subsequent parse call depends on the intention of the
+   * code calling the parse operation.
+   *
+   * This operation will block until either n bytes are read or end-of-data

Review comment:
       See other comment. Should be bits and maybe read vs available?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #539: Fix TCP/network input stream hangs on isAtEnd

Reply via email to