[GitHub] [daffodil] mbeckerle commented on a change in pull request #539: Fix TCP/network input stream hangs on isAtEnd

GitBox Mon, 26 Apr 2021 07:56:29 -0700


mbeckerle commented on a change in pull request #539:
URL: https://github.com/apache/daffodil/pull/539#discussion_r620376115




##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/io/InputSourceDataInputStream.scala
##########
@@ -117,6 +118,57 @@ final class InputSourceDataInputStream private (val 
inputSource: InputSource)
   @inline
   override final def bitLimit0b: MaybeULong = cst.bitLimit0b
 
+  /**
+   * Tells us if the underlying input source has detected end-of-data
+   * (the read(...) call returned -1.
+   *
+   * But this does NOT tell us we are positioned at the end, only whether
+   * in the course of reading, we encountered the end of data. If we
+   * backtracked we could have seen the end of data, but backed up in
+   * the data to an earlier position.
+   */
+  def hasReachedEndOfData: Boolean = inputSource.hasReachedEndOfData
+
+  /**
+   * Determine if we're positioned at the end of data without
+   * doing any additional blocking operation such as reading more
+   * data to test if there is any.
+   *
+   * This depends on the underlying inputSource keeping track of
+   * whether it has previously hit the end of data or not.
+   *
+   * @return
+   */
+  final def isAtEnd(): Boolean = {
+    if (bitLimit0b.isDefined) {
+      bitPos0b == bitLimit0b.get

Review comment:
       Ah, yes it appears we have a stack discipline issue here. We narrow the 
available data by assigning a bit limit whenever we have a specified length. 
   
   I just searched for uses of isAtEnd.
   
   I'm quite convinced we should remove isAtEnd from the API. We only use it in 
tests. We know it is incorrect for it to be reading more data. You point out 
here that because it records the -1/End-of-Data in the state of the ISDIS, we 
can't call it on an inner, narrowed, specified-length scope or we'll be getting 
isAtEnd.
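
To make the distinction concrete: having seen end-of-data is not the same as being positioned at the end. Below is a minimal sketch using a hypothetical in-memory reader (StickyEndSketch and all of its methods are invented for illustration; this is not Daffodil's InputSource):

```java
// Hypothetical stand-in for an input source; not the Daffodil implementation.
class StickyEndSketch {
    private final byte[] data;
    private int pos = 0;
    private boolean reachedEnd = false; // sticky: set once read() has returned -1

    StickyEndSketch(byte[] data) { this.data = data; }

    // Returns the next byte, or -1 at end of data (and remembers that we saw it).
    int read() {
        if (pos >= data.length) {
            reachedEnd = true;
            return -1;
        }
        return data[pos++] & 0xFF;
    }

    // Backtracking moves the position but deliberately leaves the flag set.
    void backtrackTo(int earlierPos) { pos = earlierPos; }

    boolean hasReachedEndOfData() { return reachedEnd; }

    boolean isPositionedAtEnd() { return pos == data.length; }
}
```

After reading past the last byte and then backtracking, hasReachedEndOfData() stays true while isPositionedAtEnd() is false, so the flag alone cannot answer "are we at the end?".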
   
   Algorithmically, the right thing is for applications to call 
ISDIS.areBytesAvailable *before* a parse, and not expect upon return from the 
parse for it to be known whether we are at the end of the data stream or not. 
   This code chunk we use to document streaming is wrong:
   ```
   InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
   JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
   boolean keepParsing = true;
   while (keepParsing) {
     jdomOutputter.reset();
     ParseResult pr = dp.parse(is, jdomOutputter);
     ...
     keepParsing = !pr.location().isAtEnd() && !pr.isError();
   }
   ```
   
   It should be:
   
   ```
   InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
   JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
   boolean keepParsing = true;
   // Test at the top of the loop that there is at least 1 bit to parse.
   while (keepParsing && is.isDefinedForLength(1)) {
     jdomOutputter.reset();
     ParseResult pr = dp.parse(is, jdomOutputter);
     ...
     keepParsing = !pr.isError();
   }
   ```
   where we test for data availability at the top, before the call to parse, 
and there is no checking of isAtEnd at all. 
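
The same loop shape can be tried without Daffodil at all. The following is only a sketch under stated assumptions: hasData and parseOne are hypothetical stand-ins for the availability test and for dp.parse, and the "message" format (one length byte followed by that many payload bytes) is invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

class StreamingLoopSketch {
    // Hypothetical availability test: peek one byte and push it back.
    static boolean hasData(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return false;
        in.unread(b);
        return true;
    }

    // Hypothetical stand-in for dp.parse(...): consumes one message.
    // Returns false (an "error") if the message is truncated.
    static boolean parseOne(PushbackInputStream in) throws IOException {
        int len = in.read();
        if (len == -1) return false;
        for (int i = 0; i < len; i++) {
            if (in.read() == -1) return false;
        }
        return true;
    }

    // Returns the number of messages parsed successfully.
    static int parseAll(byte[] bytes) {
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(bytes))) {
            int count = 0;
            boolean keepParsing = true;
            // Availability is tested before each parse; nothing asks
            // "are we at the end?" after a parse returns.
            while (keepParsing && hasData(in)) {
                keepParsing = parseOne(in);
                if (keepParsing) count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a network stream the peek in hasData may itself block, but in this sketch that wait happens before a parse begins, never after one has finished.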
   




