[ 
https://issues.apache.org/jira/browse/DAFFODIL-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331124#comment-17331124
 ] 

Mike Beckerle edited comment on DAFFODIL-2502 at 4/24/21, 1:40 AM:
-------------------------------------------------------------------

I think the bucketing input stream has to keep a flag indicating whether it has 
gotten an EOD on the read(buf, off, len) call, along with the position at which 
that EOD was detected.

Then isAtEnd should be roughly:
{code:java}
def isAtEnd = haveSeenEOD && positionIsAtEODSavedPosition
{code}
It is possible we read right up to and including the very last byte of data, 
but do not read further so we don't actually get back the EOD -1. 

isAtEnd would be false in that case because we don't know we're at the end.
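
A minimal sketch of this bookkeeping, written in Java against a plain java.io.InputStream. The class name EodTrackingSource, the bucket size, and the get()/setPosition() methods are hypothetical and only illustrate the idea; this is not Daffodil's actual bucketing input source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch; not Daffodil's actual bucketing input source.
class EodTrackingSource {
    private static final int BUCKET_SIZE = 8;  // assumed bucket size

    private final InputStream in;
    private byte[] cache = new byte[0];   // every byte read so far, kept for backtracking
    private int position = 0;             // current parse position within the cache
    private boolean haveSeenEOD = false;  // true once read(buf, off, len) has returned -1
    private int eodPosition = -1;         // position at which that -1 was observed

    EodTrackingSource(InputStream in) { this.in = in; }

    // Returns the next byte (0..255) at the current position, or -1 at end of data.
    int get() throws IOException {
        while (position >= cache.length) {
            if (haveSeenEOD) return -1;
            byte[] buf = new byte[BUCKET_SIZE];
            int n = in.read(buf, 0, buf.length);
            if (n == -1) {
                // Remember that EOD was seen, and where.
                haveSeenEOD = true;
                eodPosition = cache.length;
                return -1;
            }
            int oldLen = cache.length;
            cache = Arrays.copyOf(cache, oldLen + n);
            System.arraycopy(buf, 0, cache, oldLen, n);
        }
        return cache[position++] & 0xFF;
    }

    // Moves the parse position, e.g. when the parser backtracks.
    void setPosition(int newPosition) { position = newPosition; }

    boolean isAtEnd() { return haveSeenEOD && position == eodPosition; }
}
```

With a 3-byte input, three calls to get() consume all the data but isAtEnd() stays false, because the -1 has not been observed yet. A fourth get() returns -1 and isAtEnd() becomes true; moving the position back makes it false again.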

But we could, in principle, do one more read, outside of the parse() method. 
That read should NOT provide data, but should get back the EOD.

That is what we actually need in the TDML runner and CLI in order to be 
satisfied that there is no left-over data.
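
The extra read could look roughly like the following Java sketch. LeftOverCheck and noLeftOverData are hypothetical names, not part of the Daffodil API, and the sketch assumes the caller can wrap the stream in a java.io.PushbackInputStream so that a byte found by the probe is not lost.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper; not part of the Daffodil API.
class LeftOverCheck {
    // Attempts one more read after parsing. Returns true if the stream reports
    // end of data, false if a byte was available (the byte is pushed back).
    static boolean noLeftOverData(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return true;
        in.unread(b);
        return false;
    }
}
```

On a blocking stream such as a socket, this probe waits until either a byte or the EOD arrives, so it only suits callers that expect the sender to close the stream.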



> Parse must behave properly for reading data from TCP sockets
> ------------------------------------------------------------
>
>                 Key: DAFFODIL-2502
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2502
>             Project: Daffodil
>          Issue Type: Bug
>          Components: API, Back End
>    Affects Versions: 3.0.0
>            Reporter: Mike Beckerle
>            Assignee: Mike Beckerle
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Daffodil assumes the input streams are like files: a read always blocks until 
> it returns either 1 or more bytes of data, or end-of-data.
> People want to use Daffodil to read data from TCP/IP sockets. These can 
> return 0 bytes from a read because there is no data available, but that does 
> NOT mean the end of data. It's just a temporary condition. More data may come 
> along.
> Daffodil's InputSourceDataInputStream is wrapped around a regular Java input 
> stream, and enables us to support incoming messages which do not conform to 
> byte-boundaries.
> The problem is that there's no way for users to wrap an 
> InputSourceDataInputStream around a TCP/IP socket, and have it behave 
> properly when a read() call temporarily says 0 bytes available.
> Obviously we don't want to sit in a tight loop just retrying the read until 
> we get either some bytes or end-of-data.
> The right API here is that if the read() of the underlying Java stream 
> returns 0 bytes, a hook function supplied by the API user is called.
> One obvious thing a user can do is put a call to Thread.yield() in the hook. 
> (That might even want to be the default behavior if they supply no hook.) 
> Then if they have a separate thread parsing the data with daffodil, that 
> thread will at least yield the CPU, i.e., behave politely in a multi-threaded 
> world.
> More advanced usage could start a Daffodil parse using co-routines, returning 
> control to the caller when the parse must pause due to read() of the Java 
> input stream returning 0 bytes.
>  
>  
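
The hook described in the issue could be sketched in Java as a wrapper around the underlying stream. ZeroReadHookStream and its constructor parameter are hypothetical names, not an existing Daffodil API. The sketch assumes the hook is a plain Runnable, defaults it to Thread.yield() as the issue suggests, and retries the read after each hook call.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper; not an existing Daffodil class.
class ZeroReadHookStream extends FilterInputStream {
    private final Runnable onZeroBytes;  // user-supplied hook

    ZeroReadHookStream(InputStream in, Runnable onZeroBytes) {
        super(in);
        // Assumed default: yield the CPU when no hook is supplied.
        this.onZeroBytes = (onZeroBytes != null) ? onZeroBytes : Thread::yield;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;  // a zero-length request is not a "no data yet" condition
        while (true) {
            int n = in.read(buf, off, len);
            if (n != 0) return n;  // either some bytes, or -1 for end-of-data
            onZeroBytes.run();     // 0 bytes: let the caller decide how to wait
        }
    }
}
```

A caller that wants to back off rather than yield could pass a hook that sleeps briefly. The co-routine style mentioned in the issue would need a different design, since this loop does not return control to the caller.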



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
