[
https://issues.apache.org/jira/browse/DAFFODIL-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331038#comment-17331038
]
Steve Lawrence commented on DAFFODIL-2502:
------------------------------------------
Yep, that won't allow finishing a parse until we start getting the next message
or hit an EOF. Definitely feels like a bug.
I wonder if there is any value in changing this variable to be three states,
like yes/no/maybe?
No - there are more bytes that have been bucketed but have not been consumed,
so we definitely aren't at the end of data.
Yes - we consumed all data that has been bucketed and hit an EOF, so we are
definitely at the end of data.
Maybe - we consumed all the data that we bucketed, but we never got an EOF. We
might have hit the end, but we can't know for sure until we try to parse again.
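A minimal sketch of that three-state idea, assuming hypothetical names
(EndOfData, classify, and its two inputs) that are not part of Daffodil's API:
{code:scala}
// Hypothetical three-state replacement for a Boolean "at end" flag.
sealed trait EndOfData
object EndOfData {
  case object No extends EndOfData    // bucketed bytes remain unconsumed
  case object Yes extends EndOfData   // everything consumed and EOF was seen
  case object Maybe extends EndOfData // everything consumed, EOF not yet seen

  // unconsumedBytes: bytes already bucketed but not yet consumed by the parse
  // sawEOF: whether the underlying stream has reported end-of-file
  def classify(unconsumedBytes: Long, sawEOF: Boolean): EndOfData =
    if (unconsumedBytes > 0) No
    else if (sawEOF) Yes
    else Maybe
}
{code}
A caller that gets Maybe would still have to attempt another read or parse to
resolve it, which is the open question here.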
My thinking is that it would be possible for the CLI to completely consume all
the data but, because of how buckets are read, not actually hit the EOF. So the
CLI might think it's not at the end of data since it hasn't seen an EOF and
output a "left over data" warning?
Maybe the CLI needs a special way to attempt one more byte and determine if
it's EOF or not? The CLI is file based so this is safe and shouldn't block.
Which makes me think maybe the InputSourceDataInputStream needs a change? Maybe
we add a new method to determine for sure if we hit an EOF or not, which might
require attempting a read of one more byte to figure it out? Maybe isAtEnd
shouldn't even be part of the parse result and should be deprecated? And one
should only ask the ISDIS if there's more data or if an EOF was hit?
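For a plain java.io.InputStream, the "attempt one more byte" check can be
sketched with the JDK's java.io.PushbackInputStream. The helper name
hasMoreData is made up for illustration and says nothing about how the ISDIS
would actually implement this:
{code:scala}
import java.io.{ByteArrayInputStream, PushbackInputStream}

object PeekEOF {
  // Reads one byte to learn whether the stream is at EOF, then pushes it back
  // so the caller still sees the same data. Blocks until a byte or EOF
  // arrives, so it returns promptly for files but may wait on a socket.
  def hasMoreData(in: PushbackInputStream): Boolean = {
    val b = in.read()
    if (b == -1) false
    else { in.unread(b); true }
  }

  def main(args: Array[String]): Unit = {
    val in = new PushbackInputStream(new ByteArrayInputStream(Array[Byte](1, 2)))
    println(hasMoreData(in)) // peeks one byte and pushes it back
    in.read(); in.read()     // consume both bytes
    println(hasMoreData(in)) // nothing left, so the peek sees EOF
  }
}
{code}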
For streaming messages, maybe the logic becomes something like:
{code:scala}
// bad name, but blocks until at least one byte is available or EOF
while (isdis.isAtLeastOneByteAvailable()) {
  val res = dp.parse(isdis, infosetOutputter)
  ...
}
{code}
This way the blocking happens in the main thread and is controlled by the user,
rather than blocking somewhere deep inside of Daffodil?
And the CLI logic becomes something like:
{code:scala}
val res = dp.parse(isdis, infosetOutputter)
// because this is CLI and file-based, this immediately returns true/false
// depending on whether EOF was hit
if (isdis.isAtLeastOneByteAvailable()) {
  error("left over data")
}
{code}
> Parse must behave properly for reading data from TCP sockets
> ------------------------------------------------------------
>
> Key: DAFFODIL-2502
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2502
> Project: Daffodil
> Issue Type: Bug
> Components: API, Back End
> Affects Versions: 3.0.0
> Reporter: Mike Beckerle
> Assignee: Mike Beckerle
> Priority: Major
>
> Daffodil assumes the input streams are like files - reads are always blocking
> for either 1 or more bytes of data, or End-of-data.
> People want to use Daffodil to read data from TCP/IP sockets. These can
> return 0 bytes from a read because there is no data available, but that does
> NOT mean the end of data. It's just a temporary condition. More data may come
> along.
> Daffodil's InputSourceDataInputStream is wrapped around a regular Java input
> stream, and enables us to support incoming messages which do not conform to
> byte-boundaries.
> The problem is that there's no way for users to wrap an
> InputSourceDataInputStream around a TCP/IP socket, and have it behave
> properly when a read() call temporarily says 0 bytes available.
> Obviously we don't want to sit in a tight loop just retrying the read until
> we get either some bytes or end-of-data.
> The right API here is that if the read() of the underlying Java stream
> returns 0 bytes, a hook function supplied by the API user is called.
> One obvious thing a user can do is put a call to Thread.yield() in the hook.
> (That might even want to be the default behavior if they supply no hook.)
> Then if they have a separate thread parsing the data with daffodil, that
> thread will at least yield the CPU, i.e., behave politely in a multi-threaded
> world.
> More advanced usage could start a Daffodil parse using co-routines, returning
> control to the caller when the parse must pause due to read() of the Java
> input stream returning 0 bytes.
>
>
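The hook idea in the issue description could look roughly like the sketch
below. The names ZeroReadHook, readWithHook, and onZeroBytes are hypothetical
and not Daffodil API. It also assumes a stream whose read can return 0 for a
non-empty buffer, as the issue describes; a standard java.io.InputStream
blocks instead when asked for at least one byte.
{code:scala}
import java.io.InputStream

object ZeroReadHook {
  // Hypothetical helper: reads into buf, calling onZeroBytes each time the
  // underlying read returns 0, until it returns data (> 0) or end-of-data (-1).
  def readWithHook(
      in: InputStream,
      buf: Array[Byte],
      onZeroBytes: () => Unit = () => Thread.`yield`() // default: yield the CPU
  ): Int = {
    // A zero-length buffer would make every read return 0 and loop forever.
    require(buf.length > 0, "buffer must not be empty")
    var n = in.read(buf, 0, buf.length)
    while (n == 0) {
      onZeroBytes()
      n = in.read(buf, 0, buf.length)
    }
    n
  }
}
{code}
With the default hook this is still a polling loop, just a polite one; a hook
that sleeps or waits on a selector would be the caller's choice.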