[
https://issues.apache.org/jira/browse/DAFFODIL-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331002#comment-17331002
]
Steve Lawrence commented on DAFFODIL-2502:
------------------------------------------
{quote}Ex: regex pattern match - I believe we fill a buffer of some adapted
size, then try to match, but the match may turn out smaller than the buffer, so
having requested the larger buffer we potentially blocked getting data to fill
that buffer, where the parse could have succeeded with less.
{quote}
Yeah, I think you're right. I think we block until we fill up the regex buffer
or hit EOF for each match attempt. We do have a tunable to change the max regex
match limit, so that could maybe help. That's similar to the format 2 issue
though: if we have to scan data until we find something, we might need to wait
for a following message to know that the scan didn't match.
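As an aside, one way to avoid demanding a full buffer up front (a sketch using plain java.util.regex, not Daffodil's actual match code) is Matcher.hitEnd(), which reports whether a failed attempt ran off the end of the available input, i.e. whether more data could still change the outcome:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncrementalMatch {
    // Returns true if matching `buffered` against `p` failed only because the
    // attempt hit the end of the buffered data, i.e. more input might help.
    // (Sketch only; the method name is invented for illustration.)
    static boolean needsMoreData(Pattern p, String buffered) {
        Matcher m = p.matcher(buffered);
        return !m.lookingAt() && m.hitEnd();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("ab+c");
        System.out.println(needsMoreData(p, "abb")); // true: a 'c' could still arrive
        System.out.println(needsMoreData(p, "abc")); // false: already matches
        System.out.println(needsMoreData(p, "xyz")); // false: can never match here
    }
}
```

With this, a scanner could retry with more data only when hitEnd() is true, instead of always filling the maximum-size buffer before the first attempt.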
{quote}If we have a complex type element with specified length.
{quote}
I think we used to have this problem where we would force reading an entire
complex length, and large specified lengths caused some memory issues. But I
think 3.0.0 (or maybe earlier?) fixed this. We now only require that bytes are
read in the I/O layer for simple types. For complex types, we only set a limit
so the children can't consume past that specified length, and we only read
bytes in the I/O layer if we need to skip bits associated with the complex
type.
{quote}I will investigate source code I can find online to see if this sort of
logic is there, and maybe rig up an experiment to test it.
{quote}
Yeah. I tried to find the code, but it looks like things start dropping into
native code for actual socket reads; I couldn't immediately find the native
code and gave up. An experiment to show the behavior would be nice for sure.
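One way to rig up such an experiment (a sketch; this shows blocking-stream behavior on localhost sockets, and the zero-byte-read case would instead come from a non-blocking channel wrapped as a stream):

```java
// Experiment sketch (an assumption about socket behavior, not Daffodil code):
// show that a blocking socket InputStream's read() waits for data to arrive
// rather than returning 0, even while available() reports 0 bytes.
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketReadExperiment {
    // Connects a socket pair on localhost, writes one byte from the far side
    // after a delay, and returns the byte read. read() blocks until it arrives.
    static int readAfterDelayedWrite() throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            InputStream in = client.getInputStream();
            // No data has been written yet: available() is 0, but read() will
            // block rather than return 0.
            System.out.println("available before write: " + in.available());
            Thread writer = new Thread(() -> {
                try {
                    Thread.sleep(200);
                    accepted.getOutputStream().write('x');
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            writer.start();
            int b = in.read(); // blocks ~200 ms until the writer sends a byte
            writer.join();
            return b;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read returned: " + readAfterDelayedWrite());
    }
}
```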
> Parse must behave properly for reading data from TCP sockets
> ------------------------------------------------------------
>
> Key: DAFFODIL-2502
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2502
> Project: Daffodil
> Issue Type: Bug
> Components: API, Back End
> Affects Versions: 3.0.0
> Reporter: Mike Beckerle
> Assignee: Mike Beckerle
> Priority: Major
>
> Daffodil assumes the input streams are like files - reads always block until
> either 1 or more bytes of data are available, or end-of-data.
> People want to use Daffodil to read data from TCP/IP sockets. These can
> return 0 bytes from a read because there is no data available, but that does
> NOT mean the end of data. It's just a temporary condition. More data may come
> along.
> Daffodil's InputSourceDataInputStream is wrapped around a regular Java input
> stream, and enables us to support incoming messages which do not conform to
> byte-boundaries.
> The problem is that there's no way for users to wrap an
> InputSourceDataInputStream around a TCP/IP socket, and have it behave
> properly when a read() call temporarily says 0 bytes available.
> Obviously we don't want to sit in a tight loop just retrying the read until
> we get either some bytes or end-of-data.
> The right API here is that if the read() of the underlying java stream
> returns 0 bytes, then a hook function supplied by the API user is called.
> One obvious thing a user can do is put a call to Thread.yield() in the hook.
> (That might even want to be the default behavior if they supply no hook.)
> Then if they have a separate thread parsing the data with daffodil, that
> thread will at least yield the CPU, i.e., behave politely in a multi-threaded
> world.
> More advanced usage could start a Daffodil parse using co-routines, returning
> control to the caller when the parse must pause due to read() of the Java
> input stream returning 0 bytes.
>
>
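The hook described in the issue above could be sketched roughly like this (an illustrative sketch only, under the assumption of a simple Runnable hook; the class and member names are invented, not Daffodil API):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the proposed API: a wrapper stream that invokes a
// user-supplied hook whenever the underlying read() returns 0 bytes, defaulting
// to Thread.yield() when no hook is given. All names here are invented.
public class ZeroReadHookInputStream extends FilterInputStream {
    private final Runnable zeroReadHook;

    public ZeroReadHookInputStream(InputStream in, Runnable hook) {
        super(in);
        this.zeroReadHook = (hook != null) ? hook : Thread::yield;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0; // a zero-length request legitimately reads 0 bytes
        }
        int n;
        // A 0-byte read means "no data yet", not end-of-data: run the hook,
        // then retry. -1 still signals true end-of-data.
        while ((n = in.read(b, off, len)) == 0) {
            zeroReadHook.run();
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1); // route single-byte reads through the hook too
        return (n == -1) ? -1 : (one[0] & 0xFF);
    }
}
```

A coroutine-style variant, as the issue suggests, could instead suspend the parse in the hook and return control to the caller until more data is available.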
--
This message was sent by Atlassian Jira
(v8.3.4#803005)