[ https://issues.apache.org/jira/browse/DAFFODIL-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330433#comment-17330433 ]
Mike Beckerle edited comment on DAFFODIL-2502 at 4/26/21, 1:57 AM:
-------------------------------------------------------------------
-Look at the javadoc for the java.io.DataInputStream API. This is derived from
InputStream, yet the description of the read(buf, off, len) method allows it to
return 0 bytes.-
-This violates the Liskov substitution principle. The class derivation tells
you a DataInputStream isA InputStream, but it isn't because it widens the
return results of a method.-
-Of course sometimes one class is derived from another just to enable code
sharing, and that's the case in this situation. Client code of these class APIs
must be written to specifically deal with one or the other behavior from reads.-
-This is the exact argument for a "delegates to" keyword in OO languages, so
that you can share code and method type signatures, but NOT inherit from a
type/trait.-
-I think the right behavior for Daffodil when read(buf, off, len) returns 0 is
to call a hook; the default hook method just does Thread.yield(). I wouldn't
bother putting timeouts in there. Just a yield should allow producing threads
to run, or at least not burn CPU while waiting for data if there are other
activities for the CPU(s) to do.-
-I noodled some code together using coroutines which returns to the caller an
Either[Unit, ParseResult]. If an underlying read(buf, off, len) call returns 0
bytes, it first tries a Thread.yield(). If there are still 0 bytes available,
the coroutine resumes back to the caller with Left(Unit) to indicate that it
can be called again later to do more. If it completes the parse, it does a
resumeFinal back to the caller with Right(parseResult). I have to build a
little test rig to exercise this.-
-Interestingly, this can all be built as a generic library in daffodil-lib.
Nothing in it is really about Daffodil or parsing per se. It's a generic
capability for coping with consuming from non-blocking streams.-
was (Author: mbeckerle):
Look at the javadoc for the java.io.DataInputStream API. This is derived from
InputStream, yet the description of the read(buf, off, len) method allows it to
return 0 bytes.
This violates the Liskov substitution principle. The class derivation tells you
a DataInputStream isA InputStream, but it isn't because it widens the return
results of a method.
Of course sometimes one class is derived from another just to enable code
sharing, and that's the case in this situation. Client code of these class APIs
must be written to specifically deal with one or the other behavior from reads.
This is the exact argument for a "delegates to" keyword in OO languages, so
that you can share code and method type signatures, but NOT inherit from a
type/trait.
I think the right behavior for Daffodil when read(buf, off, len) returns 0 is
to call a hook; the default hook method just does Thread.yield(). I wouldn't
bother putting timeouts in there. Just a yield should allow producing threads
to run, or at least not burn CPU while waiting for data if there are other
activities for the CPU(s) to do.
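To make the hook idea concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the names ZeroReadHook, IdleHook, YIELD, and readSome are hypothetical and are not part of Daffodil's API. It shows a read wrapper that calls a caller-supplied hook whenever the underlying read(buf, off, len) returns 0, with Thread.yield() as the default hook and no timeouts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: ZeroReadHook and its members are hypothetical names, not Daffodil API. */
final class ZeroReadHook {

    /** Called each time the underlying read returns 0 bytes. */
    @FunctionalInterface
    interface IdleHook {
        void onZeroBytes();
    }

    /** Default hook: just yield so that producing threads get a chance to run. */
    static final IdleHook YIELD = Thread::yield;

    /**
     * Returns the number of bytes read (at least 1), or -1 at end of data.
     * A 0-byte read is treated as "no data yet": the hook is called and the
     * read is retried. No timeout is applied.
     */
    static int readSome(InputStream in, byte[] buf, int off, int len, IdleHook hook)
            throws IOException {
        if (len == 0) {
            return 0; // nothing was requested, so 0 is the expected answer
        }
        while (true) {
            int n = in.read(buf, off, len);
            if (n != 0) {
                return n; // either some data or end of data (-1)
            }
            hook.onZeroBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        int n = readSome(in, buf, 0, buf.length, YIELD);
        System.out.println("read " + n + " bytes");
    }
}
```

A caller that wants something other than a plain yield (logging, a sleep, switching to other work) would pass its own IdleHook instead of YIELD.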
I noodled some code together using coroutines which returns to the caller an
Either[Unit, ParseResult]. If an underlying read(buf, off, len) call returns 0
bytes, it first tries a Thread.yield(). If there are still 0 bytes available,
the coroutine resumes back to the caller with Left(Unit) to indicate that it
can be called again later to do more. If it completes the parse, it does a
resumeFinal back to the caller with Right(parseResult). I have to build a
little test rig to exercise this.
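The control flow described above can be approximated in plain Java. This is a rough sketch and not the coroutine code itself: it swaps coroutines for an explicit resumable object, uses Optional in place of Either[Unit, ParseResult] (empty standing in for Left(Unit), a present value for Right(parseResult)), and stands in for a real parse with "collect a fixed-length message". The class name ResumableRead and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

/** Illustrative only: a resumable fixed-length read; all names here are hypothetical. */
final class ResumableRead {
    private final InputStream in;
    private final byte[] message;
    private int filled = 0; // progress is kept between resume() calls

    ResumableRead(InputStream in, int messageLength) {
        this.in = in;
        this.message = new byte[messageLength];
    }

    /**
     * Optional.empty(): no data available right now; call resume() again later.
     * Optional.of(bytes): the message is complete.
     */
    Optional<byte[]> resume() throws IOException {
        while (filled < message.length) {
            int n = in.read(message, filled, message.length - filled);
            if (n == 0) {
                Thread.yield(); // give producers one chance to run
                n = in.read(message, filled, message.length - filled);
                if (n == 0) {
                    return Optional.empty(); // still nothing: hand control back
                }
            }
            if (n < 0) {
                throw new EOFException("end of data before the message was complete");
            }
            filled += n;
        }
        return Optional.of(message);
    }

    public static void main(String[] args) throws IOException {
        ResumableRead r = new ResumableRead(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), 4);
        Optional<byte[]> result = r.resume();
        while (!result.isPresent()) {
            result = r.resume(); // a real caller would do other work here
        }
        System.out.println("message length: " + result.get().length);
    }
}
```

The caller decides what to do between resume() calls, which is the point of returning control instead of looping inside the reader.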
Interestingly, this can all be built as a generic library in daffodil-lib.
Nothing in it is really about Daffodil or parsing per se. It's a generic
capability for coping with consuming from non-blocking streams.
> Parse must behave properly for reading data from TCP sockets
> ------------------------------------------------------------
>
> Key: DAFFODIL-2502
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2502
> Project: Daffodil
> Issue Type: Bug
> Components: API, Back End
> Affects Versions: 3.0.0
> Reporter: Mike Beckerle
> Assignee: Mike Beckerle
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Daffodil assumes the input streams are like files - reads are always blocking
> for either 1 or more bytes of data, or End-of-data.
> People want to use Daffodil to read data from TCP/IP sockets. These can
> return 0 bytes from a read because there is no data available, but that does
> NOT mean the end of data. It's just a temporary condition. More data may come
> along.
> Daffodil's InputSourceDataInputStream is wrapped around a regular Java input
> stream, and enables us to support incoming messages which do not conform to
> byte-boundaries.
> -The problem is that there's no way for users to wrap an
> InputSourceDataInputStream around a TCP/IP socket, and have it behave
> properly when a read() call temporarily says 0 bytes available.-
> -Obviously we don't want to sit in a tight loop just retrying the read until
> we get either some bytes or end-of-data.-
> -The right API here is that if the read() of the underlying Java stream
> returns 0 bytes, a hook function supplied by the API user is called.-
> -One obvious thing a user can do is put a call to Thread.yield() in the hook.
> (That might even want to be the default behavior if they supply no hook.)
> Then if they have a separate thread parsing the data with daffodil, that
> thread will at least yield the CPU, i.e., behave politely in a multi-threaded
> world.-
> -More advanced usage could start a Daffodil parse using co-routines,
> returning control to the caller when the parse must pause due to read() of
> the Java input stream returning 0 bytes.-
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)