Mike Beckerle created DAFFODIL-2504:
---------------------------------------
Summary: Parse text of non-specified length from TCP hangs unnecessarily
Key: DAFFODIL-2504
URL: https://issues.apache.org/jira/browse/DAFFODIL-2504
Project: Daffodil
Issue Type: Bug
Components: Back End
Affects Versions: 3.0.0
Reporter: Mike Beckerle
See tests:
testDaffodilParseFromNetworkDelimited1
testDaffodilParseFromNetworkDelimited1b
testDaffodilParseFromNetworkDelimited2
testDaffodilParseFromNetworkDelimited2b

When parsing text from a TCP network stream, the parse should succeed as soon
as the parser knows it has matched the longest possible delimiter. It should
not require any characters beyond that match to be present on the data stream
in order for the parse to complete.
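
A minimal sketch of the desired behavior, for illustration only. This is not
Daffodil's implementation; the class DelimiterScanSketch, its Result type, and
the scan method are hypothetical names. It reads one character at a time and
stops as soon as no longer delimiter could still match, so it never asks the
stream for more characters than the longest delimiter needs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not Daffodil code.
final class DelimiterScanSketch {

    /** Field text, matched delimiter (null if none), and characters pulled from the stream. */
    static final class Result {
        final String field;
        final String delimiter;
        final int charsRead;

        Result(String field, String delimiter, int charsRead) {
            this.field = field;
            this.delimiter = delimiter;
            this.charsRead = charsRead;
        }
    }

    /** Scans for the longest of several fixed-text, non-empty delimiters, reading on demand. */
    static Result scan(Reader in, List<String> delimiters) throws IOException {
        StringBuilder buf = new StringBuilder();
        int start = 0;          // position where a delimiter match is being attempted
        boolean eof = false;
        while (true) {
            String rest = buf.substring(start);
            String longest = null;
            boolean couldExtend = false;
            for (String d : delimiters) {
                if (rest.startsWith(d)) {
                    // Complete match of d at 'start'; keep the longest one.
                    if (longest == null || d.length() > longest.length()) {
                        longest = d;
                    }
                } else if (!eof && d.startsWith(rest)) {
                    // Buffered text is only a prefix of d; more input could complete it.
                    couldExtend = true;
                }
            }
            if (couldExtend) {
                int c = in.read();                  // the only place that touches the stream
                if (c == -1) {
                    eof = true;
                } else {
                    buf.append((char) c);
                }
            } else if (longest != null) {
                return new Result(buf.substring(0, start), longest, buf.length());
            } else if (start < buf.length()) {
                start++;                            // no delimiter here; try the next position
            } else {
                return new Result(buf.toString(), null, buf.length());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = scan(new StringReader("abc;;defgh"), Arrays.asList(";", ";;"));
        System.out.println(r.field + " | " + r.delimiter + " | " + r.charsRead);
    }
}
```

With the delimiters ";" and ";;", the scan has to read one character past the
first ";" to decide between them, but once ";;" has matched nothing longer is
possible and no further read is issued.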

There are no tests yet, but presumably lengthKind 'pattern' has a similar
issue: only enough characters should be needed to determine the longest
possible match for the regex. For example, suppose dfdl:lengthPattern=".",
which looks for exactly one character (1 byte in a single-byte encoding).
Matching it should NOT require more than that one character to be available on
the TCP stream.
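
One way to reason about "only enough characters" for a regex is the hitEnd()
method of java.util.regex.Matcher, which reports whether the last match
attempt ran into the end of the available input. The sketch below is an
illustration under that assumption, not how Daffodil performs pattern
matching; PatternLengthSketch and matchPrefix are hypothetical names. It
requests another character only when hitEnd() says more input could change
the result.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not Daffodil code.
final class PatternLengthSketch {

    /**
     * Matches the pattern at the start of the stream, reading one character at a time.
     * Returns { matchLength (or -1 if no match), charsRead }.
     */
    static int[] matchPrefix(Pattern pattern, Reader in) throws IOException {
        StringBuilder buf = new StringBuilder();
        while (true) {
            Matcher m = pattern.matcher(buf);
            boolean matched = m.lookingAt();        // anchored at the start, not at the end
            int length = matched ? m.end() : -1;
            if (!m.hitEnd()) {
                // The matcher never ran out of input, so more data cannot change the result.
                return new int[] { length, buf.length() };
            }
            int c = in.read();                      // more data might extend or enable a match
            if (c == -1) {
                return new int[] { length, buf.length() };
            }
            buf.append((char) c);
        }
    }

    public static void main(String[] args) throws IOException {
        int[] dot = matchPrefix(Pattern.compile("."), new StringReader("abcdef"));
        System.out.println("'.'  length=" + dot[0] + " charsRead=" + dot[1]);
        int[] run = matchPrefix(Pattern.compile("a+"), new StringReader("aaab"));
        System.out.println("'a+' length=" + run[0] + " charsRead=" + run[1]);
    }
}
```

For the pattern "." this loop should stop after a single character, while a
pattern such as "a+" needs one character beyond the match to know that the
run has ended.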

The arbitrary size of 8 for the CharBuffer in InputSourceDataInputStream means
the parse requires around 8 characters of lookahead beyond the last character
matched to the delimiter. Resizing the buffer to 2 allows the tests to succeed
with fewer lookahead characters, but the whole approach/algorithm needs to be
re-examined to determine how much lookahead is actually needed and whether it
can be avoided in many cases.

It is known that you can't always avoid looking ahead 1 character. For
matching delimiters that use DFDL Character Class Entities that can match a
variable number of characters (e.g., WSP+, WSP*, and NL), a lookahead of 1 is
clearly necessary to know if the match is complete.
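
To illustrate why a variable-length entity needs one character of lookahead,
here is a small sketch using java.io.PushbackReader. It is an illustration
only, not Daffodil code: WspPlusSketch, isWsp, and matchWspPlus are
hypothetical names, and isWsp treats only space and tab as whitespace, which
is a deliberate simplification of the DFDL WSP entity.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical illustration; not Daffodil code.
final class WspPlusSketch {

    /** Simplified stand-in for the DFDL WSP character class (assumption: space and tab only). */
    static boolean isWsp(int c) {
        return c == ' ' || c == '\t';
    }

    /**
     * Consumes a run of whitespace and returns its length; 0 means WSP+ does not match.
     * The read() that finds the first non-whitespace character is the unavoidable
     * lookahead: on a TCP stream it blocks until that character arrives or the
     * stream ends.
     */
    static int matchWspPlus(PushbackReader in) throws IOException {
        int length = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (!isWsp(c)) {
                in.unread(c);       // give the lookahead character back to the caller
                break;
            }
            length++;
        }
        return length;
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader(" \t x,y"), 1);
        System.out.println("WSP+ length: " + matchWspPlus(in));
        System.out.println("next char:   " + (char) in.read());
    }
}
```

The pushback capacity of 1 is enough here because the match only ever needs
to look one character past the end of the run.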

For matching regular expressions, since they can look ahead an arbitrary
finite distance, the amount of lookahead required depends on the specific
regex.

Since some amount of lookahead is required in these cases, fixing this issue
for the simpler situation of basic delimiters with a fixed number of
characters seems relatively low priority.