[
https://issues.apache.org/jira/browse/DAFFODIL-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Beckerle updated DAFFODIL-2504:
------------------------------------
Summary: Parse text of non-specified length from TCP hangs needlessly
(was: Parse text of non-specified length from TCP hangs unnecessarily)
> Parse text of non-specified length from TCP hangs needlessly
> ------------------------------------------------------------
>
> Key: DAFFODIL-2504
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2504
> Project: Daffodil
> Issue Type: Bug
> Components: Back End
> Affects Versions: 3.0.0
> Reporter: Mike Beckerle
> Assignee: Mike Beckerle
> Priority: Minor
>
> See tests
> testDaffodilParseFromNetworkDelimited1
> testDaffodilParseFromNetworkDelimited1b
> testDaffodilParseFromNetworkDelimited2
> testDaffodilParseFromNetworkDelimited2b
>
> When parsing text from a network TCP stream, the parse should
> succeed once the parser knows it has matched the longest possible delimiter.
> It should not require more than that many characters to be present on the
> data stream in order for the parse to complete.
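
To make the expected behavior concrete, here is a minimal sketch of a scanner that pulls one character at a time and returns as soon as no longer delimiter could still match. This is not Daffodil code: CountingReader and scan_to_delimiter are hypothetical names, and an in-memory string stands in for the TCP stream.

```python
import io


class CountingReader:
    """Hypothetical stand-in for a network stream; counts characters pulled."""

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self.chars_read = 0

    def read_char(self):
        ch = self._stream.read(1)
        if ch:
            self.chars_read += 1
        return ch  # "" signals end of data


def scan_to_delimiter(reader, delimiters):
    """Return (field, delimiter), reading no more characters than needed.

    The earliest delimiter position wins, and the longest delimiter at that
    position is chosen. Returns (remaining_text, None) if no delimiter is found.
    """
    buf = ""
    while True:
        ch = reader.read_char()
        at_eof = ch == ""
        buf += ch
        for start in range(len(buf)):
            tail = buf[start:]
            # If the tail is a proper prefix of some delimiter, a longer match
            # is still possible, so more input is required before deciding.
            if not at_eof and any(
                len(d) > len(tail) and d.startswith(tail) for d in delimiters
            ):
                break
            matched = [d for d in delimiters if tail.startswith(d)]
            if matched:
                return buf[:start], max(matched, key=len)
        if at_eof:
            return buf, None
```

With the single delimiter ";" and the data "field1;rest of the stream", this sketch returns after pulling 7 characters, that is, nothing beyond the delimiter. With the delimiters ";" and ";;" and the data "a;b", it has to pull "b" as well, because only that character rules out the longer delimiter.
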
>
> There are no tests as yet, but presumably lengthKind 'pattern'
> will have a similar issue, where only enough characters should be needed to
> provide the knowably longest match for the regex. (For example, suppose
> dfdl:lengthPattern="." which is looking for exactly 1 byte. The match of this
> should NOT require that more than one byte be available on the TCP stream.)
>
> The arbitrary size of 8 for the CharBuffer in
> InputSourceDataInputStream leads to this requiring around 8 characters of
> lookahead beyond the last character matched to the delimiter. Resizing this
> to 2 allows tests to succeed with fewer lookahead characters, but the
> whole approach/algorithm needs to be examined to determine how much
> lookahead is actually required, and whether it can be avoided in many cases.
>
> It is known that you can't always avoid looking ahead 1
> character. For matching delimiters that use DFDL
> Character Class Entities that can match a variable number of characters
> (e.g., WSP+, WSP*, and NL), a lookahead of 1 is clearly necessary to know if
> the match is complete.
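
A small sketch of why a variable-length match needs that one extra character. The names are hypothetical, and the character set is a simplified stand-in for WSP (only space and tab), not the full DFDL definition.

```python
import io

# Simplified stand-in for the DFDL WSP character class (assumption).
SIMPLE_WSP = " \t"


def match_wsp_plus(stream):
    """Match a run of whitespace; return (matched_text, chars_read).

    The run is only known to be over once a non-whitespace character
    (or end of data) has been seen, so chars_read is normally one more
    than the length of the match.
    """
    matched = ""
    chars_read = 0
    while True:
        ch = stream.read(1)
        if ch == "":
            break  # end of data: the match cannot grow any further
        chars_read += 1
        if ch not in SIMPLE_WSP:
            break  # lookahead character; a real parser would push it back
        matched += ch
    return matched, chars_read
```

For the data "   x more" this matches three spaces but reads four characters. A fixed-length delimiter has no such need, since its end is known as soon as its last character arrives.
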
>
> For matching regular expressions, since they can look ahead an
> arbitrary finite distance, the amount of lookahead required depends on the
> specific regex.
>
> Since some amount of lookahead is required in these cases,
> fixing this issue for the simpler situation of just basic delimiters with a
> fixed number of characters seems relatively low priority.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)