[ 
https://issues.apache.org/jira/browse/DAFFODIL-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Beckerle updated DAFFODIL-2504:
------------------------------------
    Summary: Parse text of non-specified length from TCP hangs needlessly  
(was: Parse text of non-specified length from TCP hangs unnecessarily)

> Parse text of non-specified length from TCP hangs needlessly
> ------------------------------------------------------------
>
>                 Key: DAFFODIL-2504
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2504
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>    Affects Versions: 3.0.0
>            Reporter: Mike Beckerle
>            Assignee: Mike Beckerle
>            Priority: Minor
>
> See tests:
> testDaffodilParseFromNetworkDelimited1
> testDaffodilParseFromNetworkDelimited1b
> testDaffodilParseFromNetworkDelimited2
> testDaffodilParseFromNetworkDelimited2b
>
> When parsing text from a network TCP stream, the parse should succeed as
> soon as the parser knows it has matched the longest possible delimiter. It
> should not require any characters beyond that match to be present on the
> data stream in order for the parse to complete.
>
> There are no tests as yet, but presumably lengthKind 'pattern' has a similar
> issue: only enough characters should be needed to provide the knowably
> longest match for the regex. For example, suppose dfdl:lengthPattern="."
> which matches exactly 1 character (1 byte in a single-byte encoding).
> Matching it should NOT require that more than one character be available on
> the TCP stream.
>
> The arbitrary size 8 of the CharBuffer in InputSourceDataInputStream leads
> to this requiring around 8 characters of lookahead beyond the last character
> matched to the delimiter. Resizing the buffer to 2 allows the tests to
> succeed with fewer lookahead characters, but the whole approach/algorithm
> needs to be examined to determine how much lookahead is really needed, and
> whether it can be avoided in many cases.
>
> It is known that you can't always avoid looking ahead 1 character. For
> matching delimiters that use DFDL Character Class Entities that can match a
> variable number of characters (e.g., WSP+, WSP*, and NL), a lookahead of 1
> is clearly necessary to know if the match is complete.
>
> For matching regular expressions, since they can look ahead an arbitrary
> finite distance, the amount of lookahead required depends on the specific
> regex.
>
> Since some amount of lookahead is required in these cases, fixing this issue
> for the simpler situation of basic delimiters with a fixed number of
> characters seems relatively low priority.
>  
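
The fixed-length delimiter case from the description can be sketched in Java
as follows. This is only an illustration, not Daffodil's implementation:
DelimiterScanSketch, CountingReader, readField, and charsNeeded are
hypothetical names, and the sketch assumes a single, non-empty, fixed-text
delimiter. It reads one character at a time and stops on the character that
completes the delimiter, so it never asks the stream for data beyond the match.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: scan for one fixed-text delimiter with zero lookahead.
final class DelimiterScanSketch {

    /** Counts how many characters were pulled from the wrapped reader. */
    static final class CountingReader extends Reader {
        private final Reader in;
        int charsRead = 0;

        CountingReader(Reader in) { this.in = in; }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) charsRead += n;
            return n;
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    /**
     * Returns the text before the first occurrence of the delimiter, or null
     * if the stream ends first. Stops reading as soon as the delimiter's last
     * character has been seen.
     */
    static String readField(Reader in, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must be non-empty");
        }
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {       // one character per read
                sb.append((char) c);
                int start = sb.length() - delimiter.length();
                // Does the text read so far end with the delimiter?
                if (start >= 0 && sb.indexOf(delimiter, start) == start) {
                    return sb.substring(0, start);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    /** Number of characters consumed from data while scanning for delimiter. */
    static int charsNeeded(String data, String delimiter) {
        CountingReader r = new CountingReader(new StringReader(data));
        readField(r, delimiter);
        return r.charsRead;
    }

    public static void main(String[] args) {
        // The field "abc" plus the 1-character delimiter: 4 characters read.
        System.out.println(charsNeeded("abc;defghijkl", ";"));
    }
}
```

On a real TCP stream the same loop would return once the delimiter arrives,
even if the peer sends nothing further. Reading one character at a time trades
throughput for that property; a buffered variant would need to avoid blocking
on a refill once a match is already known.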

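The point about regex lookahead depending on the specific regex can be
illustrated with java.util.regex, whose Matcher.hitEnd() reports whether the
engine ran into the end of the available input during the last match
operation. This is a sketch under assumptions, not a description of how
Daffodil evaluates dfdl:lengthPattern: RegexLookaheadSketch and needsMoreInput
are hypothetical names, and the available data is modelled as a String rather
than a live stream.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: ask the regex engine whether more input could matter.
final class RegexLookaheadSketch {

    /**
     * Tries to match regex at the start of the available text and reports
     * whether the engine hit the end of that text while doing so. If it did,
     * additional input could change the result, so a streaming parser would
     * have to wait for more data (or end of stream) before committing.
     */
    static boolean needsMoreInput(String regex, String available) {
        Matcher m = Pattern.compile(regex).matcher(available);
        m.lookingAt();          // anchored at the start, like a length pattern
        return m.hitEnd();
    }

    public static void main(String[] args) {
        System.out.println(needsMoreInput(".", "a"));     // one char suffices
        System.out.println(needsMoreInput("a+", "aaa"));  // could keep growing
        System.out.println(needsMoreInput("a+", "aab"));  // 'b' ends the match
    }
}
```

A pattern such as "." is decided by the first character alone, while "a+"
against "aaa" cannot be decided until either a non-matching character or end
of stream is seen, which is the one-character lookahead the description
mentions for variable-length matches.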


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
